Operating System - HP-UX

Performance of vxfs directory with many files

 
Oliver White
Occasional Advisor

Performance of vxfs directory with many files

We have an application that has generated a large number of files (100,000) in a single directory, and it continues to do so. I understand that there is effectively no limit to the number of files VxFS can store in a directory, but the performance of directory operations is now suffering, so I am going to start splitting the files up into separate directories.

The question is, what is the optimum number of files to store in a directory? Is there some magic number past which there is a sudden performance impact?
4 REPLIES
Stefan Farrelly
Honored Contributor

Re: Performance of vxfs directory with many files


Take a look at the following link; it suggests the optimum is around 100,000-150,000 files per filesystem.

http://www.dutchworks.nl/htbin/hpsysadmin?h=3&dn=50618&q=large%20number%20of%20files&fh

I'm from Palmerston North, New Zealand, but somehow ended up in London...
Wodisch_1
Honored Contributor

Re: Performance of vxfs directory with many files

Hi Oliver,

since UN*X (all flavours of it) ALWAYS reads and writes directories as whole units, you get a hell of a lot of I/O just for the directory itself!
Use "sar -a" (IIRC) and look at the columns "iget/s" (inodes fetched per second), "namei/s" (file names converted to their respective inode numbers per second), and "dirbk/s" (directory blocks read per second). If "dirbk/s" is too high, your system slows down dramatically (because it is doing dozens of megabytes of I/O just for those directories).
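
A minimal sketch of how you might watch those counters (the interval and count are arbitrary):

    # report file-access routine activity: iget/s, namei/s, dirbk/s
    # sample every 5 seconds, 12 samples
    sar -a 5 12

A dirbk/s figure that stays high relative to namei/s is a hint that pathname lookups are wading through long directories.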

For that reason it has always been good practice to keep directories small - and that is exactly why people use those one-character subdirectories "a".."z": simply to NOT have thousands of files in one directory!
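
A rough sketch of the initial fan-out in the POSIX shell (/data/flat and /data/split are made-up paths, and it assumes the file names begin with a lowercase letter); the xargs pipeline avoids the "arg list too long" failure that a plain "mv a* ..." would hit in a directory this size:

    # fan the files out into one-character subdirectories a..z
    cd /data/flat
    for l in a b c d e f g h i j k l m n o p q r s t u v w x y z
    do
        mkdir -p /data/split/$l
        # one file per mv: slow, but the argument list never overflows
        ls | grep "^$l" | xargs -i mv {} /data/split/$l/
    done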

Just my $0.02,
Wodisch
Dietmar Konermann
Honored Contributor

Re: Performance of vxfs directory with many files

Hi!

At least for VxFS, I doubt that directories are read or written all at once.

When searching for a file, e.g. via stat(2), the file name is first matched against the entries of the DNLC (directory name lookup cache). If there is no hit in that cache, we start to scan directory blocks, beginning with the block we scanned the last time.

So we read individual directory blocks rather than complete directories. Each directory block starts with a hash table that makes the search through the entries that follow more efficient. If the entry is found, we stop the scan, of course.

Certainly there are operations which may cause complete scans. So I agree that the number of files per directory should be kept as small as practical, say <10000. Keep file names short... then more entries fit into one directory block! Defragment the directories regularly (this works better with VxFS 3.3, disk layout version 4), and use a recent VxFS patch level.
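
The defragmentation could look something like this (the mount point is made up; check fsadm_vxfs(1M) on your release for the exact options):

    # report directory fragmentation, then reorganize the directories
    /usr/sbin/fsadm -F vxfs -D /your/mountpoint
    /usr/sbin/fsadm -F vxfs -d /your/mountpoint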

Regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Bill Hassell
Honored Contributor

Re: Performance of vxfs directory with many files

You may need to define performance. Is it the performance seen by someone running an ls or find command? Or is it the actual create, open, read/write, close, and delete time for the application?

The former is a matter of social engineering: the directory is massive, so ls will take a very long time, and ls * will simply fail with "arg list too long" (the expanded command line really is too long). The fix is to educate users on how to navigate a massively large, flat directory: never use wildcards that will match thousands of files. Or, if it is necessary to search or count the files, run find on the directory once, put the results in a file, and then search that file instead.
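
For example (the path and search string here are made up):

    # walk the big directory once and cache the listing...
    find /data/bigdir -type f -print > /tmp/bigdir.list
    # ...then answer questions from the cached list, not the directory
    grep 'invoice2023' /tmp/bigdir.list     # locate files by name
    wc -l < /tmp/bigdir.list                # count the files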

If the impact is on the application (slow response time for create is the likely symptom), this is a normal characteristic of any massively large directory--and an example of a very poor design on any filesystem type. Assuming the files are randomly named, you can get a dramatic improvement simply by creating 26 directories and moving each file into the appropriate one (e.g., afiles, bfiles, cfiles, dfiles, and so on). Or group by project name, territory, job name, etc.--anything that breaks up the flat directory.
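
A sketch of how a script might route a file into its first-letter bucket (the names and paths are hypothetical):

    # route one file, e.g. march.dat -> /data/mfiles/
    f=march.dat
    d=$(echo $f | cut -c1)files     # first character + "files" -> mfiles
    mkdir -p /data/$d
    mv /data/flat/$f /data/$d/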

Directories are extremely effective at cutting out unnecessary I/O in this situation.


Bill Hassell, sysadmin