Operating System - HP-UX

millions of files per directory

 
SOLVED
Joe Odman
Occasional Advisor

millions of files per directory

Is there any filesystem that supports millions of files per directory? These are very small files. Assuming that the filesystem must continue to support the large numbers, how can I alleviate the obvious performance issues? Does HP, Veritas, Pillar, NetApp, EMC, or anyone have a solution?
14 REPLIES
lawrenzo
Trusted Contributor

Re: millions of files per directory

The problem with having millions of files in a directory is that commands run against it, e.g. ls or find, take an excessive amount of time, and a wildcard search such as ls -l * may fail outright because the expanded argument list exhausts memory.
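To illustrate the wildcard problem: a minimal sketch (Python, with a made-up path /data/flatdir) that iterates a huge directory one entry at a time instead of expanding every name into the shell's argument list the way ls -l * does:

import os

# Illustrative sketch only; /data/flatdir is a placeholder path.
BIGDIR = "/data/flatdir"

count = 0
with os.scandir(BIGDIR) as entries:   # yields entries one at a time
    for entry in entries:
        count += 1                    # do per-file work here
print("files seen:", count)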

Other things, like backups or filesystem synchronisation, may also be prolonged because of the sheer number of files that have to be opened and written.

As far as I am aware, the inode limit determines how many files a filesystem can hold, since each file added consumes an entry in the inode table.
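As a quick, hedged illustration (Python, with /data as a stand-in mount point), the current inode headroom of a filesystem can be read via statvfs:

import os

# Sketch only; substitute the real mount point.
st = os.statvfs("/data")
print("total inodes:", st.f_files)
print("free inodes :", st.f_ffree)
print("inodes used :", st.f_files - st.f_ffree)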

hello
lawrenzo
Trusted Contributor

Re: millions of files per directory

What application will be writing or accessing these files? Is there a tuning doc for the application?
hello
Joe Odman
Occasional Advisor

Re: millions of files per directory

Assume that the application cannot change. The filesystem itself must change.
Wouter Jagers
Honored Contributor

Re: millions of files per directory

I have seen the side-effects Lawrenzo is talking about first-hand, and I can tell you it is far from easy to work on a problem within such directories.

Best thing to do (if possible) is to create some sort of hashing algorithm to spread these millions of files across a tree of subdirectories.

For example, given a bunch of files ranging from 'a000000' to 'c999999' you could start by having subdirectories 'a', 'b', and 'c' (each good for one million files). Within these directories you could then have subdirectories '000' to '999', each holding one thousand files.

A simplified example, of course, but I'd try implementing something like this in order to avoid the complications described above.
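Purely as an illustration of that layout, a rough Python sketch (the paths /data/flatdir and /data/tree are made up, and it assumes names of the form a000000..c999999):

import os
import shutil

SRC = "/data/flatdir"   # assumed flat directory holding the files today
DST = "/data/tree"      # assumed root of the new subdirectory tree

with os.scandir(SRC) as entries:
    for entry in entries:
        if not entry.is_file():
            continue
        name = entry.name                 # e.g. 'a123456'
        level1 = name[0]                  # 'a', 'b' or 'c'
        level2 = name[1:4]                # '000' .. '999'
        target = os.path.join(DST, level1, level2)
        os.makedirs(target, exist_ok=True)
        shutil.move(entry.path, os.path.join(target, name))

With that scheme no single directory ends up holding more than about a thousand entries, so directory scans stay short.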

Cheers,
Wout
an engineer's aim in a discussion is not to persuade, but to clarify.
lawrenzo
Trusted Contributor

Re: millions of files per directory

You will have to set the inode limit, or at least check the current usage and determine whether it will be reached.


Set up a script, as mentioned, to move the files into subdirectories, or the environment will become unmanageable.

HTH

Chris
hello
James R. Ferguson
Acclaimed Contributor
Solution

Re: millions of files per directory

Hi Joe:

Divide and conquer. That said, by using a current VxFS (JFS) release (e.g. 4.1 or later) with the latest disk layout version, together with the mount options that meet your needs while offering the best performance, you can probably achieve some gains.

Have a look at this white paper on JFS performance and tuning:

http://docs.hp.com/en/5576/JFS_Tuning.pdf

Another good source on mount options as they relate to VxFS filesystem performance is the manpage for 'mount_vxfs'. You might find, for instance, that mounting with 'noatime' helps speed up your filesystem searches if searching is their predominant activity.

http://docs.hp.com/en/B2355-60105/mount_vxfs.1M.html

Regards!

...JRF...
Joe Odman
Occasional Advisor

Re: millions of files per directory

This is the most appropriate response so far. We are on JFS layout version 4 now, but will look at migrating to layout version 5. We may also test noatime. NetApp's WAFL filesystem claims to handle this better still, but the claim is not quantified. I also found that ReiserFS version 4 (SUSE) handles a million files in a directory efficiently. Still looking for a better HP-UX solution.
A. Clay Stephenson
Acclaimed Contributor

Re: millions of files per directory

I would get a baseball bat and use it on whatever developer or vendor came up with this scheme, but if you insist that the application be maintained as-is, I would say the only viable alternative is a solid-state disk. Directory searches are linear, so finding a given file will still require n/2 accesses on average; at least with a solid-state disk (backed up automatically and transparently to conventional disks), those searches will be as fast as possible.

You are essentially using the directory as a database - something that it was never intended to do.
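To put a rough number on that linear-search cost (purely illustrative figures, assuming three million files):

# Back-of-envelope sketch, not a benchmark.
n = 3_000_000                              # files in one flat directory (assumed)
flat_avg = n / 2                           # average entries scanned per lookup

# The same files hashed into the 'a'/'000' tree suggested above:
hashed_avg = 3 / 2 + 1000 / 2 + 1000 / 2   # three short scans instead of one huge one

print("flat directory:", int(flat_avg), "entries scanned on average")
print("hashed tree   :", int(hashed_avg), "entries scanned on average")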
If it ain't broke, I can fix that.
Thomas J. Harrold
Trusted Contributor

Re: millions of files per directory

I investigated "content" storage appliances, such as EMC's Celerra, several years ago. They use a different approach, called "node-based" storage, in which the solution is actually a group of small servers, each with its own local storage, plus a set of master nodes used to arrange the storage. EMC claimed that this system could handle millions of files.

Could I ask why? Could you work out a front-end access script/program that would sort and store the files in separate directories instead?
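Along those lines, a hedged sketch of what such a front-end might look like (Python; the storage root /data/store is made up, and the two-level MD5 prefix is just one possible hashing choice):

import hashlib
import os

ROOT = "/data/store"    # assumed storage root

def path_for(name):
    # Derive two directory levels from the first four hex digits of the name's MD5.
    h = hashlib.md5(name.encode()).hexdigest()
    return os.path.join(ROOT, h[:2], h[2:4], name)

def store(name, data):
    target = path_for(name)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as f:
        f.write(data)

def retrieve(name):
    with open(path_for(name), "rb") as f:
        return f.read()

The application would call store() and retrieve() instead of touching the flat directory, so no single directory would ever hold more than a small fraction of the files.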

-tjh
I learn something new everyday. (usually because I break something new everyday)