Operating System - HP-UX

Viktor Balogh
Honored Contributor

Best practices for having a bunch of small files on vxfs

Hi,

 

Could you point me to some best practices for storing a few million small files (around 50 KB each) on VxFS? Fragmentation is extensive, so I'm thinking about running fsadm from a cron job, but that's all I've found so far. I think I need to create a separate file system for this data. These are Oracle audit logs, so the access pattern is mostly write-once, read-never. ;) Don't ask me why they chose file-system-side logging.
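For reference, such a cron job might look like the sketch below. The mount point /oracle/audit and the log file are placeholders, and online reorganization with fsadm -d/-e assumes the OnlineJFS product is installed:

    # Report directory (-D) and extent (-E) fragmentation, then reorganize
    # directories (-d) and extents (-e), nightly. Paths are hypothetical.
    0 2 * * * /usr/sbin/fsadm -F vxfs -D -E /oracle/audit >> /var/adm/fsadm.log 2>&1
    30 2 * * * /usr/sbin/fsadm -F vxfs -d -e /oracle/audit >> /var/adm/fsadm.log 2>&1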

 

Regards,

Viktor

 

****
Unix operates with beer.

Re: Best practices for having a bunch of small files on vxfs

We have a directory somewhere that contains 1,000,000 files, and we see a lot of performance issues when accessing it. A simple ls runs for 15 minutes.

 

The maximum number of inodes is 1 billion, so a directory that large is possible, but I have read that it's best not to go above 100,000 files in one directory.

 

I'd advise creating a script to clean up the directory and move older files to separate file systems or directories.
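A minimal sketch of such a cleanup, with hypothetical paths; piping find into a while loop avoids the argument-list limit that a bare mv with a huge glob would hit:

    # Move audit files older than 30 days off the busy directory.
    # /oracle/audit and /archive/audit are placeholder paths.
    find /oracle/audit -type f -mtime +30 |
    while read f; do
        mv "$f" /archive/audit/
    done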

Dennis Handly
Acclaimed Contributor

Re: Best practices for having a bunch of small files on vxfs

You can also tar up many small files into one bigger archive, and gzip it if you want to save space.
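For example (file names and paths below are hypothetical), keeping the glob narrow so the argument list stays manageable:

    # Archive one day's audit files, compress the archive, then remove
    # the originals only if both steps succeeded.
    cd /oracle/audit || exit 1
    tar -cf /oracle/arch/aud_20120101.tar aud_20120101*.aud &&
    gzip /oracle/arch/aud_20120101.tar &&
    rm -f aud_20120101*.aud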

Bill Hassell
Honored Contributor

Re: Best practices for having a bunch of small files on vxfs

Once you have a flat directory with hundreds of thousands of files (or millions), you have a major performance problem in many, many areas. A simple ls is out of the question; you have to deal with the files in a more sophisticated way. The filenames very likely follow a predictable pattern (sequential numbers, sequential timestamps, whatever). If you need to move (or remove) a group of files, start with the most restrictive pattern you have, perhaps an exact match for all but the last two characters: ls abc123456?? If you know how the quantities map to filenames, use that to pick groups to move or remove.
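A sketch of that batching idea with hypothetical names; each narrow glob expands to a small, manageable argument list instead of one enormous one:

    # Delete files one digit-group at a time rather than with a single glob.
    for d in 0 1 2 3 4 5 6 7 8 9; do
        rm -f abc123456${d}?    # abc1234560? through abc1234569?
    done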

 

Defragmentation will be very disk-intensive for a long time, probably cancelling any benefit gained from the defrag.

 

But the real fix is: DON'T allow this type of design. It will always be a major problem. Create a directory structure, probably based on dates. Then script a procedure to move a few hundred files at a time into those directories. Once the structure has been populated, the active set of files should be a few hundred, perhaps a couple of thousand. Then create a cron script to move and clean up the files every day.
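One way that daily script might look, with hypothetical paths and an *.aud name pattern; it buckets files by the date the move runs, which is close enough for write-once logs:

    #!/usr/bin/sh
    # Move day-old audit files into a date-based subdirectory so the
    # active directory stays at a few hundred entries. Paths are placeholders.
    SRC=/oracle/audit
    DST=/oracle/audit_by_date
    DAY=$(date +%Y/%m/%d)
    mkdir -p $DST/$DAY
    find $SRC -type f -name '*.aud' -mtime +0 |
    while read f; do
        mv "$f" $DST/$DAY/
    done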



Bill Hassell, sysadmin