<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Performance of a filesystem with 10 million files in Operating System - HP-UX</title>
    <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726855#M836038</link>
    <description>Hi:&lt;BR /&gt;&lt;BR /&gt;I once had to support a system with a very large number of files as well. This system was basically a CAD system that had an Oracle database that kept up with the metadata, but the actual drawings were in regular files. The database essentially pointed the application to the filenames. When I started, all the files were in one filesystem (sound familiar?) but were distributed across many directories within the filesystem. The solution was, as has been mentioned, to divide and conquer. Your application will simply see other directories, but those other directories should be other filesystems below your outer-level directory. &lt;BR /&gt;&lt;BR /&gt;I would look at dividing this thing into 10 or more filesystems. I would also suggest that you use a backup solution like OmniBack II; in this case you want the media agents directly attached to the host with the disk agents so that your network is not swamped.&lt;BR /&gt;&lt;BR /&gt;You also definitely need some sort of automated backup with a tape library to handle the load.&lt;BR /&gt;&lt;BR /&gt;Since this also sounds like a system that has metadata pointing to the actual data in regular files, I will mention one other thing that you ought to consider - snapshots. If you snapshot the filesystems (which should only require a few minutes of downtime because the entire filesystem is not copied) you can get a consistent set of data, and you really don't care how long your backups take because the system can remain operational.&lt;BR /&gt;&lt;BR /&gt;Food for thought, Clay&lt;BR /&gt;</description>
    <pubDate>Sun, 19 May 2002 00:32:31 GMT</pubDate>
    <dc:creator>A. Clay Stephenson</dc:creator>
    <dc:date>2002-05-19T00:32:31Z</dc:date>
    <item>
      <title>Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726847#M836030</link>
      <description>&lt;BR /&gt;Are there things I can do to help filesystem performance on a system with 10 million files?&lt;BR /&gt;&lt;BR /&gt;Facts:&lt;BR /&gt; - Application keeps data in a somewhat-hierarchical directory structure of more than 10 million small files (average of a few thousand files per directory)&lt;BR /&gt; - Using N-class, LVM, JFS&lt;BR /&gt; - Disk space is LVM-striped across many EMC drives&lt;BR /&gt;&lt;BR /&gt;It takes several to many hours to run find commands.  It takes 2 days or so to do a full NetBackup backup.  (I've thought about backing it up "raw", but would want the application to be down and filesystem to be unmounted.)  Extrapolating from smaller tests, I believe it would take several days or longer to do a restore.&lt;BR /&gt;&lt;BR /&gt;Is there anything I can do to dramatically help performance?  I believe I am running into fundamental limitations with the JFS filesystem and directory processing.  Would more filesystems or a different layout work better?  Are there alternatives to JFS that would better solve the problem?&lt;BR /&gt;&lt;BR /&gt;Any insight or personal experiences would be appreciated!&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 17 May 2002 22:22:38 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726847#M836030</guid>
      <dc:creator>Doug Kratky</dc:creator>
      <dc:date>2002-05-17T22:22:38Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726848#M836031</link>
      <description>Hi Doug:&lt;BR /&gt;&lt;BR /&gt;I don't doubt that walking your directories takes massive amounts of time!&lt;BR /&gt;&lt;BR /&gt;I'd look to "divide-and-conquer" strategies.  Wherever possible, divide the directories you have.  Choose multiple mountpoints.  Exclude directories and files from backups that are logically read-only.  *Move* files from "active" to logically "read-only" directories so that the backup exclusion rules apply.  If you can write (script) rules to do this, do so.&lt;BR /&gt;&lt;BR /&gt;Obviously, careful organization of hierarchical directories along with properly restricted 'find' commands can reduce resource strain.  Part of this is user education.  Make sure your users understand that 'find'ing from the root ("/") directory not only costs them time but costs other users too.&lt;BR /&gt;&lt;BR /&gt;JFS (VxFS) filesystems provide superior performance to HFS ones, and striping LVM extents is generally helpful to performance.&lt;BR /&gt;&lt;BR /&gt;I presume that you have the OnlineJFS license.  If not, get it.  If/when you do, evaluate the kinds of activity you do within a file directory.  Ask questions like: is this a directory holding temporary files only for the life of a process, or is it a directory of files that are processed in very large I/O chunks?  The idea here is that having divided (aligned) your directories into multiple mountpoints, you can *optimize* the filesystem's processing by *optimizing* the mount point for that particular function.&lt;BR /&gt;&lt;BR /&gt;Have a look at the man pages for 'mount_vxfs' and the Technical Knowledge Base document #KBRC00007737 for more information.&lt;BR /&gt;&lt;BR /&gt;If you are *not* using a database engine like Oracle, consider giving a generous amount of memory to the kernel buffer cache.  Experiment and measure (e.g. with 'Glance').  You may find that "local" UNIX buffering as well as your EMC cache can supplement access times.&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt;&lt;BR /&gt;...JRF...</description>
      <pubDate>Fri, 17 May 2002 23:12:35 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726848#M836031</guid>
      <dc:creator>James R. Ferguson</dc:creator>
      <dc:date>2002-05-17T23:12:35Z</dc:date>
    </item>
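    <!-- Editor's note: JRF's advice above (one mountpoint per workload, each tuned with mount_vxfs options) can be illustrated with a hedged /etc/fstab sketch. The device and mountpoint names are invented; the option names (delaylog, mincache) are taken from the mount_vxfs documentation and generally require the OnlineJFS license he mentions.

    ```
    # Hypothetical /etc/fstab fragment: tune each mountpoint for its workload.
    # Scratch area of short-lived files: delayed logging, temp-style caching.
    /dev/vg01/lvol_scratch /app/scratch vxfs delaylog,mincache=tmpcache 0 2
    # Write-once archive read in large sequential chunks: bypass buffer cache.
    /dev/vg01/lvol_archive /app/archive vxfs delaylog,mincache=direct 0 2
    ```

    As the post suggests, measure each area's actual access pattern (e.g. with Glance) before settling on options. -->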
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726849#M836032</link>
      <description>&lt;BR /&gt;Thanks for the quick answer and helpful comments.&lt;BR /&gt;&lt;BR /&gt;I'm not sure I will be able to influence directory layout very much if at all, though it is one path I'll investigate.&lt;BR /&gt;&lt;BR /&gt;Let me add my biggest worry no matter what the layout ... How am I going to restore 10 million files in a reasonable time?&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Doug&lt;BR /&gt;</description>
      <pubDate>Sat, 18 May 2002 00:19:12 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726849#M836032</guid>
      <dc:creator>Doug Kratky</dc:creator>
      <dc:date>2002-05-18T00:19:12Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726850#M836033</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;If they are all very small files, you may want to consider reducing the filesystem's block size (if it is not already the default 1 KB) so that you have a more efficient filesystem in terms of space usage. Because vxfs does allocation and I/O in multiple-block extents, keeping the logical block size as small as possible increases performance and reduces wasted space for most workloads.&lt;BR /&gt;&lt;BR /&gt;Also, again with vxfs, you would want to use fsadm regularly to perform reorganisation of your filesystem directories (-d option) and file extents (-e option). &lt;BR /&gt;&lt;BR /&gt;man 1m fsadm_vxfs for more details.&lt;BR /&gt;&lt;BR /&gt;Hope this helps. Regards.&lt;BR /&gt;&lt;BR /&gt;Steven Sim Kok Leong</description>
      <pubDate>Sat, 18 May 2002 00:51:50 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726850#M836033</guid>
      <dc:creator>Steven Sim Kok Leong</dc:creator>
      <dc:date>2002-05-18T00:51:50Z</dc:date>
    </item>
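    <!-- Editor's note: the regular fsadm reorganisation Steven describes is typically scheduled from cron. A hypothetical crontab entry (the mountpoint is invented; see fsadm_vxfs(1M) for the -d and -e options):

    ```
    # Sundays at 03:00, reorganise directories (-d) and file extents (-e)
    0 3 * * 0 /usr/sbin/fsadm -F vxfs -d -e /app/data
    ```
    -->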
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726851#M836034</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;With regard to backup, you should consider an incremental backup strategy.&lt;BR /&gt;&lt;BR /&gt;In that case, you back up only the files that have been modified since the last backup.&lt;BR /&gt;&lt;BR /&gt;This saves you resources in terms of backup time and tape space.&lt;BR /&gt;&lt;BR /&gt;Hope this helps. Regards.&lt;BR /&gt;&lt;BR /&gt;Steven Sim Kok Leong</description>
      <pubDate>Sat, 18 May 2002 00:57:18 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726851#M836034</guid>
      <dc:creator>Steven Sim Kok Leong</dc:creator>
      <dc:date>2002-05-18T00:57:18Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726852#M836035</link>
      <description>Since this is a massive filesystem (and probably very important), I would strongly encourage looking at alternate backup strategies (i.e., HP's OmniBack). Also, there are a number of kernel params and buffer cache values that can help, assuming you have no artificial limits on RAM (i.e., if management won't buy more RAM). &lt;BR /&gt;&lt;BR /&gt;There are no easy answers (other than to redesign the 'database', as it will be inefficient on any computer).  I would suspect that system overhead is fairly high due to all the open/close system calls.  Use sar -a to verify.  A du or a find on the top level will kill the system.  This set of files must be handled very differently than a classic filesystem.</description>
      <pubDate>Sat, 18 May 2002 19:43:18 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726852#M836035</guid>
      <dc:creator>Bill Hassell</dc:creator>
      <dc:date>2002-05-18T19:43:18Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726853#M836036</link>
      <description>Doug,&lt;BR /&gt;&lt;BR /&gt;You cannot help much with this setup. However, for disaster purposes I would depend on EMC's SRDF rather than NetBackup. With SRDF in place, you can reduce the number of full backups to a few times a month, do incremental backups, and use it only to restore the files.&lt;BR /&gt;&lt;BR /&gt;Also, adding a little more buffer cache may not be a bad idea in cases like yours.&lt;BR /&gt;&lt;BR /&gt;Since you have LVM striping in place, you can effectively load-balance the paths. So, try adding a few more interfaces and load balancing further to get better latency.&lt;BR /&gt;&lt;BR /&gt;-Sri&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sat, 18 May 2002 20:15:46 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726853#M836036</guid>
      <dc:creator>Sridhar Bhaskarla</dc:creator>
      <dc:date>2002-05-18T20:15:46Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726854#M836037</link>
      <description>Doug,&lt;BR /&gt;&lt;BR /&gt;We had the same issue with storing tens of millions of report files. We resorted to building a flat-file type filesystem and using a database to index the physical addresses. And to make matters worse, it's on CDs (a very large HP CD library)!!! &lt;BR /&gt;&lt;BR /&gt;I plan on moving this onto EMC's Centera product. Very, very cheap - 250K for 10 terabytes. The Centera is great for "fixed" content, meaning content that doesn't change - like reports, images, MPEGs, etc...&lt;BR /&gt;&lt;BR /&gt;JRF hit it on the head - if you can't move it to another device like the Centera (and there are no others like it), then use MULTIPLE mount points, many disks, many I/O channels, Striping, MAGIC, LUCK, Blood, Sweat, and tears.&lt;BR /&gt;&lt;BR /&gt;live free or die&lt;BR /&gt;harry&lt;BR /&gt;</description>
      <pubDate>Sat, 18 May 2002 21:03:27 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726854#M836037</guid>
      <dc:creator>harry d brown jr</dc:creator>
      <dc:date>2002-05-18T21:03:27Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726855#M836038</link>
      <description>Hi:&lt;BR /&gt;&lt;BR /&gt;I once had to support a system with a very large number of files as well. This system was basically a CAD system that had an Oracle database that kept up with the metadata, but the actual drawings were in regular files. The database essentially pointed the application to the filenames. When I started, all the files were in one filesystem (sound familiar?) but were distributed across many directories within the filesystem. The solution was, as has been mentioned, to divide and conquer. Your application will simply see other directories, but those other directories should be other filesystems below your outer-level directory. &lt;BR /&gt;&lt;BR /&gt;I would look at dividing this thing into 10 or more filesystems. I would also suggest that you use a backup solution like OmniBack II; in this case you want the media agents directly attached to the host with the disk agents so that your network is not swamped.&lt;BR /&gt;&lt;BR /&gt;You also definitely need some sort of automated backup with a tape library to handle the load.&lt;BR /&gt;&lt;BR /&gt;Since this also sounds like a system that has metadata pointing to the actual data in regular files, I will mention one other thing that you ought to consider - snapshots. If you snapshot the filesystems (which should only require a few minutes of downtime because the entire filesystem is not copied) you can get a consistent set of data, and you really don't care how long your backups take because the system can remain operational.&lt;BR /&gt;&lt;BR /&gt;Food for thought, Clay&lt;BR /&gt;</description>
      <pubDate>Sun, 19 May 2002 00:32:31 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726855#M836038</guid>
      <dc:creator>A. Clay Stephenson</dc:creator>
      <dc:date>2002-05-19T00:32:31Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726856#M836039</link>
      <description>If you can influence the directory structure then consider interleaving. I believe down to three levels is the most efficient. Take the example of the filename testfile.zip. The directory structure would be:&lt;BR /&gt;&lt;BR /&gt;| -&amp;gt; t&lt;BR /&gt;| -&amp;gt; -&amp;gt; e&lt;BR /&gt;| -&amp;gt; -&amp;gt; -&amp;gt; s&lt;BR /&gt;| -&amp;gt; -&amp;gt; -&amp;gt; testfile.zip&lt;BR /&gt;&lt;BR /&gt;If you require fast disk I/O with snapshots then take a look at using NAS, i.e. Network Appliance. From an Oracle perspective they perform very well and have a strategic alliance with Oracle. You can connect it to your HP-UX server via a Gigabit peer-&amp;gt;peer network connection.&lt;BR /&gt;&lt;BR /&gt;Just my 2 cents worth.</description>
      <pubDate>Sun, 19 May 2002 23:44:27 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726856#M836039</guid>
      <dc:creator>Phil Daws_2</dc:creator>
      <dc:date>2002-05-19T23:44:27Z</dc:date>
    </item>
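    <!-- Editor's note: Phil's interleaving scheme can be sketched in a few lines of Python (the function name and root path are illustrative, not from the thread):

    ```python
    import os

    def interleaved_path(root, filename, depth=3):
        """Spread files across nested single-character directories taken
        from the first characters of the filename, e.g.
        testfile.zip becomes root/t/e/s/testfile.zip."""
        parts = list(filename[:depth])
        return os.path.join(root, *parts, filename)
    ```

    With three levels, each directory fans out over only as many entries as there are distinct leading characters, keeping per-directory file counts far below the millions-in-one-directory case. -->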
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726857#M836040</link>
      <description>See this page:&lt;BR /&gt; &lt;BR /&gt;&lt;A href="http://h21007.www2.hp.com/dspp/topic/topic_TopicDetailPage_IDX/0,1711,10313,00.html" target="_blank"&gt;http://h21007.www2.hp.com/dspp/topic/topic_TopicDetailPage_IDX/0,1711,10313,00.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;There are some docs there you will want to read regarding fs performance and kernel tuning.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;___&lt;BR /&gt;&lt;BR /&gt;The backup/restore time is always limited by your hardware and established by your needs (the business needs, your company's needs). I mean that if you have to restore in 2 hours (for example), your company should spend money to make it possible, by buying more hardware/software like disks (SRDF) or more tapes (robots) or faster access to tapes via a media server or an FC card + FC/SCSI bridge.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 20 May 2002 09:34:45 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726857#M836040</guid>
      <dc:creator>Carlos Fernandez Riera</dc:creator>
      <dc:date>2002-05-20T09:34:45Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726858#M836041</link>
      <description>&lt;BR /&gt;Thanks for the answers so far.  We have FC-attached tape drives, and we have used both OmniBack and NetBackup.  Both of them take equally long (a couple of days for a full backup).  Incremental backups take a long time, too (close to a day).  And I'm not very confident I could successfully do a restore.&lt;BR /&gt;&lt;BR /&gt;I really don't blame the backup programs - I blame the filesystem and the layout of the files.  I've convinced myself that it will always take a long time, but would like to find out otherwise.&lt;BR /&gt;&lt;BR /&gt;For those who have systems with millions of files:&lt;BR /&gt;  - How long does it take you to do a backup?&lt;BR /&gt;  - Have you ever had to restore millions of files, and how long did it take?&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Doug&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 20 May 2002 12:11:28 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726858#M836041</guid>
      <dc:creator>Doug Kratky</dc:creator>
      <dc:date>2002-05-20T12:11:28Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726859#M836042</link>
      <description>At the risk of asking a dumb question...&lt;BR /&gt;&lt;BR /&gt;Why is the disk space LVM-striped across the EMC disks?  If the EMC disks are already set up as RAID-5 LUNs then this would be striping on striping, which would degrade performance (although not enough to cause the sort of symptoms you have with the backup).</description>
      <pubDate>Mon, 20 May 2002 12:42:44 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726859#M836042</guid>
      <dc:creator>Trevor Dyson</dc:creator>
      <dc:date>2002-05-20T12:42:44Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726860#M836043</link>
      <description>&lt;BR /&gt;The EMC disks are mirrored (not RAID-5), then we stripe across the disks with LVM.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 20 May 2002 12:59:24 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726860#M836043</guid>
      <dc:creator>Doug Kratky</dc:creator>
      <dc:date>2002-05-20T12:59:24Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726861#M836044</link>
      <description>Oh well.... it was a dumb question after all.</description>
      <pubDate>Mon, 20 May 2002 13:00:52 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726861#M836044</guid>
      <dc:creator>Trevor Dyson</dc:creator>
      <dc:date>2002-05-20T13:00:52Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726862#M836045</link>
      <description>Doug,&lt;BR /&gt;&lt;BR /&gt;Are these files "static", meaning that they don't change, or where change is rare, and where you usually add/delete files? Also, how big are these files?&lt;BR /&gt;&lt;BR /&gt;live free or die&lt;BR /&gt;harry</description>
      <pubDate>Mon, 20 May 2002 13:16:30 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726862#M836045</guid>
      <dc:creator>harry d brown jr</dc:creator>
      <dc:date>2002-05-20T13:16:30Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726863#M836046</link>
      <description>&lt;BR /&gt;Files are a few K each.  They never change after they're created.  Several thousand new files are created each day all throughout the directory structure.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 20 May 2002 13:34:43 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726863#M836046</guid>
      <dc:creator>Doug Kratky</dc:creator>
      <dc:date>2002-05-20T13:34:43Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726864#M836047</link>
      <description>Read the answer from Bill Hassell (and the question) in this thread:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0xf77d42308663d611abdb0090277a778c,00.html" target="_blank"&gt;http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0xf77d42308663d611abdb0090277a778c,00.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 20 May 2002 13:40:20 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726864#M836047</guid>
      <dc:creator>Carlos Fernandez Riera</dc:creator>
      <dc:date>2002-05-20T13:40:20Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726865#M836048</link>
      <description>Hi again:&lt;BR /&gt;&lt;BR /&gt;I did a bit more digging on some of my systems and I now have a little data for you. On three of my application servers, I have filesystems with about 700,000 mostly small (a few KB each) files on them, and under OmniBack those require about 1.3 hrs. on average to back up and about 1.5 hrs. to restore. These all run simultaneously. I realize that this is a much smaller number than 10^7, but if you can divide your one filesystem into about 10 filesystems then the data fits nicely, and suggests that, provided you have enough tape drives and concurrency is set to the maximum (5), you should be able to backup/restore in the 2 hours+ timeframe. &lt;BR /&gt;&lt;BR /&gt;The real key to any performance in your environment is to take what are now directories and make them filesystems. &lt;BR /&gt;&lt;BR /&gt;-----------------------------------------&lt;BR /&gt;&lt;BR /&gt;Let me also give you a 'Plan B' to consider. Given that the vast majority of your data is static, you should seriously look into a product called 'OpenView OmniStorage'. It is built on top of a vxfs filesystem and converts it into a hierarchical filesystem. Recently accessed data is kept on traditional magnetic disk, and seldom-used data is migrated to secondary (optical platters) storage or even optional tertiary (tape libraries) storage.&lt;BR /&gt;If a file is requested that is no longer in magnetic cache, it is migrated into magnetic cache on an as-needed basis. Essentially, this looks like one enormous filesystem to the application; some files are available very quickly; others may take a little time to access. You could have a very large magnetic cache, and thus your access would always be fast. You can force (via a cron job) the magnetic data to be migrated out to secondary storage to provide a level of backup, since MO is considered very stable. Note that migrating out does not remove the data from primary storage but only makes it a candidate for removal from primary storage. There is also OmniBack integration to handle VBFS (Very Big File System - i.e. OmniStorage). Because this is built on vxfs, I THINK vxfs snapshots will also work.&lt;BR /&gt;&lt;BR /&gt;While this won't solve your backup problem entirely, it might change the question so that nightly backups become a migout to secondary storage and weekly backups become a vxfs snapshot followed by an OmniBack backup that might take two days (but since you are snapshotted and up and running, who cares?).&lt;BR /&gt;&lt;BR /&gt;More food for thought, Clay&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 20 May 2002 15:38:12 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726865#M836048</guid>
      <dc:creator>A. Clay Stephenson</dc:creator>
      <dc:date>2002-05-20T15:38:12Z</dc:date>
    </item>
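    <!-- Editor's note: Clay's extrapolation above can be sanity-checked with a little arithmetic (a sketch; the numbers are taken from his post):

    ```python
    # ~700,000 files back up in about 1.3 hours per OmniBack stream.
    measured_files = 700_000
    measured_hours = 1.3

    # Split 10 million files into 10 filesystems backed up concurrently:
    files_per_stream = 10_000_000 / 10          # 1,000,000 files per stream
    scale = files_per_stream / measured_files   # ~1.43x the measured load
    parallel_hours = measured_hours * scale     # ~1.86 hours wall-clock
    ```

    which lands in the "2 hours+" range he quotes, assuming enough tape drives to keep all ten streams running in parallel. -->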
    <item>
      <title>Re: Performance of a filesystem with 10 million files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726866#M836049</link>
      <description>The above answers are all quite good. I believe that you are actually running into issues inherent in JFS. Separating the files into separate filesystems is probably the only way to make a JFS-based arrangement work.&lt;BR /&gt;&lt;BR /&gt;On the other hand, moving to a network-attached file server could bring with it some excellent benefits, especially if that NAS uses your EMC for its data store.&lt;BR /&gt;&lt;BR /&gt;Tests I have run with Traakan's (&lt;A href="http://www.traakan.com)" target="_blank"&gt;www.traakan.com)&lt;/A&gt; NAS have indicated extremely little performance degradation with thousands of files in any given directory. Only when the number of files _in_a_single_directory_ approaches 500 thousand (half a million) does the access time start to increase, and then only for that directory.&lt;BR /&gt;&lt;BR /&gt;I have had as many as 6 million files in a single directory. While initial file access time (opens and lookups) in that directory was rather high, access times for all the other directories remained quite fast.</description>
      <pubDate>Mon, 20 May 2002 16:20:25 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-of-a-filesystem-with-10-million-files/m-p/2726866#M836049</guid>
      <dc:creator>Michael Lampi</dc:creator>
      <dc:date>2002-05-20T16:20:25Z</dc:date>
    </item>
  </channel>
</rss>

