Operating System - HP-UX

Hundreds of thousands of small (5-10k) files - improve backup time

 
SOLVED
Maxim_12
New Member

Hundreds of thousands of small (5-10k) files - improve backup time

We have an EDI system with hundreds of thousands of small files, and it takes forever to back it up nightly. We do daily incrementals (6-7 hours) and one weekly full (12-16 hours).

We are using HP OpenView Storage Data Protector version A.05.10 on an HP-UX server.

Can someone give me suggestions on how they back up their own similar systems, to help us improve our backup times?
5 REPLIES
Simon Hargrave
Honored Contributor

Re: Hundreds of thousands of small (5-10k) files - improve backup time

Are these EDI files that are generated, transferred and then just kept for reference?

If so, then how about tarring and compressing all of each day's files into one archive on a daily basis? That way you've only got one file per day to back up.
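
Something along these lines would do it (the directory names and the one-day window are only examples, and cpio stands in for tar purely to avoid command-line length limits with a huge file list):

#!/usr/bin/sh
# Bundle everything modified in the last day into one archive and compress it;
# the nightly backup then only has to pick up this single file.
EDI_DIR=/var/edi/data            # example location of the EDI files
ARCH_DIR=/var/edi/archive        # example place to keep the daily archives
STAMP=`date +%Y%m%d`

cd "$EDI_DIR" || exit 1
find . -type f -mtime -1 -print | cpio -o > "$ARCH_DIR/edi_$STAMP.cpio"
gzip "$ARCH_DIR/edi_$STAMP.cpio"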
Prashant Zanwar_4
Respected Contributor

Re: Hundreds of thousands of small (5-10k) files - improve backup time

You can do a trick.

For files which are not used or modified for n days, just keep them archived using tar in a directory called TAR, or something of your choice. Back those up separately, and in the original directory just back up the new files.

find . -type f -mtime +n
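
A fuller sketch of the idea (the 30-day cutoff and the directory names are just examples, and cpio stands in for tar so the file list can be fed on stdin):

#!/usr/bin/sh
# Move files untouched for more than 30 days into a single archive under
# the TAR directory, which is then backed up separately from the live data.
EDI_DIR=/var/edi/data            # example directory holding the EDI files
TAR_DIR=/var/edi/TAR             # archive directory, backed up on its own
STAMP=`date +%Y%m%d`

cd "$EDI_DIR" || exit 1
find . -type f -mtime +30 -print > /tmp/edi_old.$STAMP
cpio -o < /tmp/edi_old.$STAMP > "$TAR_DIR/edi_old_$STAMP.cpio"

# Only remove the originals if the archive was written successfully.
if [ $? -eq 0 ]
then
    xargs rm -f < /tmp/edi_old.$STAMP
fi
rm -f /tmp/edi_old.$STAMP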

I hope this is straightforward to implement.

Hope this helps
Thanks
Prashant
"Intellect distinguishes between the possible and the impossible; reason distinguishes between the sensible and the senseless. Even the possible can be senseless."
A. Clay Stephenson
Acclaimed Contributor

Re: Hundreds of thousands of small (5-10k) files - improve backup time

Your situation is quite common and I really don't like separate backups for files that don't change much; it's just too easy to miss something that way and restores require several passes. I really prefer backups that grab everything except what I explicitly don't want (e.g. /tmp/*). Moreover, no matter what you do you really can't control how many new files are created.

The best approach is to literally not care how long the backups take. One approach is to use vxfs snapof mounts to create snapshots of each filesystem. This takes only seconds per filesystem. Normal operations can now resume and you then backup the snapshots at your convenience. Speed is now no longer of much concern. When the backup is finished, you unmount the snapshots.
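
The commands are roughly as follows; the volume and mount point names here are made up, the spare logical volume only has to be big enough to hold the blocks that change while the backup runs, and the snapof option needs the OnlineJFS (advanced VxFS) product:

# Mount a snapshot of the live /edi filesystem; this takes seconds and /edi stays online.
mount -F vxfs -o snapof=/edi /dev/vg01/lvol_snap /edi_snap

# Point the Data Protector backup specification at /edi_snap instead of /edi
# and run it whenever convenient.

# When the backup has finished, discard the snapshot.
umount /edi_snap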

You can improve your backups somewhat by changing from "Log All" to "Log Directory"; this reduces the number of DP database updates at the expense of more tedious restores.
If it ain't broke, I can fix that.
Zygmunt Krawczyk
Honored Contributor

Re: Hundreds of thousands of small (5-10k) files - improve backup time

Hi Maxim,
if you can unmount the filesystem where the files reside for some time, you can back up the raw volume to improve backup performance.
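
For example (the device names and block size are only illustrations, and bear in mind a restore then means restoring the whole volume):

umount /edi                                  # filesystem must be offline/idle

# Image the raw logical volume straight to tape in large blocks.
dd if=/dev/vg01/rlvol_edi of=/dev/rmt/0m bs=1024k

mount /edi                                   # bring the filesystem back online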

Regards,
Zygmunt
Stuart Whitby
Trusted Contributor
Solution

Re: Hundreds of thousands of small (5-10k) files - improve backup time

I see this situation in NetWorker a lot. I'm not so sure about OmniBack, but the way we recommend handling this kind of thing is to do a raw filesystem backup, though we don't need to take the filesystem offline to do it as suggested above. This will back up in the time it takes to throw that amount of data at a tape, since there is no interaction with the filesystem at all. The problem is that you generally need to recover the whole filesystem to get a single file back, so you need extra space available somewhere to do this.

The only real way to improve backup performance for this filesystem is to improve the filesystem's own performance. Change to a faster FS and get off RAID 5, if you're using it at all. Otherwise, just look at how long it takes to do an ls -R on this filesystem (see the quick check below). If you have directories with loads of entries, it'll take the backup software the same amount of time to read those entries and check whether each file needs backing up (in the case of an incremental). This will also leave your drive shoeshining rather than streaming, so you've got a bunch of things ganging up to make sure you don't get performance from this backup.
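
A quick way to run that check (the mount point is just an example):

# Time a full directory walk; the backup's tree scan can't go much faster than this.
timex ls -R /edi > /dev/null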

Snapshots are the best way to let you get individual files back. Depending on what kind of transactions are taking place here, you may want to take a snapshot every hour so that files can be restored immediately from disk, then delete the snapshots after your nightly backup (a crontab sketch follows below). That will probably save you from going to tape except in the event of either a real disaster or the "I don't know when I last saw the file" user...
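
As a crontab sketch (the times, paths and helper script name are all made up; edi_snap.sh would just re-run the umount and mount -F vxfs -o snapof=... pair from the snapshot example above):

0 8-23 * * *  /usr/local/bin/edi_snap.sh    # refresh the snapshot at the top of each hour
0 6 * * *     /usr/sbin/umount /edi_snap    # drop it once the nightly backup has finished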
A sysadmin should never cross his fingers in the hope commands will work. Makes for a lot of mistakes while typing.