Operating System - HP-UX

Copy large numbers of files

David Poe_2
Advisor


I have a SAN-mounted file system that contains over 9 million files with an average size of 20 KB. There are some 4000 top-level directories, with many subdirectories and files below them (probably about 10 levels deep). An "ls -R | wc" took over 19 hours to complete just to get the actual number of files. I am running HP-UX 11.0 on a 4-way 9000/L3000 with VxFS. This is a poor way to store files, and I am embarrassed to say that there are five more filesystems like this one (on other servers); this one is by far the smallest. I want to copy the files from this filesystem to another because the VG settings prevent me from growing beyond 16 PVs of about 12 GB each. I have created a new VG to move these files to, which will allow expansion. I ran a small test and found that it will take approximately 70 hours to copy all of the files with "cp -pr".
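As a rough planning aid, the numbers in this post can be turned into an estimate. This is a minimal sketch in plain shell integer arithmetic. It assumes the rate from the small cp -pr test holds across the whole filesystem and that the entire 12-hour window goes to copying. The variable names are illustrative.

```shell
#!/bin/sh
# Planning arithmetic from the figures in the post.
# Assumption: the rate measured in the small test stays constant.
FILES=9000000        # total files on the filesystem
COPY_HOURS=70        # estimated time for "cp -pr" of everything
WINDOW_HOURS=12      # length of one maintenance window

# Whole files per second at the measured rate (integer division).
rate=$(( FILES / (COPY_HOURS * 3600) ))

# Files that fit into one window at that rate.
per_window=$(( FILES * WINDOW_HOURS / COPY_HOURS ))

# Windows needed for the whole copy, rounded up.
windows=$(( (COPY_HOURS + WINDOW_HOURS - 1) / WINDOW_HOURS ))

echo "rate=${rate} files/s per_window=${per_window} windows=${windows}"
```

At roughly 35 files per second, one window moves about 1.5 million files, so even an ideal schedule needs six windows. Once directory-walk overhead and a safety margin are added, the 8-9 windows estimated below are in line with that.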

That sounds bad, but here are my real problems. I run in a PROD environment where I get maint windows of 12 hours every other week, so I can't fit this copy into a single maint window. So I thought there has to be a better way. The only thing I have come up with is to copy about 500 high-level directories (out of the 4000) at a time, and then symlink each high-level directory to the new VG's file mount. That way any new files written to it will be put on the new filesystem, and the application can still get to the files I have copied, effectively moving those files to the new VG. When that is complete, I will remove the original VG, rename the new VG to the original VG name, and the application will be none the wiser. This will take me 8-9 maint windows to accomplish, and the other five filesystems could take much longer. I'm hoping there is a better way of doing this. Perhaps an HSM-type solution would let the file copies/moves happen during the day without the app losing access to its files; either way, the final goal of being entirely on a new VG with an expanded number of PVs and PEs is a requirement. Any suggestions or insight would be greatly appreciated!
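One batch of the copy-then-symlink plan could be scripted roughly as follows. This is a sketch that assumes three things: the application is stopped for the batch, the old and new filesystems are both mounted, and the batch is a hypothetical list file with one top-level directory name per line. cp -pR is used only because it is portable, and a faster copy command can be swapped in. The original directory is renamed to NAME.migrated rather than deleted, so it can be checked before the old VG is removed.

```shell
#!/bin/sh
# One maintenance-window batch of the copy-then-symlink plan.
# The roots and the list file are hypothetical; run with the app stopped.

migrate_dir() {
    old_root=$1 new_root=$2 name=$3
    # Already a symlink: this directory was moved in an earlier window.
    if [ -L "$old_root/$name" ]; then
        return 0
    fi
    # Refuse to copy on top of a partial copy from an interrupted run.
    if [ -e "$new_root/$name" ]; then
        echo "already exists: $new_root/$name" >&2
        return 1
    fi
    # Copy the tree, preserving modes and timestamps.
    cp -pR "$old_root/$name" "$new_root/$name" || return 1
    # Keep the original aside for checking, then point the old path
    # at the new location.
    mv "$old_root/$name" "$old_root/$name.migrated" || return 1
    ln -s "$new_root/$name" "$old_root/$name"
}

migrate_batch() {
    old_root=$1 new_root=$2 list=$3
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        migrate_dir "$old_root" "$new_root" "$name" || {
            echo "failed: $name" >&2
            return 1
        }
    done < "$list"
}

# Usage (hypothetical script name): sh migrate_batch.sh OLD_ROOT NEW_ROOT LIST_FILE
if [ "$#" -eq 3 ]; then
    migrate_batch "$1" "$2" "$3"
fi
```

Because already-symlinked directories are skipped, the same list can be rerun safely after an interrupted window.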
8 replies
Steven E. Protter
Exalted Contributor

Re: Copy large numbers of files

Shalom,

Some SANs have a business copy or replication feature that might be able to replicate the files more quickly.

You might also be able to use OnlineJFS to get a snapshot of the original filesystem and do the copy/move while the system is running.
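A minimal sketch of the snapshot approach follows. It assumes OnlineJFS is installed and that the volume group has free extents for a snapshot volume. The names vg01, lvsnap, /data, /snap and /newdata and the 4096 MB size are hypothetical. The snapshot volume must be large enough to hold every block that changes on /data while the copy runs, or the snapshot becomes unusable. Setting DRY_RUN prints the commands instead of running them.

```shell
#!/bin/sh
# Copy from a frozen VxFS snapshot while /data stays online.
# Requires OnlineJFS. All names and sizes below are hypothetical.
VG=/dev/vg01
SNAP_LV=lvsnap
SNAP_MB=4096
SRC_MNT=/data
SNAP_MNT=/snap
DEST=/newdata

run() {
    # With DRY_RUN set, show the command instead of executing it.
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

snapshot_copy() {
    run lvcreate -L "$SNAP_MB" -n "$SNAP_LV" "$VG" || return 1
    run mkdir -p "$SNAP_MNT"
    # snapof= mounts the new volume as a point-in-time image of SRC_MNT.
    run mount -F vxfs -o snapof="$SRC_MNT" "$VG/$SNAP_LV" "$SNAP_MNT" || return 1
    # Any copy method from this thread fits here; cpio is one option.
    # On failure the snapshot is left mounted for inspection.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "cd $SNAP_MNT && find . -depth -print | cpio -pdm $DEST"
    else
        (cd "$SNAP_MNT" && find . -depth -print | cpio -pdm "$DEST") || return 1
    fi
    run umount "$SNAP_MNT"
    run lvremove -f "$VG/$SNAP_LV"
}

# Only acts when explicitly asked: sh snapshot_copy.sh run
if [ "${1:-}" = run ]; then
    snapshot_copy
fi
```

The snapshot only freezes the source for the duration of the copy. Files changed on /data afterwards still need a final catch-up pass in a maintenance window.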

My last and hopefully best suggestion would be MirrorDisk/UX.

Use lvextend -m 1 to add a mirror copy to each logical volume you want to copy. This creates a second, exact copy of the data within the same volume group.

Then use lvsplit to split the mirror off as a separate logical volume. Production keeps running on the original, and you copy from the split-off volume to the new location.

Then use find with -newer (against a reference file touched just before the split) or -ctime -n to locate and copy only the files that have changed since the split.
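The catch-up pass can be sketched as follows. It assumes a reference file (the hypothetical /var/tmp/copy.stamp) is touched immediately before the mirror is split or the bulk copy starts. find -newer compares modification times, so it picks up new and modified files. Deletions and renames are not detected and need a separate check. The mirror commands appear only as comments because they require MirrorDisk/UX and real device names.

```shell
#!/bin/sh
# Catch-up copy of files modified after a reference timestamp.
# Assumed sequence (device names are hypothetical, MirrorDisk/UX required):
#   touch /var/tmp/copy.stamp
#   lvsplit /dev/vg01/lvol1          # split-off copy is /dev/vg01/lvol1b
#   fsck -F vxfs /dev/vg01/rlvol1b && mount /dev/vg01/lvol1b /split
#   ... bulk copy from /split to the new filesystem ...
#   then, in the maintenance window with the application stopped:
#   copy_newer /data /newdata /var/tmp/copy.stamp

copy_newer() {
    src=$1 dst=$2 stamp=$3      # stamp must be an absolute path
    list=${TMPDIR:-/tmp}/copy_newer.$$
    (cd "$src" && find . -type f -newer "$stamp" -print) > "$list" ||
        { rm -f "$list"; return 1; }
    status=0
    while IFS= read -r f; do
        # Recreate the parent directory, then copy keeping modes and times.
        mkdir -p "$dst/$(dirname "$f")" && cp -p "$src/$f" "$dst/$f" ||
            { status=1; break; }
    done < "$list"
    rm -f "$list"
    return "$status"
}

if [ "$#" -eq 3 ]; then
    copy_newer "$1" "$2" "$3"
fi
```

Reading the file list from a temporary file rather than a pipe keeps the loop in the current shell, so a failed copy can stop the pass and report a non-zero status.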

Hopefully I've provided some concepts that you can turn into a real plan.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
DCE
Honored Contributor

Re: Copy large numbers of files


David,

I ran across a similar scenario, only it involved backing up all those files on a regular basis. Needless to say, the backup window was not being met.

The solution we hit on (and a variation of it may work for you) was to combine the files into a single gzipped archive and back up just that archive.

You could try compressing/zipping the data. Hopefully the time savings will be enough to meet your window.
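For the single-archive approach, a minimal sketch: tar gathers the tree into one stream and gzip compresses it. It assumes gzip is available on the system, and the directory and archive names are placeholders. The archive has to be unpacked again before an application can use the individual files.

```shell
#!/bin/sh
# Pack a directory tree into one compressed archive, and unpack it again.
# Directory and archive paths are hypothetical.

pack_tree() {
    src=$1 archive=$2
    # One sequential stream instead of millions of small files.
    (cd "$src" && tar cf - .) | gzip -c > "$archive"
}

unpack_tree() {
    archive=$1 dst=$2
    mkdir -p "$dst" &&
        gzip -dc "$archive" | (cd "$dst" && tar xpf -)
}
```

This helps when writing one large stream (to tape or across the network) is the bottleneck. It does not shorten the time needed to read the millions of small files on the source side.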
David Poe_2
Advisor

Re: Copy large numbers of files

The problem is that all the files must stay intact for the application to use. If an ls -R took 19 hours to complete, I'm not sure a find +ctime or any other way of having to traverse the entire filesystem will take less than the 19 hours of an ls. So a very significant portion of time is spent walking through the files. If that is incorrect, please let me know.

My current line of thinking is that I must somehow reduce the number of files left to copy, so that when I do the final copy in the maint window it fits within that time. That is what led me to copying some of the files during each window and then symlinking the high-level directories. I then have fewer files to move during the next window, and I can keep doing that each window until I am done. That said, this process would be very manual and labor-intensive. I may be missing something from your suggestions (I'm still pretty new at this), and if I am, please fill me in.
Prashant Zanwar_4
Respected Contributor

Re: Copy large numbers of files

I would probably run rsync in a loop.

Create a list of the existing directories, put it in a file, and read it one line at a time in a for or while loop, running rsync on each directory.
The first rsync can take a while, but it is still faster than any other copy method I have used.
The next rsync only copies the differences, so it goes very quickly, and one more pass, with the application stopped, should make things final.
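The loop described above might look like this sketch. It assumes rsync is installed on the server and that the hypothetical list file holds one top-level directory name per line. Setting DRY_RUN prints the rsync commands instead of running them. An optional fourth argument such as --delete is passed through for the final pass, so files deleted on the source are also removed from the copy.

```shell
#!/bin/sh
# Sync each top-level directory listed in a file, one rsync per directory.
# SRC, DST and the list file are hypothetical; the list holds one
# directory name per line (for example, created with: cd SRC && ls > dirs.txt).

sync_dirs() {
    src=$1 dst=$2 list=$3 extra=${4:-}
    status=0
    while IFS= read -r d; do
        [ -n "$d" ] || continue
        # -a recurses and preserves permissions, times and symlinks.
        # Trailing slashes copy the contents of src/d into dst/d.
        # $extra is left unquoted on purpose so it can be empty or an option.
        if [ -n "${DRY_RUN:-}" ]; then
            echo rsync -a $extra "$src/$d/" "$dst/$d/"
        else
            rsync -a $extra "$src/$d/" "$dst/$d/" || status=1
        fi
    done < "$list"
    return "$status"
}

# Usage: sh sync_dirs.sh SRC DST LIST [--delete]
if [ "$#" -ge 3 ]; then
    sync_dirs "$@"
fi
```

Each later pass still has to examine every file in the listed directories to find the differences, so the traversal cost raised earlier in the thread applies to rsync too.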

I have tried this with up to 40 GB of data and a huge number of files, though I never counted them.

Let me know if this helps..

Otherwise, you can try lvsnapshot and cut over at a time of your choosing.

Thanks
Prashant
"Intellect distinguishes between the possible and the impossible; reason distinguishes between the sensible and the senseless. Even the possible can be senseless."
Ninad_1
Honored Contributor

Re: Copy large numbers of files

David,

I don't know how long the following procedure will take, but you can give it a try and do some testing. We use this method to copy a large number of small files over the network instead of backing them up to tape and restoring them.
cd sourcedir
tar cf - . | compress -c | remsh other_server "cd destination_dir && uncompress -c | tar xf -"

See if this helps a bit for copying if the other mirroring and rsync options are not available to you
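Because the copy in the original question stays on one server, the same idea works without remsh or compression. A minimal sketch, with the source and destination directories as hypothetical arguments. The tar archive format limits the length of stored pathnames, so a deep tree should be tested first.

```shell
#!/bin/sh
# Local tar-to-tar copy: one tar reads while the other writes.
tar_copy() {
    src=$1 dst=$2
    mkdir -p "$dst" || return 1
    # "." rather than "*" so that top-level dot files are included.
    (cd "$src" && tar cf - .) | (cd "$dst" && tar xpf -)
}

# Usage (hypothetical script name): sh tar_copy.sh SRC DST
if [ "$#" -eq 2 ]; then
    tar_copy "$1" "$2"
fi
```

The p modifier asks the extracting tar to restore the original file modes rather than applying the current umask.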

Best of luck,
Ninad
Patrice Le Guyader
Respected Contributor

Re: Copy large numbers of files

Hi David,

Is your backup solution something like NetWorker from Legato?
If so, part of it (uasm) can be used to copy data across the network at a high rate (about 100 Gb/hour on a gigabit LAN). I can't remember whether it can do incremental copies; that would need to be verified.

Hope this helps
Pat
Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.
Peter Nikitka
Honored Contributor

Re: Copy large numbers of files

Hi,

If setting up and splitting a mirror is not possible, I would give vxdump/vxrestore a try, followed by an rsync for the final update.

Note that if you time a vxdump session (with the filesystem unmounted!), you can assume that vxrestore will need about the same time.
If the dump and restore paths use different physical disks, a

vxdump | ... | vxrestore

pipeline will do its reads and writes in parallel most of the time.
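A fuller form of that pipeline, as a sketch. The raw device /dev/vg01/rlvol1 and the target /newdata are hypothetical. The source filesystem should be unmounted, as noted above, so the dump is consistent. The "0f -" and "rf -" key letters are the classic dump/restore syntax and should be checked against the vxdump(1M) and vxrestore(1M) man pages on the system in question. A full restore may leave a restoresymtable file in the target directory, which can be removed afterwards. Setting DRY_RUN prints the pipeline instead of running it.

```shell
#!/bin/sh
# Level-0 vxdump of the old filesystem piped straight into vxrestore.
# Device and target directory are hypothetical; verify options locally.
dump_restore() {
    src_dev=$1 dst_dir=$2
    if [ -n "${DRY_RUN:-}" ]; then
        echo "vxdump 0f - $src_dev | (cd $dst_dir && vxrestore rf -)"
        return 0
    fi
    # vxdump writes the dump to stdout ("f -"); vxrestore reads it from
    # stdin and rebuilds the tree under the current directory.
    vxdump 0f - "$src_dev" | (cd "$dst_dir" && vxrestore rf -)
}

# Usage (hypothetical script name): sh dump_restore.sh RAW_DEVICE TARGET_DIR
if [ "$#" -eq 2 ]; then
    dump_restore "$1" "$2"
fi
```

The rsync pass mentioned above then picks up anything that changed after the dump was taken.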

Kind regards, Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Bill Hassell
Honored Contributor

Re: Copy large numbers of files

Since your copy is disk-to-disk on the same system, cpio -p should proceed about as fast as you can go (much faster than cp). You'll need to test the throughput by doing some sample copies. Then figure out the number of directories you can copy per hour. Something like this:

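cd /oldDirectory
# Running find from the parent makes cpio recreate /newDirectory/Directory1.
# -l is omitted: linking cannot work across filesystems, so cpio copies anyway.
find Directory1 -depth -print | cpio -pdm /newDirectory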

To make sure everything was copied, use:

find /oldDirectory/Directory1 -type d | wc -l
find /newDirectory/Directory1 -type d | wc -l
find /oldDirectory/Directory1 -type f | wc -l
find /newDirectory/Directory1 -type f | wc -l

The above finds will count the number of directories and files.

You can then script the cd/find/cpio tasks so they run in parallel, perhaps 5 to 10 separate copies depending on the SAN speed.
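The parallel version could be scripted along these lines. It assumes the top-level directory names are listed one per line in a file. The old and new roots and the default of 5 concurrent copies are hypothetical and should be adjusted after timing a few sample directories. Jobs are started in groups, and the script waits for each group to finish. This is simpler than keeping exactly N jobs busy, but it leaves some idle time when directory sizes differ. Setting DRY_RUN prints each copy command instead of running it.

```shell
#!/bin/sh
# Copy top-level directories with cpio -p, MAX_JOBS at a time.
# OLD_ROOT, NEW_ROOT and the list file are hypothetical.
MAX_JOBS=${MAX_JOBS:-5}

copy_one() {
    old_root=$1 new_root=$2 name=$3
    if [ -n "${DRY_RUN:-}" ]; then
        echo "cd $old_root && find $name -depth -print | cpio -pdm $new_root"
        return 0
    fi
    # Running find from OLD_ROOT makes cpio recreate NEW_ROOT/name.
    (cd "$old_root" && find "$name" -depth -print | cpio -pdm "$new_root")
}

copy_parallel() {
    old_root=$1 new_root=$2 list=$3
    n=0
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        copy_one "$old_root" "$new_root" "$name" &
        n=$((n + 1))
        # Wait for the whole group before starting the next one.
        if [ "$n" -ge "$MAX_JOBS" ]; then
            wait
            n=0
        fi
    done < "$list"
    wait
}

# Usage (hypothetical script name): sh copy_parallel.sh OLD_ROOT NEW_ROOT LIST_FILE
if [ "$#" -eq 3 ]; then
    copy_parallel "$1" "$2" "$3"
fi
```

Afterwards, the find | wc -l counts shown above can be compared for each directory to confirm the copy.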


Bill Hassell, sysadmin