David Poe_2
Advisor

Fastest Copy Method

I'm moving several terabytes of data to a new VG because of constraints in the previous VG and to consolidate onto fewer, larger LUNs. I searched the forums for a fast way to copy the data and found that cp is considered very slow, while vxdump and dd are considered much faster. In my testing, I have found the complete opposite to be true. I'm hoping someone can shed light on my issue, which could be a poorly configured server. I believe the server (Itanium, HP-UX 11i) is at default kernel parameters.

My test bed is two mount points, each attached to a 75 GB LUN on a Clariion CX600. My timings and the amount of data to be moved are below. cp turned out to be the fastest copy method (by far), which surprised me. dd had been running for well over 4 hours and still hadn't finished, so I canceled the dd test. I do understand that dd also copies the empty blocks in the LV as well as my regular data, but 4 hours for 75 GB still seems like an awfully long time. vxdump had taken 45 minutes and was only about 20% complete. Any help or insight would be greatly appreciated!

The dd and vxdump commands I used:
timex dd bs=4096 if=/dev/santest1/rlvtest of=/dev/santest2/rlvtest &

timex vxdump -0f - -s 1000000 -b 16 /santest1 | (cd /santest2; vxrestore -rf - /santest2) &

TEST DETAILS
/dev/santest1/lvtest
76800000 19567593 53655387 27% /santest1
steg002:/santest1# ll
total 39062648
-rw-r--r-- 1 root sys 1000000000 Apr 1 20:26 bob1
-rw-r--r-- 1 root sys 2000000000 Apr 1 20:27 bob2
-rw-r--r-- 1 root sys 1000000000 Apr 1 20:27 bob3
-rw-r--r-- 1 root sys 4000000000 Apr 1 20:28 bob4
-rw-r--r-- 1 root sys 2000000000 Apr 1 20:29 bob5
-rw-r--r-- 1 root sys 1000000000 Apr 1 21:38 bob6
-rw-r--r-- 1 root sys 2000000000 Apr 1 21:39 bob7
-rw-r--r-- 1 root sys 4000000000 Apr 1 21:39 bob8
-rw-r--r-- 1 root sys 3000000000 Apr 1 21:40 bob9

steg002:/santest1# timex cp -Rf . /santest2

real 5:15.07
user 0.48
sys 1:05.19

steg002:/santest1# timex fbackup -0i . -f - | (cd /santest2; frecover -Xrf -)
fbackup(1004): session begins on Fri Apr 1 21:52:35 2005
fbackup(3024): writing volume 1 to the output file -
fbackup(3055): total file blocks read for backup: 39062518
fbackup(3056): total blocks written to output file -: 39062576

real 6:41.18
user 12.80
sys 1:55.05

steg002:/santest1# timex cp -Rf . /santest2

real 6:54.58
user 0.49
sys 1:03.72

steg002:/santest1# timex fbackup -0i . -f - | (cd /santest2; frecover -Xrf -)
fbackup(1004): session begins on Fri Apr 1 22:11:26 2005
fbackup(3024): writing volume 1 to the output file -
fbackup(3055): total file blocks read for backup: 39062518
fbackup(3056): total blocks written to output file -: 39062576

real 7:33.15
user 12.23
sys 1:53.45

steg002:/santest1# timex tar -cf - . | (cd /santest2; tar -xf -)

real 15:39.64
user 2:59.30
sys 5:17.21

steg002:/santest1# timex cp -pRf . /santest2

real 5:23.46
user 0.48
sys 1:03.57
Rick Garland
Honored Contributor

Re: Fastest Copy Method

As you have realized, there are different stats for the different tools. The difference can be the result of network bandwidth, network traffic, host config, client config, IO path, IO config, etc...

Lots of variables.

What I place more emphasis on is the error checking of the copy you are doing. The vxdump and dd tools will tell you if there was a problem. The cp and cpio (and tar) do not tell you if there was a problem. Yes you could script a solution to this issue but the default tools are essentially silent.
David Poe_2
Advisor

Re: Fastest Copy Method

Network isn't an issue as this is all done locally (minus the SAN). I'm also trying to minimize the time to copy the data, while preserving the data itself. From what you are saying, do you mean that cp is actually faster than vxdump and dd? The major benefit seems to be the verification that the other methods offer? I was hoping for a faster (and safer) copy method, if one exists. Copying 3+ terabytes of data would take days if that is the case, and I'm sure hoping it is not.
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: Fastest Copy Method

Your dd would benefit from a larger bs --at least bs=64k; I suspect bs=1024 would be optimal.
If it ain't broke, I can fix that.
Bill Hassell
Honored Contributor

Re: Fastest Copy Method

bs=4096 is definitely too small. Use bs=64k or bs=128k. dd will be the fastest by far since it bypasses all filesystem overhead. This is especially true if there are thousands to millions of files. fbackup can be made quite a bit faster (even for disk-to-disk) by using a -c config file such as this:

blocksperrecord 512
records 64
checkpointfreq 1024
readerprocesses 6
maxretries 5
retrylimit 5000000
maxvoluses 200
filesperfsm 2000


Bill Hassell, sysadmin
A. Clay Stephenson
Acclaimed Contributor

Re: Fastest Copy Method

Ooops, I meant I suspect that bs=1024k would be optimal.
If it ain't broke, I can fix that.
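
To put rough numbers on the block-size advice, here is a small sketch that counts how many dd records (one read plus one write each) are needed to cover the 76800000 KB logical volume shown in the first post. The function name `records_for` is made up for illustration and is not an HP-UX tool.

```shell
#!/bin/sh
# records_for LV_SIZE_KB BS_KB
# Print how many dd records are needed to cover a volume of LV_SIZE_KB
# kilobytes at a block size of BS_KB kilobytes (rounded up).
# The function name is hypothetical; it is only an illustration.
records_for() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# 76800000 KB is the LV size reported for /santest1 in the first post.
for bs_kb in 4 64 1024; do
    echo "bs=${bs_kb}k -> $(records_for 76800000 "$bs_kb") records"
done
```

At bs=4096 that works out to 19.2 million read/write pairs, against 1.2 million at bs=64k and 75000 at bs=1024k, which is one plausible reason the original dd run was so slow.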
David Poe_2
Advisor

Re: Fastest Copy Method

I figured that, Clay. I suddenly realized that I was missing the "k" at the end of my bs number. I'm testing the "dd" now and will let you know. I'm also going to run the test using fbackup with the config file suggested by Bill. We also have some very large file systems that are only 50% full and need to be moved. I expect to find a threshold for how full a filesystem should be before dd becomes the better option.

Re: Fastest Copy Method

David,

While you're at it, why not try cpio as well? I found this to be quite fast in some tests I did a couple of years ago:

cd /santest1 ; find . -depth -xdev -print | cpio -pudlm /santest2

HTH

Duncan

I am an HPE Employee
Bill Hassell
Honored Contributor

Re: Fastest Copy Method

cpio is indeed very fast with the -p option and I often recommended this as the best way to copy files and directories.

cd /santest1
find . | cpio -pudlmv /santest2

Unfortunately, it can only handle files smaller than 2 GB, which is usually a big issue with very large filesystems.


Bill Hassell, sysadmin
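
Given the 2 GB limit mentioned above, it may be worth listing the files that would trip it before committing to cpio. This is only a sketch: `files_over` is a hypothetical helper, and it relies on the standard behaviour of `find -size`, which counts in 512-byte blocks rounded up.

```shell
#!/bin/sh
# files_over DIR BLOCKS
# List regular files under DIR (staying on one filesystem) whose size is
# greater than BLOCKS 512-byte blocks. Hypothetical helper name.
files_over() {
    find "$1" -xdev -type f -size +"$2" -print
}

# 2 GB is 4194304 blocks of 512 bytes, so a threshold of 4194303 flags
# anything at or just under the limit as well. Example (path from the thread):
#   files_over /santest1 4194303
```

In the test set from the first post, bob4, bob8 and bob9 (3 to 4 billion bytes each) would be over the limit, assuming 2 GB here means 2^31 bytes.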
Rick Garland
Honored Contributor

Re: Fastest Copy Method

I'm not saying that cp is faster. I am saying I feel better knowing I can see the integrity of the copy process. Be it with vxdump or dd.
Geoff Wild
Honored Contributor

Re: Fastest Copy Method

Your main issue isn't any of the methods - but the Clariion itself - it uses parity RAID - so large data transfers suck - as it has to constantly keep recalculating the parity....

vxdump, dd, etc are great for mirrored (or RAID 10) type disks...

I'm afraid with the CX600 - you will need a long outage....

My 2 cents...

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
David Poe_2
Advisor

Re: Fastest Copy Method

Geoff, we are using RAID 10 on our Clariion. The issue is that cp was outpacing every other method by a wide margin which is surprising to me based upon my research for fast copy methods in the forum. Thanks to the suggestions of all, I found that with some tweaking, these other methods were much faster, but still shy of cp. The benefits from what I can see are data integrity checks which add to the processing time. Very soon we will be migrating to a DMX box, but having said that, the I/O on the server is my current constraint and not the SAN or Disk Array.
Zinky
Honored Contributor

Re: Fastest Copy Method

David,

Having just migrated approximately 32 TB of data from 2 EVA5000s to an XP12K over the weekend, I would say dd was on average 15-20% faster than vxdump/vxrestore and is the fastest. I used a block size of 4096K (4 MB). I think in your dd syntax you were using 4 KB instead of 4 MB, which is why it was slow. On servers where I have 2 FC HBAs, I run 4 simultaneous dd copies, whilst on servers that have 4 FC HBAs, I run 8 concurrently. I literally approached the limits of my FC HBAs during my copies.

dd if=/dev/sourcevg/rsrclvol of=/dev/vx/rdsk/targetdg/tgtvol bs=4096K

Oh, btw, we also converted from LVM to VxVM, which is why the syntax above looks the way it does.
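
A rough sketch of how several dd copies could be run side by side, as described above. It reads "source target" pairs from standard input and runs them in batches. The function name `parallel_dd` and the `BS` and `MAX_JOBS` defaults are assumptions for illustration, not values taken from the migration itself.

```shell
#!/bin/sh
# parallel_dd: read "source target" pairs from stdin, one pair per line,
# and copy each pair with dd, running at most MAX_JOBS copies at a time.
# BS and MAX_JOBS are illustrative defaults; tune them per server.
BS=${BS:-1024k}
MAX_JOBS=${MAX_JOBS:-4}

parallel_dd() {
    pids=""
    running=0
    failed=0
    while read -r src dst; do
        [ -n "$src" ] || continue
        dd if="$src" of="$dst" bs="$BS" &
        pids="$pids $!"
        running=$((running + 1))
        # Once a batch is full, wait for every copy in it to finish.
        if [ "$running" -ge "$MAX_JOBS" ]; then
            for pid in $pids; do
                wait "$pid" || failed=1
            done
            pids=""
            running=0
        fi
    done
    # Wait for the final, possibly partial, batch.
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

It would be called as `parallel_dd < pairs.txt`, where pairs.txt is a hypothetical file listing one raw source device and one raw target device per line. The function returns non-zero if any dd exits with an error, so the result can be checked before cutting over.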

Hakuna Matata

Favourite Toy:
AMD Athlon II X6 1090T 6-core, 16GB RAM, 12TB ZFS RAIDZ-2 Storage. Linux Centos 5.6 running KVM Hypervisor. Virtual Machines: Ubuntu, Mint, Solaris 10, Windows 7 Professional, Windows XP Pro, Windows Server 2008R2, DOS 6.22, OpenFiler
David Poe_2
Advisor

Re: Fastest Copy Method

Here are some numbers from my tweaked settings (thanks to your suggestions): dd took about 32.75 minutes for a 75 GB LUN, and fbackup took 5.75 minutes for 20 GB of files. These numbers are much better, but still slower than cp when used on a small number of large files. I expect dd to be much faster on our fuller file systems with (unfortunately) several million files scattered throughout a few hundred directories.

Thank you all for your input! I'm going to keep this thread open for a couple more days in case anyone else wants to post other suggestions.

steg002:/# timex dd bs=2048k if=/dev/santest1/rlvtest of=/dev/santest2/rlvtest &
37500+0 records in
37500+0 records out

real 32:45.16
user 0.09
sys 13.60

steg002:/santest1# timex fbackup -0i /santest1 -f - -c /home/dpoe/fbackup_disk.conf | (cd /santest2; frecover -Xrf -)

fbackup(1004): session begins on Mon Apr 4 21:53:39 2005
fbackup(3024): writing volume 1 to the output file -
fbackup(3055): total file blocks read for backup: 39062518
fbackup(3056): total blocks written to output file -: 39062576

real 5:43.47
user 0.91
sys 24.51
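
Since the runs above move different amounts of data, comparing them as throughput may be more useful than comparing wall-clock times. A minimal sketch, assuming decimal megabytes (1 MB = 10^6 bytes); `to_seconds` and `mb_per_sec` are hypothetical helper names.

```shell
#!/bin/sh
# to_seconds M:SS.ss
# Convert a timex "real" value such as 5:15.07 to seconds.
to_seconds() {
    echo "$1" | awk -F: '{
        if (NF == 2) printf "%.2f\n", $1 * 60 + $2
        else         printf "%.2f\n", $1
    }'
}

# mb_per_sec BYTES SECONDS
# Print throughput in decimal MB/s (1 MB = 10^6 bytes).
mb_per_sec() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / 1000000 / s }'
}

# Example with figures from this thread:
#   mb_per_sec 20000000000 "$(to_seconds 5:15.07)"    # first cp run, 20 GB of files
#   mb_per_sec 78643200000 "$(to_seconds 32:45.16)"   # dd, 37500 records of 2048k
```

On those figures the first cp run comes out at about 63.5 MB/s and the tuned dd at about 40 MB/s, keeping in mind that dd moved the whole 75 GB volume while cp moved only the 20 GB of files.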
David Poe_2
Advisor

Re: Fastest Copy Method

I just thought of another question. Is there someplace I can look to verify the data integrity checks that each of these commands performs? It doesn't seem to be documented anywhere obvious in the man pages. I would just like the warm fuzzy of seeing it formally in writing somewhere when I propose these solutions to my boss. TIA!
Zinky
Honored Contributor

Re: Fastest Copy Method

David,

If you use vxdump/vxrestore - then be assured that your copies are fine as long as there are no errors during the process. VxFS (JFS) still is the best mainstream filesystem out there and does all integrity checks.

If you use dd, make sure the filesystems on the destination (and the source) raw lvol/vol devices are not mounted, if they contain one. After the dd copy, your check on the destination is to mount the overlying filesystem, which, if it is VxFS, will alert you to any inconsistencies. Inconsistencies are very rare, especially if you unmount your source filesystem as well. Never mount the filesystem on the target raw lvol/vol while your dd copy is in progress.
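
For an independent check after any of the copy methods, one option is to checksum every file on both mounted filesystems and compare the lists. This is only a sketch: `tree_cksum` and `same_tree` are hypothetical helper names, and cksum is a CRC, good for catching copy errors but not a cryptographic hash.

```shell
#!/bin/sh
# tree_cksum DIR
# Print "checksum size ./path" for every regular file under DIR,
# sorted by path so two trees can be compared line by line.
tree_cksum() {
    ( cd "$1" && find . -type f -exec cksum {} + | sort -k 3 )
}

# same_tree SRC DST
# Return 0 if both trees hold the same files with the same checksums,
# 1 if they differ, 2 if either tree could not be read.
same_tree() {
    a=$(tree_cksum "$1") || return 2
    b=$(tree_cksum "$2") || return 2
    [ "$a" = "$b" ]
}

# Example (mount points from the thread):
#   same_tree /santest1 /santest2 && echo "copy verified"
```

This reads all the data a second time on both sides, so it adds roughly another copy's worth of read I/O to the outage window.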



Hakuna Matata

Favourite Toy:
AMD Athlon II X6 1090T 6-core, 16GB RAM, 12TB ZFS RAIDZ-2 Storage. Linux Centos 5.6 running KVM Hypervisor. Virtual Machines: Ubuntu, Mint, Solaris 10, Windows 7 Professional, Windows XP Pro, Windows Server 2008R2, DOS 6.22, OpenFiler
Geoff Wild
Honored Contributor

Re: Fastest Copy Method

Interesting - didn't know the CX600 did RAID 10 - not bad at all...

How many paths to the LUNS do you have?

I did a fairly recent migration from old Symmetrix to DMX1000's - for one of the DB's we used vxdump - it was 1.5 TB - and took almost 8 hours...that was on an HP-UX 11.0 PA-RISC server - 4 GB RAM, 4 CPUs, 2 HBA's....

Of course - the system was quiet at the time - DB down...etc....no other I/O whatsoever...

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
David Poe_2
Advisor

Re: Fastest Copy Method

We are using PowerPath, and have 4 zoned paths (dual HBA's) to the Clariion. Speedwise, we aren't outpacing the Clariion yet. We are just starting our migration to the Itanium servers as our older PA-RISC just aren't cutting it.

Everything we have is on RAID 10 as I can't get anyone to sign-off on moving some of the less critical data to RAID 5. At least it isn't my money. :)
Hein van den Heuvel
Honored Contributor

Re: Fastest Copy Method


Just some observations / reinforcements.

cp and fbackup have the advantage of knowing how much real data there is. dd has no choice but to copy all blocks, used or not. In your case the filesystem is only 27% full, so dd has to do nearly four times as much I/O. With that it might even get to nooks and crannies that have lain dormant before. Specifically, if that had been an EVA then the output blocks may only have been 'promised', not actively allocated. So dd may cause that one-time additional activity. Dunno about a Clariion.

Copying millions of small files is really not comparable with copying a few large files. File-based copiers will slow down as the number of files increases. dd will be oblivious to that.

dd is constrained by being synchronous: read a block, write a block, repeat until done. Little or no CPU time, but lots of waiting for I/O to complete. dd becomes very useful if (and only if?) you can launch multiple concurrent streams. This is notably great if you have multiple LUNs. Depending on the I/O subsystem characteristics I would even consider splitting a LUN with iseek/oseek to copy your raw volume in 3 - 10 parallel chunks. Of course this requires the LUN not to have been carved from a single disk, but from a large group (a la EVA).

fwiw,

Hein.
David Poe_2
Advisor

Re: Fastest Copy Method

I found another advantage of dd over cp, btw. I noticed (via glance) that cp seemed to take nearly 100% of disk I/O, while dd took only about 15%, meaning that there is plenty of bandwidth to run multiple streams if you are copying multiple rLVs. Which may be what all you guys were trying to tell me, but I only just got it. :)

I did notice in the man pages that you can use iseek and oseek to start your copy (both infile and outfile) at different points throughout the rLV. Does that mean that I could potentially break up a single rLV copy into say, 3 dd streams? This would be for fast copies of single, large rLV's. Something like the following.

There are 75000 blocks of 1024K in my logical volume.

1) dd bs=1024k if=/dev/santest1/rlvtest of=/dev/santest2/rlvtest iseek=0 oseek=0 count=25000

2) dd bs=1024k if=/dev/santest1/rlvtest of=/dev/santest2/rlvtest iseek=25000 oseek=25000 count=25000

3) dd bs=1024k if=/dev/santest1/rlvtest of=/dev/santest2/rlvtest iseek=50000 oseek=50000

count intentionally left blank for #3 as I hope this means it will copy the rest.
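
The offsets for any number of streams can be worked out mechanically, which helps avoid off-by-one gaps between chunks: each stream's count has to equal the distance to the next stream's starting block. A sketch that only prints the dd commands rather than running them; `dd_plan` is a hypothetical helper, and the default device paths and block size are the ones used in this thread.

```shell
#!/bin/sh
# dd_plan TOTAL_BLOCKS STREAMS [SRC] [DST] [BS]
# Print one dd command per stream so that the streams together cover
# blocks 0 .. TOTAL_BLOCKS-1 exactly once. Nothing is executed.
dd_plan() {
    total=$1
    streams=$2
    src=${3:-/dev/santest1/rlvtest}
    dst=${4:-/dev/santest2/rlvtest}
    bs=${5:-1024k}
    chunk=$(( total / streams ))
    i=0
    while [ "$i" -lt "$streams" ]; do
        start=$(( i * chunk ))
        if [ "$i" -eq $(( streams - 1 )) ]; then
            # The last stream also takes any remainder.
            count=$(( total - start ))
        else
            count=$chunk
        fi
        echo "dd bs=$bs if=$src of=$dst iseek=$start oseek=$start count=$count"
        i=$(( i + 1 ))
    done
}

# Example: three streams over the 75000-block volume.
#   dd_plan 75000 3
```

The printed lines use iseek/oseek as in the post above; on a dd that lacks those operands, skip and seek are the usual equivalents. Each line would then be started in the background and waited on.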
David Poe_2
Advisor

Re: Fastest Copy Method

I ended up testing the above 3 streams of dd. I also tested one with 5 streams. I found that I was able to match the speed of my cp once I was moving significantly more data than buffer cache could hold for the cp. 3 streams saturated the IO for the server so as expected the 5 streams did no better. I would imagine that the higher class of Itanium servers I'll be doing this on will take more than 3 streams of data. Thanks everybody for your help and suggestions!
Estanislao Ferrater
New Member

Re: Fastest Copy Method

Hi!

... and how about putting new disks into old VG, mirror volumes, and detach all old disks?? At the end increase new lvols to desired size... all without stopping Applications... ;o)

David Poe_2
Advisor

Re: Fastest Copy Method

The problem is that the VG's were improperly sized and we are now hitting those size limitations. Thank you for your suggestion though!
David Poe_2
Advisor

Re: Fastest Copy Method

Please see my post "Apr 7, 2005 18:09:47 GMT" above. Thanks again!