Re: Oracle imp timing difference between 2 HPUX servers

Evelyn Daroga · ‎02-21-2008

I am trying to figure out why an Oracle import is taking nearly twice the time on one system (System A) than another (System B). I have attached a document with configuration information and observations. I would expect System B to be much slower than System A, but actually it is the opposite. Can anyone explain why?

Hein van den Heuvel · ‎02-21-2008

Hmmm....

The details provided in the attachement give a lot of information about CPUs and Memory, even memory layout, and then go on to suggest:

"No CPU bottlenecking"
"100% I/O"

Does that not suggest that CPU and Memory details are entirely irrelevant for now?

Should the analysis not focus on the IO subsystem (differences)?

How many drives? What configuration?
VA? EVA? XP? Direct connect?
Filesystem or Raw? LVM configuration?

And if the the import is not already on a local disk but piped in from a network (NFS), then the raw network throughput also needs to be compared.

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting

Evelyn Daroga · ‎02-21-2008

Thank you for the quick reply.

Both systems are utilizing disks on the same SAN. All are striped across all disks in the SAN. Both systems are connected to the SAN using the same FC card and cable. On the SAN side, each has a virtual disk configured for the db's. System A's device is Raid-10, while System B's is Raid-5 -- another reason why I would expect System B to be slower, not faster.
On the Unix side, LVM is used to create the file systems. Each db is on its own file system, which is part of a volume group that contains other db's. Each vg contains the same db's (System B is basically a copy of System A).

The dmp file that is being imported has resided on a windows share. Both imports were from the same dmp file, from the same location (remote to each unix). Both unix systems were on the same switch. We tried moving one to another switch with no noticeable difference.

Hein van den Heuvel · ‎02-21-2008

Now it is becoming more fun!

How about some 'baseline' measurements?

Create a 1 .. 4 GB file on each of the servers:

#cd

#time dd if=/dev/zero of=3gb.tmp bs=1024k count=3000

Compare times.

Now transfer one of the file to the Windoze server. Time that!
Remove both 3gb.tmp files.

Now transfer, and time, pulling that 3gb file back from the windoze server to /dev/nul first,
and to storage next.
On each system...

Any major differences?
It could still be a 'bad cable' in my book.
(Gigabit ethernet is much more sensitive to cable quality than the 100mb stuff).

How about any mount differences..
I'm assuming the same -o were used, but you may want to check that.

hth,
Hein.

Evelyn Daroga · ‎02-21-2008

I did ioscan to see the ethernet hw; it shows:

lan 0 0/0/0/0 btlan3 CLAIMED INTERFACE HP PCI 10/100Base-TX Core
/dev/diag/lan0 /dev/ether0

Does that give you a clue about the cable?

Also, I checked fstab for the mount options on both servers â they are the same: hfs rw,suid,largefiles 0 2

Also, I mis-spoke regarding the SAN layout for these. System B's volume is actually in its own vg (no other db's or volumes in this vg) â and it is Raid 10, not Raid 5 as I thought. I don't think the fact that it's in its own vg would matter, since the import on System A was done off-hours, so there would be little-to-no activity in the other db's in System A's volume group.

I'm working on your timing tests. Will advise when I have some results.

Thanks!

Hein van den Heuvel · ‎02-21-2008

>> lan 0 0/0/0/0 btlan3 CLAIMED INTERFACE HP PCI 10/100Base-TX Core
/dev/diag/lan0 /dev/ether0
>> Does that give you a clue about the cable?

No not really. That just tell us what the adapter is capable of.
You may want to use
/usr/sbin/lanadmin -x 0
or
echo "lan display quit" | lanadmin
to show us the actuals.

However... the maximum speed is going to be 100mbit/sec or just over 10 Megabytes per second.

That should be slower than the slowest disk subsystem I can imagine.
So I would suggest you focus on considering that you are limited my the network speed, not the server or import process.

Was this supposed to be 100mb, or gigabit?
For grins... I'm running gigabit at home with a very nice working $50 Dlink 8 line switch between my PC's and RX2600 server. Of course gigabit might just move down the bottleneck to the disk serving up the dump.

Do you have the space to pump over the dump file first?

Can you divided and concur?... chop the dmp file into 1GB chunks and start importing while getting the next chunk over the network? Of course the dividing up is best done by Oracle itself through the FILESIZE parameter in the EXPort options.

And uh... how serious a problem is this?
Sounds like you are more than just curious.
May we assume this has to be done repeatedly (daily, weekly)?

Or did you observe a slower test transfer and want to get it right for the 'real', bigger import?

http://www.orafaq.com/wiki/Import_Export_FAQ

Good luck!
Hein.

TTr · ‎02-21-2008

lan 0/0/0/0 is the core (built in) LAN of the N servers. It is 100BT not gigabit. Are you using the same 0/0/0/0 on both servers or does the other one have an add-on Gbit NIC?

How big is the DB dump file? And like Hein asked, is the dump file copied locally to the two N servers before the import is run or is it read live during the import.

You need to get more details about the SAN storage of the two servers. RAID10 vs RAID5 is not enough. How many disks are in each RAID group and how many other servers are sharing storage in those same RAID groups? How many connections to the SAN does each N server have?

Do you have Glance running? Check the I/O of the specific disks (hit "u" in the glance screen)

Evelyn Daroga · ‎02-21-2008

Hein,

lanadmin â x 0 shows:

System A: Current Speed = 100 Full-Duplex Auto-Negotiation-OFF
System B: Current Speed = 100 Full-Duplex Auto-Negotiation-OFF

However, System A's setting was just changed last night â previously (during the import testing) it was: 100 Half-Duplex Auto-Negotiation-ON

System B's was changed about a week ago to the current 100FD Auto-Neg Off, in an effort to improve the overall import time. It didn't help.

I'm not sure I understand your question: "Was this supposed to be 100mb, or gigabit?" â I think I'm inclined to say 100Mb, but donâ t really know why.

Yes, we can pump over the dmp file prior to import â in fact we've been sharing dir's between the unix and windows server, so the exp can go straight to the unix dir (we've been piping it thru gzip to reduce the amount of data actually transferring â that has helped a bit). I have not considered splitting the file into chunks. That may be something to think about. Would that speed up the import? The export directly to the shared dir is only about 2 hrs â not that critical. It's the import that's killing us.

How serious? Well, we're trying to fit a couple exports, imports, and a data conversion process into a 48-hour window. We are testing the process now. The final import back into one of the production db's (we have 2, but are testing with the largest one) takes 23 hours. When I tried different exp/imp options on our development server (System B) the time was more like 14 hours. Back on the production server (System A), the identical import took 23 hours again -- I can only test on System A on weekends. I have done it several times on both systems, and System B is consistently faster. We are looking at other options (eg, not exporting/importing the full db â only the tables we really need) to reduce the overall time. But, if we can figure out why System B is so much faster, we would want to implement it on System A for the real thing. So, it is more than just curiosity, but it is not urgent in that I need an answer today.

Quick question on the time testing â when you say 'transfer the 3gb.tmp to the windoze (love it!!) server', are you saying use the dd command, as: dd if=3gb.tmp of=/windows_share ?
And I assume the same thing coming back to /dev/null: dd if=3gb.tmp-from-windows-share of=/dev/null ?

Ttr:
There is no add-on G-bit NIC on either server.
The dmp file is 12Gb and has been accessed remotely from a windows share, from both servers.
As for the SAN, my understanding is that all servers are using all disks â everything is striped across the entire SAN, then divided into virtual devices. So, disks in a Raid group, other servers sharing the storage, etc., should be moot since the configuration there is the same for all. Each N server has a single Fibre Optic connection to the SAN.

Thanks for the help â I appreciate it!

Hein van den Heuvel · ‎02-21-2008

>> lanadmin Ã¢Â Â x 0 shows:

OK.

>> "Was this supposed to be 100mb, or gigabit?" Ã¢Â Â I think I'm inclined to say 100Mb, but donÃ¢Â Â t really know why.

I was just wondering whether the wrong interface was being used. Looks like 100mb is all you have.

>> I have not considered splitting the file into chunks. That may be something to think about. Would that speed up the import?

I can allow for more overlap.
You woudl not have to wait for the full export to be available but start transferring as the files show up.

>> How serious? Well, we're trying to fit a couple exports, imports, and a data conversion process into a 48-hour window.

It may be worth it to get Gigabit installed to save some critical time... if the server you use supports gigabit.
It may also be worth it to engage a consultant.

>> We are testing the process now.

Excellent. Too many folks just hope for the best!

>> The final import back into one of the production db's (we have 2, but are testing with the largest one) takes 23 hours.

That sounds like the Oracle setup could be improved upon. Specifically, I hope you are exporting without indexed and have export create an index creation script. You can than feed that to a task which hands out index created to the next free slave, or just cut it up into chunks and start chunks
as soon as the underlying tables a are imported.

<< you just eceeded thr first 15 minutes free consulting! :-) >>

Seriously, gotta run, can not follow up for a while to come.

>> Quick question on the time testing Ã¢Â Â when you say 'transfer the 3gb.tmp to the windoze (love it!!) server', are you saying use the dd command, as: dd if=3gb.tmp of=/windows_share ?

No, I was thinkg just an transparent NFS or Samba based CP command or FTP, depending how the drive is shared.

>> The dmp file is 12Gb

100 mb/sec = 10 MB/sec.
12 GB = 12,000 MB ---> 1200 seconds = 20 minutes

So the network should not really be a concern after all.

Hope this helps some more,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting

Evelyn Daroga · ‎02-21-2008

Hein,

Well, I did the dd stuff before I read your answer, so here are the results anyway. The windows share (/dvisweb2_dumps) is mounted to each unix via NFS. I have included the commands I executed, and results for each system.

Results:

cd to the directory where the target db files are on each server.
#time dd if=/dev/zero of=3gb.tmp bs=1024k count=3000
Server A Server B
real 2:00.6 1:49.6
user 0.0 0.0
sys 27.7 27.8

#time dd if=3gb.tmp of=/dvisweb2_dumps/3gb.tmp bs=1024k
Server A Server B
real 6:06.4 11:16.6
user 0.1 0.0
sys 54.6 1:25.0

#time dd if=/dvisweb2_dumps/3gb.tmp of=/dev/null bs=1024k
Server A Server B
real 5:03.1 6:56.7
user 0.0 0.0
sys 21.5 42.5

cd to the directory where the target db files are on each server.
#time dd if=/dvisweb2_dumps/3gb.tmp of=3gb.tmp bs=1024k
Server A Server B
real 7:03.1 8:03.8
user 0.0 0.0
sys 42.9 1:05.3

These results appear to be closer to what I would expect -- System B generally took longer to do most of these things. I did these during normal working hours, where Sys A would be more affected than Sys B.

It was pointed out to me that there are full backups that run on the weekends (when I'm testing Server A) for most of the windows servers that use the SAN. Maybe there is simply more activity on the SAN when I'm testing on Server A than when I'm testing on Server B. I'm not convinced that's really the issue,however, for two reasons: 1-backup activity would be more read-intensive, and 2-there would be more read/write activity during working hours, when I've done some of the testing on Server B. Anyway, I'll give that more thought.
Thanks again for the help! It is appreciated!

Categories

Company

Local Language

Forums

Discussions

Forums

Discussions

Discussions

Forums

Discussions

Forums

Discussions

Forums

Forums

Discussions

Forums

Discussions

Forums

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Community

Resources

Other HPE Sites

Discussions

Forums

Blogs

Re: Oracle imp timing difference between 2 HPUX servers

Oracle imp timing difference between 2 HPUX servers