Operating System - HP-UX

Performance Issue on 11.31

 
Gireesh
Occasional Advisor

Performance Issue on 11.31

Hi All,

We have installed a new BL860c blade server running 11.31. The server is connected to EMC DMX4 storage and native multipathing is enabled. The problem I am facing: a simple Unix copy of a 15 GB file takes 6 minutes, for a throughput of about 45 MB/s, whereas copying the same file on the old server (an rp7420 running HP-UX 11.11) takes 3 to 4 minutes at most, at about 65 MB/s. On the old server avserv never crosses 4 ms, but the new server is touching 18 ms. Are there any special parameters that need to be set at the OS level to get higher throughput? I am attaching sar -H and sar -L output from the new server, taken while the copy runs.
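Those figures are consistent with the copy times; as a quick arithmetic check (not from the attached sar data):

```shell
# Sanity-check the quoted rates from the copy times:
# 15 GB in 6 min (new 11.31 host) vs 15 GB in 4 min (old 11.11 host).
awk 'BEGIN {
    printf "new 11.31 server: %.0f MB/s\n", (15 * 1024) / (6 * 60)
    printf "old 11.11 server: %.0f MB/s\n", (15 * 1024) / (4 * 60)
}'
# prints roughly 43 and 64 MB/s, in line with the ~45 and ~65 observed
```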

Re: Performance Issue on 11.31

First things first: check that you have all the correct director bits set on the DMX for 11.31. They are _not_ the same as for 11.11. Speak to your storage team or EMC to confirm this.

So what sort of MPIO were you using on the 11.11 host? PowerPath? LVM PVlinks? nothing at all?

I ask because some disk arrays (and I can't comment on whether the DMX4 is one of them) don't do very well at handling sequential IO when those IOs arrive via different ports on the array, and one big copy operation will generate purely sequential IO.

If the 11.11 system is using PVlinks then all the IO will be going to just the one port on the DMX, whereas the default with the 11.31 MPIO will, I expect, be round-robin. If the DMX can't detect and optimise sequential IO when it arrives down different ports, it could well end up being slower.

If this is the case you might want to change the load balancing policy - see this white paper for how:

http://docs.hp.com/en/native-multi-pathing/native_multipathing_wp.pdf

I'd imagine that "preferred path" or "weighted round robin" might give better performance for sequential IO (if what you really want is better sequential IO performance - this might make random IO worse)
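A sketch of what that policy change looks like with scsimgr on 11.31 (diskNN is a placeholder for the agile DSF of the LUN, not a device from the poster's system; see the white paper for the full list of policy values):

```shell
# Show the current load-balancing policy for one LUN (diskNN is a placeholder)
scsimgr get_attr -D /dev/rdisk/diskNN -a load_bal_policy

# Switch it; set_attr takes effect immediately, save_attr persists it
scsimgr set_attr  -D /dev/rdisk/diskNN -a load_bal_policy=preferred_path
scsimgr save_attr -D /dev/rdisk/diskNN -a load_bal_policy=preferred_path
```

These are HP-UX-only administrative commands, so run them on a test LUN first and re-measure the copy after each change.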

HTH

Duncan

I am an HPE Employee
Accept or Kudo
Gireesh
Occasional Advisor

Re: Performance Issue on 11.31

Thanks Duncan,
The storage team has set all the EMC-suggested bits for 11.31.
On 11.11 we are using PVlinks. I have tried changing the policy one by one, but with no net result. If I monitor my FC throughput I get 46 MB/s for the first copy, but if I do the same copy a second time it shoots up to 80 MB/s.
I have noticed one more thing: while using the dd command, the server gives a good throughput of 70 MB/s. I am not sure why that is.

dd if=/dev/rdsk/c14t9d7 of=/dev/rdsk/c16t9d6 bs=1024k count=500000

Re: Performance Issue on 11.31

>> If I monitor my FC throughput I get 46 MB/s for the first copy, but if I do the same copy a second time it shoots up to 80 MB/s.

That is probably a result of your filesystem buffer cache - maybe you have different settings on the filesystems you are using on 11.11 vs. 11.31. Compare using "mount -p" on both systems.
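One way to run that comparison (a sketch; the capture file names are placeholders, and "mount -p" here is the HP-UX form that prints mounted filesystems in /etc/fstab format):

```shell
# Capture the mount options on each host (run this on both systems)
mount -p > /tmp/mounts.$(uname -n)

# Copy both capture files to one host, then compare, e.g.:
# diff /tmp/mounts.newblade /tmp/mounts.oldrp7420
```

Differences in VxFS mount options are the sort of thing this would surface, since they change how much the buffer cache is used.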

>> I have noticed one more thing: while using the dd command, the server gives a good throughput of 70 MB/s.

If you are using dd to the raw device you are again avoiding the buffer cache, so this backs up the idea that the effect you are seeing may not be disk related. What performance do you get if you run the same raw dd on the 11.11 system?
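A minimal way to put a number on that comparison (a hedged sketch: /dev/zero and /dev/null stand in for the raw DSFs so the timing logic itself is runnable anywhere; substitute the real devices, e.g. if=/dev/rdsk/c14t9d7, on the actual hosts):

```shell
# Time a dd and report MB/s (placeholder devices; swap in the raw DSFs)
count=512                              # number of 1 MB blocks to move
start=$(date +%s)
dd if=/dev/zero of=/dev/null bs=1024k count=$count 2>/dev/null
end=$(date +%s)
elapsed=$((end - start))
[ "$elapsed" -lt 1 ] && elapsed=1      # avoid divide-by-zero on fast runs
echo "copied ${count} MB in ${elapsed}s: $((count / elapsed)) MB/s"
```

Running the identical command on both hosts gives a like-for-like figure that bypasses the filesystem entirely.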

HTH

Duncan


Re: Performance Issue on 11.31

>> dd if=/dev/rdsk/c14t9d7 of=/dev/rdsk/c16t9d6 bs=1024k count=500000

Also be aware that unless you have set the load balancing to something different, using the legacy device files will still cause round-robining between all paths to the disk.

You should figure out which agile DSF this legacy DSF is associated with, using "ioscan -m dsf", then post the output of:

scsimgr get_info -D /dev/rdisk/diskNN

replacing diskNN with the agile DSF you identified.

HTH

Duncan

Gireesh
Occasional Advisor

Re: Performance Issue on 11.31

I have checked the buffer cache settings and didn't find any difference between the two servers.

I am attaching scsimgr info here.
Michael Steele_2
Honored Contributor

Re: Performance Issue on 11.31

Hi

You're comparing a low-end, introductory BL860c against an rp7420, and you're upset because the blade isn't as fast as the high-end super computer.

The RP7420 is going to outdo the blade four to one in billions of processor operations per second. See below.

You gave up the horsepower when you went with the low-end, econo-class blade.

Computer (Full Precision):  HP 9000 rp7420-16 (1000MHz PA-8800) / HP Integrity BL860c (1.6GHz/18MB Dual-Core Itanium 2)

Number of Procs or Cores:   16    / 4
Rmax (GFlop/s):             47.5  / 24.48
Nmax (Order):               30600 / 34920
N1/2 (Order):               1020  / 560
RPeak (GFlop/s):            64    / 25.6

http://www.netlib.org/benchmark/performance.pdf

Note: Looking at your sar report, you have nothing approaching a disk bottleneck - absolutely no wait time whatsoever. It's not in your I/O.
Support Fatherhood - Stop Family Law
Dennis Handly
Acclaimed Contributor

Re: Performance Issue on 11.31

>Michael: You're comparing a low-end, introductory BL860c against an rp7420, and you're upset because the blade isn't as fast as the high-end super computer.

(The rp7420-16 is only midrange, with up to 16 cores.)

>The RP7420 is going to out do the blade 4 to one in billions of processor operations per second.

Provided you can keep all cores busy.
Otherwise, provided you don't need all 128 GB vs 48 GB of memory, a single core on the BL860c will beat the rp7420.

It seems like this test case should be I/O bound but sar(1m) doesn't show it.

What does glance show?
Michael Steele_2
Honored Contributor

Re: Performance Issue on 11.31

Dennis

RE: "...(The rp7420-16 is only midrange, with up to 16 cores.)..."

You mean the high-end Superdome that starts with 1 to 16 cores, or the midrange rx8420 that's 2 to 16 cores? Or the rx7420 that's 2 to 8 cores?

Which one, Dennis? Or are you just trying to be argumentative?

"...Provided you can keep all cores busy.
Otherwise provided you don't need all 128 vs 48 Gb, a single core on BL860c will beat the rp7420...."

Read the test material Dennis.

"...It seems like this test case should be I/O bound but sar(1m) doesn't show it...."

Dennis, if you don't know what 100% busy or 0.00% avwait time means, then maybe you shouldn't be answering this question.
Steven E. Protter
Exalted Contributor

Re: Performance Issue on 11.31

Shalom,

I would like to add that it is very difficult to compare these two systems.

An rp7420 has PA-RISC architecture and, as noted, is certainly not a low-end system.

The blade has more modern processors, but a completely different processor and backplane technology. So many things are going to be different - I/O capacity, for example - even though the blade has a more up-to-date processor.

The original performance comparison is not realistic either.

A performance problem is real when applications show slow response times. Usually the end user starts the investigation with a complaint that the system is slow. Your original file-copy data does not surprise me.

If there is a user complaint, then it's worth looking into, and you may find the blade is not adequate for the task you have assigned it.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com