Operating System - HP-UX

adding only 2x the CPUs gave over 4x the performance?

 
SOLVED
Marc Ahrendt
Super Advisor

adding only 2x the CPUs gave over 4x the performance?

running 11.00 on L2000s with either 2 or 4 440MHz CPUs (all L2000s identical except CPU count)

on our 2 CPU L2000s we can only run about 6 of our custom java "GUIs" before the load average reaches ~1

on our 4 CPU L2000s we can run about 22 before the load gets to ~1

why such a dramatic increase? we expected the 4 CPU systems to only get ~12 GUIs, not ~22!?!

has anyone done a basic benchmark of multi-threaded Java apps on HP PA-RISC computers, varying the number of CPUs, and found that performance is significantly better with more CPUs than was expected?

not sure what to look at in glance nor what kernel parameters have the most impact on Java

not sure if i need to buy L3000s (with, i was told, faster bus access to RAM) with more CPU power or tweak my L2000s
hola
16 REPLIES
Wodisch
Honored Contributor

Re: adding only 2x the CPUs gave over 4x the performance?

Hi Marc,

actually I do suspect some differences between those L's :-)

- kernel parameters
- size and contents of the UNIX buffer cache
- LAN interfaces, esp. auto-negotiation

FWIW,
Wodisch
Brian M Rawlings
Honored Contributor

Re: adding only 2x the CPUs gave over 4x the performance?

Interesting. Since you haven't mentioned it, I'm assuming that they have identical memory. Java is a serious memory hog; if the memory were different, that would be a big reason.

One other thing occurs to me -- with two CPUs, one of them is spending a lot of time running the OS (most of which is not multi-threaded). This means that your additional CPUs are "all CPU", none of it stolen for normal OS processing.

Depending on how busy Java keeps your OS, this could explain the delta you are seeing.

FYI, the L3000 (AKA RP5470) does have faster internal buses for CPU/memory. Whether or not it would help much is hard to tell (it depends on where you are bottlenecking). I would carefully look at your several upgrade possibilities before spending money on an upgrade to the L3000, which is technically obsolete (but lives on in its new incarnation). Upgrade to RP5470, if you upgrade.

The place where RP5470 would really help you (if you need the additional juice) is that it supports CPUs in the 750MHz and 875MHz range... and they really put out.

Regards, --bmr
We must indeed all hang together, or, most assuredly, we shall all hang separately. (Benjamin Franklin)
Steven E. Protter
Exalted Contributor

Re: adding only 2x the CPUs gave over 4x the performance?

I kind of agree with Brian's post. Based on setup there obviously were/are cpu bottlenecks on your 2 processor systems.

Based on results and all other factors being equal a huge performance bump can be obtained by merely getting another pair of cpu's for the 2 cpu boxes.

You probably already know you have to get them in pairs. Time to get the proposal ready for the next budget cycle.

Your 440 MHz rp5450s are using the B backplane. Machines at 540 MHz or higher use the more advanced, faster C backplane. This too will provide a boost.

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Marc Ahrendt
Super Advisor

Re: adding only 2x the CPUs gave over 4x the performance?

wodisch: both these L2000s (rp5450) are identical (kernel/software/patches/etc..) except one has 2 CPUs (1 proc. module) and the other 4 CPUs (2 proc. modules) ...that and their host/IP info are the only differences

brian: how can i tell on the 2 CPU system what CPU is running the java code? ...like to validate your point on one CPU running the system's processes. also, these systems have 4GB of RAM and, yes, java "is" sadly a memory hog.

steve: yes, i think the L3000 (rp5470) will help out due to the faster "C" backplane, but i cannot say to my boss that i know this is the problem cuz i do not know how to "see" this bottleneck


disk I/O is very low, memory usage is ~97% but no active swapping, and CPU usage is ~60% ...when the 2 CPU system gets to a load of 5 with ~13 "GUIs"

when the load on the 2 CPU system gets to about 5 i go into glance and cannot seem to find where the bottleneck is!?! (i am not a pro with glance nor kernel behavior) but only guessing that the "B" backplane may be the bottleneck (CPU access to the heavily used RAM because of Java) as that is what vendors are telling me
hola
Ian Dennison_1
Honored Contributor

Re: adding only 2x the CPUs gave over 4x the performance?

Marc,

Perhaps the system is able to spend more time actually working on the requests and less time managing itself.

What was the 'sys%' like on the old Server compared to the new?

An oldie but a goodie is timeslice being equal to 1, not 10 as it should be - can you check this on the old server?

Share and Enjoy! Ian
Building a dumber user
Bill McNAMARA_1
Honored Contributor

Re: adding only 2x the CPUs gave over 4x the performance?

My java applications show a linear increase per cpu added.

I'd love to have exponential like you!

I suspect you have processes/daemons running on one and not the other?

kmtune -l
please + attach!

Later,
Bill

also,
sar -o /tmp/sarfile 2 1000
2 = interval between sar collection
1000 = number of samples to record
=> total time = 2000 seconds (~33 minutes)
sar -Af /tmp/sarfile > /tmp/sar.report
It works for me (tm)
Volker Borowski
Honored Contributor

Re: adding only 2x the CPUs gave over 4x the performance?

Hi Marc,

what else is running ? Usually you have a database behind a java application, so beside the OpSys stuff, you might have reduced contextswitching for your database as well, giving you the impression of a faster java.

in addition I'd like to know what percentage of your 100% 2-CPU load actually belonged to java. If this had been around 40% before (because the rest would have been database and OS) and none of the other parts now needs additional CPU, the maths would be e.g. like this

java CPU before = 40% of 2 = 0,8
java CPU after = 40% of 2 + 100% of 2 = 2,8
Which would be a factor of 2,8 / 0,8 = 3,5

Now (22 after) / (6 before) is 3,6 !?
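
Editor's sketch: the arithmetic above can be written as a small Python model. It assumes, as the example does, that the non-Java demand (OS and anything else) stays constant in absolute CPU terms when CPUs are added, so all of the added capacity goes to Java. The function names and the java_share parameter are illustrative and not from the thread; the 40% figure is an assumed value, not a measurement.

```python
def java_speedup(java_share, cpus_before=2, cpus_after=4):
    """Predicted Java capacity ratio when CPUs are added.

    java_share is the fraction (0 < share <= 1) of the smaller box's
    total CPU that Java gets. Everything else is assumed to need the
    same absolute amount of CPU on the bigger box.
    """
    java_before = java_share * cpus_before      # CPU-equivalents for Java
    other = (1.0 - java_share) * cpus_before    # OS etc., assumed constant
    java_after = cpus_after - other             # what is left for Java
    return java_after / java_before


def implied_java_share(observed_ratio, cpus_before=2, cpus_after=4):
    """Invert java_speedup: the share that would explain an observed ratio."""
    return (cpus_after - cpus_before) / (cpus_before * (observed_ratio - 1.0))


if __name__ == "__main__":
    print(round(java_speedup(0.40), 2))          # the 40% example
    print(round(22 / 6, 2))                      # observed GUI ratio
    print(round(implied_java_share(22 / 6), 3))  # share that fits 22/6
```

Under this model the 2-to-4 CPU ratio works out to 1 + 1/share, so a share near 40% gives about 3.5x, and the observed 22/6 would correspond to a share of about 37.5%. Whether the constant-overhead assumption holds on these boxes is exactly what would need measuring.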

But this is just speculating
Volker
rick jones
Honored Contributor

Re: adding only 2x the CPUs gave over 4x the performance?

just a minor nit - the HP-UX 11 kernel scales quite nicely, i'm not sure it is accurate at all to say that it would run on only one CPU.

iirc, there have been times when load averages reported on a system are not indicative of the actual "load" on the system.

as for memory busses and such. short of hardware performance counters, when a memory bus of a system becomes saturated, things like cache misses take longer and longer to satisfy - this leads to the CPUs appearing (to software like Glance) as being 100% utilized.

there is a "use at your own risk, unsupported" tool called "pi" (processor information) that can be used to retrieve various CPU performance counters. it is at ftp://ftp.cup.hp.com/dist/networking/tools/

I would first just run it without args (iirc) to have it spit-out information about the CPUs themselves. Then you might start looking at things like dcache miss rates and icache and all that sort of stuff. Of course, if that is all sounding like an alien language, pi may not provide much understandable stuff to you.

another way to get specific info about the CPUs would be to interrupt the boot sequence and do a "CR" (chip revision) command on each system. I think that the online diagnostics can do similar things (stm/mstm etc) - but those won't access the performance counters.
there is no rest for the wicked yet the virtuous have no pillows
Marc Ahrendt
Super Advisor

Re: adding only 2x the CPUs gave over 4x the performance?

to all: these two systems are "identical" in every way except one is a 2way and the other a 4way. they only run system processes and java code (no database)

ian: both systems have shown in "top" and in "glance -> c" a sys% never > 10%, and the timeslice is the default

bill: attached is the kernel parameters used by both systems ...and sar output you asked for is in the next attachment

volker: i like your math breakdown ...how can i confirm what you did? i'd like to know how to add up the CPU% of all the java processes? use top?

rick: that "pi" command would be a little over my head ...not sure i could understand the output, but thx for the tips (just to let you know, disk utilization is always < 20%)

hola
Marc Ahrendt
Super Advisor

Re: adding only 2x the CPUs gave over 4x the performance?

sar output too large to send as an attachment

in summary:
HP-UX locke B.11.00 U 9000/800 01/31/03

13:58:57    %usr    %sys    %wio   %idle
Average        8       1       1      90

13:58:57   device   %busy   avque   r+w/s  blks/s  avwait  avserv
14:00:29   c2t0d0    0.50    0.50       1      10    4.21    7.10
           c1t2d0    0.50    0.50       1       2    0.42   11.22
           c1t0d0    0.50    0.50       1       8    8.20    2.87
           c2t2d0    0.50    0.50       1       2    0.42   16.26
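
Editor's sketch: a sar report too large to attach can still be reduced to its CPU average with a few lines of Python. This assumes the report contains a header row with %usr/%sys/%wio/%idle followed by an "Average" row, as in the summary above. The function names are illustrative, and cpu_count is a guess the reader must supply, since the summary does not say whether this box is the 2way or the 4way.

```python
def parse_sar_cpu_average(report_text):
    """Return e.g. {'%usr': 8.0, ...} from the CPU section of a sar report."""
    columns = None
    for line in report_text.splitlines():
        fields = line.split()
        if "%usr" in fields and "%idle" in fields:
            # Header row: keep only the percentage column names.
            columns = [f for f in fields if f.startswith("%")]
        elif columns and fields and fields[0] == "Average":
            values = [float(v) for v in fields[1:1 + len(columns)]]
            return dict(zip(columns, values))
    raise ValueError("no CPU average line found")


def busy_cpu_equivalents(avg, cpu_count):
    """CPUs' worth of work in %usr + %sys (I/O wait is not counted as busy)."""
    return (avg["%usr"] + avg["%sys"]) / 100.0 * cpu_count


# The CPU lines from the summary above.
SAMPLE = """\
13:58:57 %usr %sys %wio %idle
Average 8 1 1 90
"""

if __name__ == "__main__":
    avg = parse_sar_cpu_average(SAMPLE)
    print(avg)
    print(busy_cpu_equivalents(avg, cpu_count=2))  # cpu_count is assumed
```

An average of 90% idle over the whole run says little about the loaded periods, so a report captured while the GUIs are actually running would be more telling.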
hola
Brian M Rawlings
Honored Contributor

Re: adding only 2x the CPUs gave over 4x the performance?

Rick: I didn't mean to imply that the kernel doesn't scale well, 11.0 and 11i do it very well. It's more how OS-resource-to-app ratios are impacted by additional CPUs. With a small CPU count, the minimum OS activity is a fairly large percentage of what goes on (IMHO).

Volker's brief but excellent analysis is more what I was getting at. If the apps keep the OS busy (scheduler, process & mem control, etc), the OS can take an appreciable percentage of a 1- or 2-CPU system, but even with increased activity will not take very much more of a 4-CPU box, leaving lots more horsepower for apps than the small count box.

In Volker's quick & dirty look, the numbers match up to real life surprisingly well, and validate my point (which I didn't make very clearly, thanks for saying it better, Volker!)

We've all seen Glance and sar get fooled by memory contention or sun spots (hard to prove which it is, sometimes), so, the proof is, as they say, in the pudding. The surprisingly good results for 4 CPUs are not what I would have guessed at, nor would typical analysis have suggested it, but... there it is. We have a credible guess as to why it worked out this way... that's my story, and I'm sticking to it.

Regards, Interesting discussion, TTFN. --bmr
We must indeed all hang together, or, most assuredly, we shall all hang separately. (Benjamin Franklin)
rick jones
Honored Contributor

Re: adding only 2x the CPUs gave over 4x the performance?


while pi may be a bit over the top, i would still be curious to know more about the CPU ID info. so, if mstm is installed, or if the boxes can be rebooted to do that CR command at the firmware prompt...

it might also be interesting (if it is possible) to take the four CPU system and reboot it with two of the four CPUs disabled. if the performance is then still higher, simple CPU count can be removed as the reason.
there is no rest for the wicked yet the virtuous have no pillows
Jeff Schussele
Honored Contributor

Re: adding only 2x the CPUs gave over 4x the performance?

Hey Brian,

TTFN ?!?!?
Wouldn't CUL8R be somewhat more professional? ;~)

Just teasin,
Jeff

P.S. Volker's figures do work nicely, huh?
Have a good weekend.
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Volker Borowski
Honored Contributor
Solution

Re: adding only 2x the CPUs gave over 4x the performance?

:-)
Thanks folks, for being so kind about my small calculating example. To be honest I assumed the 40% by reverse-calculating the 22/6 ratio :-)
To add up cpu and check for other processes, "top" would be indeed a good choice, keeping in mind that overall system CPU percentage for "top" is 100% * num-CPU.
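
Editor's sketch: a minimal way to do that summation in Python, assuming the "top" screen has been captured as text and has a header row containing %CPU and COMMAND columns. The sample rows below (PIDs, user name, numbers) are made up for illustration, and the column layout should be checked against real output from the box. The division by 100% * num-CPU follows the note above.

```python
def java_cpu_share(top_text, cpu_count, name="java"):
    """Sum %CPU over processes whose command contains `name`.

    Returns (summed_percent, fraction_of_whole_box), where the whole
    box is taken to be 100% * cpu_count.
    """
    header = None
    total = 0.0
    for line in top_text.splitlines():
        fields = line.split()
        if "%CPU" in fields and "COMMAND" in fields:
            header = fields          # remember the column names
            continue
        if header is None or len(fields) < len(header):
            continue                 # skip preamble and blank lines
        row = dict(zip(header, fields))
        if name in row["COMMAND"]:
            total += float(row["%CPU"])
    return total, total / (100.0 * cpu_count)


# Hypothetical capture; not real data from the thread.
SAMPLE_TOP = """\
CPU TTY   PID USERNAME PRI NI  SIZE   RES STATE   TIME %WCPU  %CPU COMMAND
 0   ?   1234 appuser  154 20  310M  250M sleep   5:01 21.50 21.40 java
 1   ?   1240 appuser  154 20  305M  248M run     4:40 18.70 18.60 java
 0   ?     37 root     152 20    0K    0K run    12:10  2.10  2.05 vxfsd
"""

if __name__ == "__main__":
    summed, share = java_cpu_share(SAMPLE_TOP, cpu_count=2)
    print(summed, share)
```

Several captures taken under load and averaged would give a steadier figure than a single screen.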

But I have to say, that with just java and OS involved, I doubt that the OS is eating up 60% of CPU. In this case, I would suspect the system is paging all the time.

This is why I asked for a possible running database on the same box. Esp. if this would be an Oracle database running with dedicated server processes. Nearly the same would apply for SAP-DB, but I do not know about MySQL.

Volker
Marc Ahrendt
Super Advisor

Re: adding only 2x the CPUs gave over 4x the performance?

thx 4 all the help

i think the bottom line is the CPU utilization as described above. when i disabled 2 CPUs on the 4way it acted like the other 2way. also summing up the CPU % on top matched relatively close to Volker's math. so my next system will be a 4way AND i will go for a L3000 instead of a L2000 (get that faster C-backplane and faster CPUs).

fyi: the java apps we are running are very much memory hogs (almost all 4GB of RAM are being used) ...which may be the reason behind the OS making good use of the CPU resources on the 2way

thx again to all for the feedback
hola
harry d brown jr
Honored Contributor

Re: adding only 2x the CPUs gave over 4x the performance?

Marc,

You might want to check the JAVA patch list to hopefully "cure" those memory "hogs":

http://www.hp.com/products1/unix/java/infolibrary/patches.html

live free or die
harry
Live Free or Die