Performance Problem (cont)

 
David Child_1
Honored Contributor

I am having a performance problem on one of my V-class 2250s (16 CPUs/16GB phy mem).

What I am having the most trouble understanding/explaining is that the CPUs are ~50-60% idle. System time is high (~35%) compared to user time (~3%). The server is very unresponsive (obviously). There is a very high run queue as well. I am just trying to find out what else I might look at to explain why the run queue is so full and the system time so high. The server doesn't appear to be swapping or waiting on I/O. See attached for some system printouts.

Any help would be greatly appreciated.

Here are some kernel params:
dbc_min_pct = 5
dbc_max_pct = 10
minfree, lotsfree, desfree = 0
maxdsiz, maxdsiz_64bit = 0x040000000
maxssiz, maxssiz_64bit = 0x01000000
maxtsiz, maxtsiz_64bit = 0x010000000

The system was patched about 8 months ago so I'm a little behind there.

Running HP-UX 11.00
Thanks,
David
James R. Ferguson
Acclaimed Contributor

Re: Performance Problem (cont)

Hi David:

One kernel parameter that can give a high run queue with low processor utilization is a bad 'timeslice' value. It should be set to 10 for almost all applications. The lower the value, the more forced context switching occurs, leading to a deepening run queue and little real work.
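
To put rough numbers on that, here is a minimal Python sketch. It assumes that one 'timeslice' unit is a 10 ms clock tick (an assumption on my part, check timeslice(5) on your release), and the function name is made up for illustration. It only gives an upper bound on timeslice-forced switches, not a measurement of what the system is really doing.

```python
TICK_MS = 10  # ASSUMPTION: one timeslice unit is a 10 ms clock tick


def max_forced_switches_per_sec(timeslice, cpus=1):
    """Upper bound on timeslice-forced context switches per second.

    Each CPU can force a switch at most once per timeslice interval,
    so the bound is simply (1000 ms / interval in ms) per CPU.
    """
    interval_ms = timeslice * TICK_MS
    return cpus * 1000 / interval_ms


# Compare the recommended value (10) with a badly low value (1) on 16 CPUs.
for ts in (10, 1):
    bound = max_forced_switches_per_sec(ts, cpus=16)
    print(f"timeslice={ts:2d}: at most {bound:.0f} forced switches/sec")
```

Under that assumption, dropping timeslice from 10 to 1 raises the ceiling tenfold, which is the effect described above.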

Regards!

...JRF...
A. Clay Stephenson
Acclaimed Contributor

Re: Performance Problem (cont)

If my poor brain is functioning, I seem to remember that I suggested that you look at timeslice months ago. That could be your culprit. It would probably be helpful if you posted a few things:

1. kmtune output

B. sysdef output

III. A Glance output showing the system calls activity

If we can see what system calls are eating the CPU then we may get a handle on your problem.



If it ain't broke, I can fix that.
harry d brown jr
Honored Contributor

Re: Performance Problem (cont)

Did your previous performance issue http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0xb829eea29889d611abdb0090277a778c,00.html have to do with the same server?

10% of 16GB is 1.6GB of cache. Usually that's about 4 times more than necessary. What are your cache hit rates?

How many processes are running?

Do you have ems activated?

live free or die
harry
Live Free or Die
David Child_1
Honored Contributor

Re: Performance Problem (cont)

Thanks for the replies.

James and Clay, the timeslice is set for '10'.

Clay and Harry, yes you are correct. I was having this issue on this server a while back. There were a few changes suggested at that time that I implemented; unfortunately they did not have much effect. I ended up having to renice some of the processes so that others could finish up. This seemed to clear it up okay. It's happened twice since then. I have not been able to identify any processes that only run at those times, but I did notice that there appear to be more of them (various processes). I have attached the output from kmtune and sysdef and glance. It looks like a lot of time is being spent on open/close calls.

Harry, currently there are 892 processes running. I have attached the output from 'sar -d' as well as kmtune and sysdef. Yes, 'ems' is activated. This server used to have 8GB of memory, but we decommissioned another V-class server a couple of weeks back and took 8GB from that server and added it to this one. I haven't gone back in to lower dbc_max_pct.

Once again thanks.

David

A. Clay Stephenson
Acclaimed Contributor
Solution

Re: Performance Problem (cont)

Okay David,

You can lower dbc_max_pct but that won't help because you have bufpages set to a non-zero value (nbuf * 2) and nbuf = 248269. This means that you have 248269 * 2 4K pages allocated as buffer cache (just under 2GB). 11.0 systems very seldom benefit from anything over about 800MB of cache, and probably 400-500MB is better. You should set bufpages to something around 102400 (400MB) and see what that does for you.
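
For anyone who wants to check the arithmetic, here is a minimal Python sketch that only restates the numbers above (4K pages, nbuf = 248269, bufpages = nbuf * 2). The helper name cache_mb is made up for illustration.

```python
PAGE_SIZE_KB = 4  # 4K pages, as stated above


def cache_mb(bufpages):
    """Buffer cache size in MB for a given number of 4K pages."""
    return bufpages * PAGE_SIZE_KB / 1024


nbuf = 248269
current_bufpages = nbuf * 2    # bufpages = nbuf * 2, per the values quoted above
suggested_bufpages = 102400    # the value suggested above

print(f"current:   {cache_mb(current_bufpages):.0f} MB")    # current:   1940 MB
print(f"suggested: {cache_mb(suggested_bufpages):.0f} MB")  # suggested: 400 MB
```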

Your applications do seem to be doing many opens and closes in addition to reads and writes. There are also quite a few fork(), exec(), and pipe() calls, indicating that this machine is spending a lot of time spawning many processes.
If it ain't broke, I can fix that.
Ian Dennison_1
Honored Contributor

Re: Performance Problem (cont)

My 2 cents worth,...

Looking at the System CPU utilisation (35%) and the high Run Queue (100+) says to me that the System is spending too much time managing the System and not enough time running the Application.

Have you made any changes to the Application since the new memory was added? If the Application is multi-processed (like Oracle or SAP) I would look at how busy/idle each Application process is, and look at reducing the number of UNIX processes it spawns (instead of 100 UNIX processes working at 15%, why not 50 UNIX processes working at 30%?). That means less swapping at the OS level and puts the pressure onto the Application to manage its resources. Discussion?

Have looked at the Performance Bible (Sauers and Weygant) and they suggested that semaphore operations were System CPU intensive (sar -m). Unfortunately SAR only documents rate of Semaphore usage, not utilisation.

Overall, I would ask if the system is trying to do too much when in fact quite a lot less is needed (memory, CPUs, etc)?

Share and Enjoy! Ian
Building a dumber user
David Child_1
Honored Contributor

Re: Performance Problem (cont)

Clay; I don't know how I missed the *buf* parameters. I will definitely fix that oversight when they let me reboot. I'm not very hopeful that it will fix the issue I am seeing.

Ian; I will look into having the applications looked at and optimized for use with this new memory. Right now I am trying to track down the process(es) that are performing the majority of the open/close/pipes as these are the calls that the CPU is spending the most time on.

Yesterday I 'reniced' a subset of processes so that other jobs might get more CPU time. The system was soon running "okay". By okay I mean that the CPU utilization jumped to 100%, but it was primarily (~85%) user mode. This was encouraging and I had hoped that the run queue would start clearing out. The problem reappeared a few hours later. I have looked at processes that were running when it was "okay" and compared them to processes running now, and nothing has jumped out as a culprit.
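
One way to make that comparison less of an eyeball exercise is to count processes per command name in each snapshot and look at the differences. Below is a minimal Python sketch of the idea. The command names and counts are invented sample data, not output from this server; in practice the two lists would come from saved process listings taken during a good and a bad period.

```python
from collections import Counter


def snapshot_diff(okay, slow):
    """Return {command: change in process count} for commands whose count differs."""
    okay_counts = Counter(okay)
    slow_counts = Counter(slow)
    diff = {}
    for cmd in set(okay_counts) | set(slow_counts):
        delta = slow_counts[cmd] - okay_counts[cmd]
        if delta != 0:
            diff[cmd] = delta
    return diff


# HYPOTHETICAL sample data: one command name per running process.
okay_snapshot = ["oracle"] * 40 + ["sh"] * 10 + ["cron"]
slow_snapshot = ["oracle"] * 40 + ["sh"] * 55 + ["cron"] + ["report_job"] * 12

# Largest changes first.
changes = snapshot_diff(okay_snapshot, slow_snapshot)
for cmd, delta in sorted(changes.items(), key=lambda kv: -abs(kv[1])):
    print(f"{cmd:12s} {delta:+d}")
```

A command whose count balloons between snapshots (here the made-up "sh" and "report_job") would be a candidate for the heavy fork/exec/pipe activity, even if no single process stands out.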

I will keep looking around to see if I can find the process(es).

Thanks,
David