
Re: Performance Problem

 
David Child_1
Honored Contributor

Performance Problem

I am having a performance problem on one of my V-class 2250s (16 CPUs/8GB phy mem).

Here are some of the basic symptoms (see attached for more details):

1. vmstat shows a high number of processes in the run queue

2. glance shows a high (at least high compared to my other systems) context switch rate

3. the system is using ~2% swap, but memory is at 90% utilization and a big chunk of that is file system buffer

There are about 800 processes running on this server.

Here are some kernel params:
dbc_min_pct = 5
dbc_max_pct = 50
minfree, lotsfree, desfree = 0
maxdsiz, maxdsiz_64bit = 0x040000000
maxssiz, maxssiz_64bit = 0x01000000
maxtsiz, maxtsiz_64bit = 0x010000000
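
For anyone reading along, the hex values above are easier to judge once converted to megabytes. A minimal Python sketch, assuming the values are plain byte counts; the dictionary and function names are only for illustration:

```python
# Kernel parameter values as posted; assumed to be byte counts.
params = {
    "maxdsiz": 0x040000000,
    "maxssiz": 0x01000000,
    "maxtsiz": 0x010000000,
}

MB = 1024 * 1024

def to_mb(value_bytes):
    """Convert a byte count to whole megabytes."""
    return value_bytes // MB

sizes_mb = {name: to_mb(value) for name, value in params.items()}

for name, size in sizes_mb.items():
    print(f"{name}: {size} MB")
```

Under that assumption the data segment limit is 1024 MB, the stack limit 16 MB, and the text limit 256 MB.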

The server was recently patched.

It doesn't appear to be a disk, memory, or CPU bottleneck. Glance shows a network packet rate of:

Packet In = 1401
Packet Out = 1555
No collisions or errors.

My first thought was that there were just too many procs running.

The swap utilization is a little puzzling - I thought that some of the file system buffer space would have been reclaimed for other uses before using swap space.

Any suggestions would be greatly appreciated.

Thanks,
David
13 REPLIES
Michael Tully
Honored Contributor

Re: Performance Problem

Hi David,

The first thing to look at would be to reduce the size of your buffer cache. It is *far* too high. The recommended size is somewhere between 300 and 400 MB, so in your case, with 8GB of RAM, reduce dbc_max_pct to 5% and set dbc_min_pct to 2%. Don't be alarmed at the size of the reduction. There are plenty of postings on this issue. Here are three.

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0x166f107d277ad611abdb0090277a778c,00.html
http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0xf49203bbece8d5118ff40090279cd0f9,00.html
http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0x6752a22831ebd5118ff40090279cd0f9,00.html
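
To put the percentages in absolute terms, here is a quick Python sketch. It assumes 8GB means 8192 MB and that the cache limit is simply the given percentage of physical memory; the function name is made up for illustration.

```python
PHYS_MEM_MB = 8 * 1024  # 8 GB of physical memory, as stated in the question

def cache_mb(pct, phys_mem_mb=PHYS_MEM_MB):
    """Buffer cache size in MB for a given percentage of physical memory."""
    return phys_mem_mb * pct / 100

for label, pct in [("current dbc_max_pct", 50),
                   ("suggested dbc_max_pct", 5),
                   ("suggested dbc_min_pct", 2)]:
    print(f"{label} = {pct}% -> {cache_mb(pct):.0f} MB")
```

On those assumptions the current ceiling is about 4 GB, while 5% is roughly 410 MB, close to the range quoted above.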

Michael
Anyone for a Mutiny ?
A. Clay Stephenson
Acclaimed Contributor

Re: Performance Problem

First, I would reduce dbc_max_pct to no more than about 10%, and 5% would probably be better under 11.0. The next thing that I would check, given the high context switch rate and high system vs. user ratio, is whether timeslice is incorrectly set to 1 rather than 10. There has been a bad tuned parameter set that incorrectly sets this value.
If it ain't broke, I can fix that.
steven Burgess_2
Honored Contributor

Re: Performance Problem

David

I think the issue may be with your
dbc_max_pct = 50 setting

How much memory do you have installed on the server? There are guidelines regarding this setting.

Have a read of the attached

Regards

Steve
take your time and think things through
steven Burgess_2
Honored Contributor

Re: Performance Problem

David

Here's another thread from a question I raised myself regarding the mentioned settings

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0xe07b1cc6003bd6118fff0090279cd0f9,00.html

HTH

Steve
take your time and think things through
David Child_1
Honored Contributor

Re: Performance Problem

Thanks for the quick replies. I didn't really think about the tunables too much since this server has been running great for months. I just started having this problem about a week ago. I didn't realize that the 'dynamic' in dynamic buffer cache wasn't very dynamic.

Just for added information: there are no databases on this server. sar -b shows mostly reads, with a 99-100% cache hit rate (I would hope so with so much memory allocated for it).

I will try and get an outage to change the tunables. I'll let you know how it turns out (and assign points).

Thanks again,
David
David Child_1
Honored Contributor

Re: Performance Problem

I got an outage for the server and reduced the buffer cache to 10%. Of course the system was working okay after it rebooted, but I cannot yet be sure if the problem has been fixed completely. Some of the same symptoms have already started to appear again (see attached):

1. swapinfo & glance show some swap device being used

2. the CPUs are still spending the majority of their time in system mode

3. the context switch rate still seems a bit high

I am curious why even after lowering the buffer cache a swap device is still showing that it's being used. It's showing exactly what it was yesterday before the kernel parameter change. I did see some threads that mention memory mapped files and I do see some mmap syscalls.

I don't see what I always thought of as traditional CPU, Disk, or Memory bottlenecks. The network is a Gbit and shows no errors or collisions. What else could cause the high amount of CPU system time?

I'm trying to think of other things to check, but I'm coming up short. Any suggestions would be great.

Thanks again,
David
Martin Johnson
Honored Contributor

Re: Performance Problem

You may want to check the setting of kernel parameter timeslice. If it is set to 1, you may want to increase it.

HTH
Marty
David Child_1
Honored Contributor

Re: Performance Problem

Sorry, I forgot to mention that I did check the timeslice setting last night and it's (100/10). I'm not sure why it's a formula, but it comes out to "10".
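
For what it's worth, here is the arithmetic as a small Python sketch. It assumes timeslice is counted in clock ticks and that the clock runs at 100 ticks per second (10 ms per tick); both are assumptions to verify against the timeslice documentation for your release.

```python
HZ = 100  # assumption: 100 clock ticks per second, i.e. 10 ms per tick

def timeslice_ms(ticks, hz=HZ):
    """Length of one scheduling timeslice in milliseconds."""
    return ticks * 1000 / hz

ticks = 100 // 10  # the formula shown for the tunable
print(f"timeslice = {ticks} ticks = {timeslice_ms(ticks):.0f} ms")
print(f"timeslice = 1 tick = {timeslice_ms(1):.0f} ms")
```

If the assumption holds, a setting of 1 would give each thread a tenth of the time before a forced context switch, which is why the bad value was worth ruling out.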

Thanks,
David
Dave Chamberlin
Trusted Contributor

Re: Performance Problem

When you changed dbc_max_pct, did you make sure bufpages and nbuf were set to 0? They would override your dbc_max_pct if set otherwise.
David Child_1
Honored Contributor

Re: Performance Problem

nbuf = 0
bufpages = (NBUF*2)

Does it actually have to be entered as '0', or will the formula suffice? From looking at the glance outputs, it looks like the buffer cache is reduced.

Thanks,
David
Sandip Ghosh
Honored Contributor

Re: Performance Problem

Yes, your glance output shows that it has been reduced by a considerable amount.
The formula for bufpages is fine. It can take up the value from the formula.
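
A tiny Python sketch of why the formula is harmless here. It assumes, as described earlier in the thread, that the dbc_*_pct limits apply only when both nbuf and bufpages end up as zero; the function names are hypothetical.

```python
def bufpages_from_formula(nbuf):
    """Evaluate the posted formula bufpages = (NBUF*2)."""
    return nbuf * 2

def uses_dynamic_cache(nbuf, bufpages):
    """Assumed rule: the dynamic cache applies only when both values are zero."""
    return nbuf == 0 and bufpages == 0

nbuf = 0
bufpages = bufpages_from_formula(nbuf)
print(f"bufpages = {bufpages}, dynamic cache in effect: "
      f"{uses_dynamic_cache(nbuf, bufpages)}")
```

With nbuf = 0 the formula evaluates to 0, so nothing overrides dbc_max_pct.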

Sandip
Good Luck!!!
steven Burgess_2
Honored Contributor

Re: Performance Problem

David

You can get a small programme called sarcheck from www.sarcheck.com. The programme produces HTML reports of performance data on your system, highlighting issues with bottlenecks etc.

http://www.sarcheck.com/schp.htm

Have a look

Steve
take your time and think things through
Tony Romero
Advisor

Re: Performance Problem

It could possibly be a network issue. Make sure that the duplex setting on the network interface matches that of the switch port to which it is connected. Whether half or full duplex, the NIC and the port need to be the same.

/usr/sbin/lanadmin -> lan -> display
Freedom!!!