Re: Finding Resource bottlenecks

Sumanth N · ‎08-08-2006

Hi,

We have a 2-CPU, 8 GB HP-UX 11.11 box. I have an application which is running much more slowly than usual. I need to see whether there are any bottlenecks (CPU, memory, disk, network).
So which variables (metrics) should be monitored to check for bottlenecks?

Also, I can see in GlancePlus that the app which is running slowly is in wait reason PRI.
The CPU usage never goes beyond 90% for this application. No page outs are happening.

Thanks in advance.

Chan 007 · ‎08-08-2006

Hi,

1. Try top instead of glance, as glance needs a few more resources than top.

2. What does sar say?

3. Try checking sar, iostat and vmstat.

4. Check dmesg and syslog for any errors.

5. Check for any NFS hanging.

6. Has the application been patched or upgraded recently?

Post the output of no. 3.

Chan

RAC_1 · ‎08-08-2006

Do you see any of the resources running high in glance: CPU, memory, swap and network? Do you see a high run queue or priority run queue (very important)? (The priority run queue metric is gbl_pri_queue.)

There is no substitute to HARDWORK

Ludovic Derlyn · ‎08-08-2006

hi,

Download and execute system_perf.sh. It's a SEP script.

To check for a bottleneck, run sar -d 2 50, for example, and see whether avwait and avserv are above 20 ms.
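A rough way to automate this check is to scan the sar -d text for any device whose avwait or avserv goes above the limit. The Python sketch below is only an illustration: the function names and parsing rules are assumptions based on the usual column layout (device, %busy, avque, r+w/s, blks/s, avwait, avserv), and the sample rows are copied from the sar output posted further down this thread.

```python
def parse_sar_d(text):
    """Return (device, avwait, avserv) tuples from `sar -d` text output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # The first row of each interval starts with a timestamp, and summary
        # rows start with "Average"; drop that leading field when present.
        if fields and (fields[0] == "Average" or ":" in fields[0]):
            fields = fields[1:]
        # A data row has 7 fields: device %busy avque r+w/s blks/s avwait avserv
        if len(fields) != 7 or fields[0] == "device":
            continue
        try:
            rows.append((fields[0], float(fields[5]), float(fields[6])))
        except ValueError:
            continue  # not a numeric row, skip it
    return rows


def slow_devices(text, limit_ms=20.0):
    """Map each device to its worst avwait/avserv value above limit_ms."""
    worst = {}
    for device, avwait, avserv in parse_sar_d(text):
        peak = max(avwait, avserv)
        if peak > limit_ms:
            worst[device] = max(peak, worst.get(device, 0.0))
    return worst


# Rows copied from the sar output in this thread.
SAMPLE = """\
14:47:59 device %busy avque r+w/s blks/s avwait avserv
14:48:01 c0t6d0 26.00 0.50 46 136 0.00 9.63
c2t6d0 23.00 0.50 43 118 0.00 7.28
14:48:05 c0t6d0 2.01 0.50 3 14 0.00 14.28
"""

if __name__ == "__main__":
    print(slow_devices(SAMPLE))        # nothing above 20 ms in this sample
    print(slow_devices(SAMPLE, 10.0))  # a stricter, purely illustrative limit
```

The 20 ms default comes from the suggestion above; a different limit can be passed in if the storage is expected to be faster or slower.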

What is your application?
What are the options of the lvol (mkfs -F vxfs -m /dev/vgxx/lvolx, or fstyp)?

For a process, it's also important to check context switches and forced context switches.

regards
L-DERLYN

Jeff Schussele · ‎08-09-2006

Hi,

Here's a great whitepaper on HP internals & performance from one of HP's best performance analysts - Stephen Ciullo.
A definite must-read.

http://h21007.www2.hp.com/dspp/files/unprotected/devresource/Docs/TechPapers/UXPerfCookBook.pdf

HTH,
Jeff

PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!

Bill Hassell · ‎08-09-2006

Start by looking at the disk performance in glance. The second page in the 'd' page has the buffer cache stats. Are they over 90% or much lower? Then use gpm (or sar) to look at the disk queue lengths. When the queue length starts climbing over 1 or 2 for long periods, you've got a disk bottleneck. Of course, fixing disks that are not fast enough is quite complicated and expensive (i.e., switch to fibre, change to a big array with many gigabytes of cache, etc.).
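The "over 1 or 2 for long periods" rule can be sketched as a search for a run of consecutive samples above a limit. This is a minimal Python illustration, not a tool from HP: the function names are made up, and how many samples count as a "long period" is an assumption you would tune to your sampling interval.

```python
def longest_run_above(samples, limit):
    """Length of the longest run of consecutive samples greater than limit."""
    longest = current = 0
    for value in samples:
        current = current + 1 if value > limit else 0
        longest = max(longest, current)
    return longest


def is_sustained(samples, limit, min_samples):
    """True when the queue length stays above limit for min_samples in a row."""
    return longest_run_above(samples, limit) >= min_samples


if __name__ == "__main__":
    # avque for c0t6d0 as posted in this thread: flat at 0.50, so no queueing.
    posted = [0.50, 0.50, 0.50, 0.50, 0.50]
    print(is_sustained(posted, limit=1.0, min_samples=3))

    # Hypothetical series, invented only to show a sustained queue.
    made_up = [0.5, 1.5, 2.5, 3.0, 0.5, 2.2]
    print(longest_run_above(made_up, 1.0))
```

With sar -d 2 50 each sample covers 2 seconds, so min_samples=30 would mean roughly one minute of continuous queueing.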

top isn't very useful in that it shows nothing but the highest CPU usage. If one process is accumulating a lot of CPU, it may be normal but only the author of the program can tell. Glance can tell you a lot about a specific program like how much time is spent in individual system calls.

And of course look at LAN statistics in Glance or gpm. If they are in the thousands of packets/sec, you've got a lot of packet traffic, not to be confused with LAN throughput. If you have hundreds of users connecting through telnet and running vi or filling out menus, every keystroke is 2 (very small) LAN packets.

Bill Hassell, sysadmin

Sumanth N · ‎08-09-2006

Hello All,
Thanks for all the responses and suggestions.
I suspect that the 'slow' application is waiting mostly on wait reason PRI. I checked the white paper by Stephen, which says that this might indicate a CPU bottleneck.

How do I determine which process is executing on the CPU while the app is in wait PRI?
I could not get the global run queue output from glance.
Please let me know how to get it.

Regarding swap,
Swap utilization is around 59 - 60%
Swap Available: 18487m Swap Used: 5327mb Swap Util (%): 57 Reserved: 10529m

Regarding buffer cache,
read hits are always 99-100% and write hits are around 57%

Memory utilization,
Total VM : 10.4gb Sys Mem : 1.7gb User Mem: 5.3gb Phys Mem: 8.0gb
Active VM: 8.4gb Buf Cache: 654mb Free Mem: 426mb

Pageouts are happening; the cumulative total is 529.
Deactivations: 528 (cumulative).

overall stat,

CPU Util S SRU U | 66% 68% 100%

Disk Util FV | 3% 5% 44%

Mem Util S SU UB B | 95% 96% 100%

Swap Util U UR R | 57% 57% 59%

=========================================
sar output
14:47:59  device   %busy   avque   r+w/s  blks/s  avwait  avserv
14:48:01  c0t6d0   26.00    0.50      46     136    0.00    9.63
          c2t6d0   23.00    0.50      43     118    0.00    7.28
          c3t0d1    0.50    0.50       2      24    0.00    2.61
          c5t0d6    0.50    0.50       2      32    0.00    2.44
14:48:03  c0t6d0    1.49    0.50       2      23    0.00    8.92
          c2t6d0    0.50    0.50       1      19    0.00    5.38
          c3t0d1    3.47    0.50      19     333    0.00    2.01
          c5t0d6    0.50    0.50       1      24    0.00    2.59
          c3t0d7    0.50    0.50       0       8    0.00    2.53
14:48:05  c0t6d0    2.01    0.50       3      14    0.00   14.28
          c2t6d0    1.01    0.50       2      10    0.00    5.22
          c3t0d1    1.51    0.50       8     129    0.00    2.46
          c3t0d7    0.50    0.50       1      16    0.01    5.51
14:48:07  c0t6d0    3.48    0.50       6     183    0.00    7.59
          c2t6d0    1.99    0.50       4     165    0.00    4.68
          c3t0d0    0.50    0.50       1      24    0.00    2.49
          c3t0d1    1.00    0.50       3      56    0.00    2.81
          c5t0d6    0.50    0.50       2      31    0.00    2.57
14:48:09  c0t6d0    2.01    0.50       3      36    0.00   12.79
          c2t6d0    1.01    0.50       2      32    0.00    4.81
          c3t0d1    1.51    0.50       5      65    0.00    2.56
          c5t0d6    0.50    0.50       2      27    0.00    2.40
          c3t0d7    0.50    0.50       1       8    0.01    8.75

Average   c0t6d0    6.99    0.50      12      79    0.00    9.73
Average   c2t6d0    5.49    0.50      10      69    0.00    6.87
Average   c3t0d1    1.60    0.50       7     122    0.00    2.28
Average   c5t0d6    0.40    0.50       1      23    0.00    2.49
Average   c3t0d7    0.30    0.50       0       6    0.00    5.58
Average   c3t0d0    0.10    0.50       0       5    0.00    2.49
#
=============================================

       memory                  page                      faults
    avm     free   re   at   pi   po   fr   de   sr     in     sy    cs
2247368   109810   42   11    2    1    0    0   29   1363  23340  1940
CPU
    cpu       procs
 us sy id    r   b   w
 19  5 76   11   0   0
 18  5 77

Sorry for this big post.

Thanks once again for all the help.

Steven E. Protter · ‎08-09-2006

Shalom,

Okay guys, if you refer to the SEP script, please give a location.

http://www.hpux.ws/system.perf.sh

The script is useful from two standpoints:
1) It has less impact on the system than glance, thus less distortion in the reports.
2) It can ultimately be modified and enhanced.

Note: The script was originally given to me by HP in response to a performance question. I've made several minor enhancements to it and a few bug fixes.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Bill Hassell · ‎08-10-2006

Well, the first area of concern is swap utilization and pageout/deactivations. PRI as a wait state is completely normal. You can use uptime or top to see the global run queue. Since you have 2 CPUs, the run queue is full when it reads 2, but if uptime reports 3 or more for extended periods, your applications require more CPUs to run at full speed. The context switch rate will confirm this. When 5 processes are ready to run, 2 of them will run on the available CPUs, but the other 3 will be waiting until a running process uses up its time slice (100 ms) or gives up the CPU by calling the kernel for I/O or another system call.
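As a rough illustration of the uptime check, the Python sketch below pulls the three load averages out of an uptime line and compares them with the CPU count. The uptime line is a made-up example, the function names are hypothetical, and two choices are assumptions rather than anything stated above: treating the 5- and 15-minute averages as "extended periods", and generalising "3 or more on 2 CPUs" to ncpus + 1.

```python
import re

LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def load_averages(uptime_line):
    """Extract the 1-, 5- and 15-minute load averages from uptime output."""
    match = LOAD_RE.search(uptime_line)
    if match is None:
        raise ValueError("no load averages found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


def needs_more_cpus(loads, ncpus):
    """True when both the 5- and 15-minute averages are at least ncpus + 1.

    The ncpus + 1 threshold is an assumption that generalises the
    "3 or more on a 2-CPU box" rule of thumb.
    """
    _, five, fifteen = loads
    return min(five, fifteen) >= ncpus + 1


# Hypothetical uptime line, invented for this example (not from the thread).
LINE = "  2:47pm  up 12 days,  3:04,  5 users,  load average: 3.52, 3.10, 3.05"

if __name__ == "__main__":
    loads = load_averages(LINE)
    print(loads, needs_more_cpus(loads, ncpus=2))
```

The 1-minute figure is ignored on purpose, since a short spike says little about a lasting CPU shortage.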

Now if your applications use memory mapped files extensively, the swap usage may be normal but I would look at the pageout and deactivation rate again. It indicates a lack of memory. The vmstat command will report the current po rate.
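To watch the po rate over time, the vmstat text can be parsed by column name. The Python sketch below is a minimal example under stated assumptions: it expects a header row and data rows with the same number of whitespace-separated fields, the function name is made up, and the sample is the memory/page/faults line already posted in this thread.

```python
def vmstat_column(text, name):
    """Return the integer values found under column `name` in vmstat output."""
    header = None
    values = []
    for line in text.splitlines():
        fields = line.split()
        # A header row contains the column name and does not start with a digit.
        if name in fields and not fields[0].isdigit():
            header = fields
            continue
        # A data row is all digits and lines up with the last header seen.
        if header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            values.append(int(fields[header.index(name)]))
    return values


# Header and data row copied from the vmstat figures posted in this thread.
SAMPLE = """\
avm free re at pi po fr de sr in sy cs
2247368 109810 42 11 2 1 0 0 29 1363 23340 1940
"""

if __name__ == "__main__":
    print("po:", vmstat_column(SAMPLE, "po"))
    print("sr:", vmstat_column(SAMPLE, "sr"))
```

Feeding it the output of several vmstat intervals gives a list of po values, which makes a persistent pageout rate easier to spot than a single cumulative counter.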

Bill Hassell, sysadmin

marvik · ‎08-22-2006

Sumanth,

You can get the Global_PRI_queue from PerfView. You need to install the MWA agent on your box; it can then collect data and let you view historic data.

Cheers
Vikram
