High RunQueueLength, yet CPU is idle

 
David Connolly
Regular Advisor

Hi All

I have an 11.11 system which occasionally "hangs" for 30 seconds to 1 minute. The hangs are not regular, and I cannot find which process is causing them. Top reports 95% CPU idle. My monitoring software (Firehunter) shows that the RunQueueLength is unusually high (9 in a 5-minute average) at the affected times, yet CPU idle is still above 95%.

Any help would be appreciated.

Dave
Mark Grant
Honored Contributor

Re: High RunQueueLength, yet CPU is idle

Could be all sorts of things. I would get some solid stats from glance, vmstat, iostat or sar and see where your bottleneck is. All we know is it isn't the CPU.
Never precede any demonstration with anything more predictive than "watch this"
Pete Randall
Outstanding Contributor

Re: High RunQueueLength, yet CPU is idle

Dave,

Take a look at your network stats. It sounds like maybe the machine isn't hung, but your perception is that it is.


Pete
Paula J Frazer-Campbell
Honored Contributor

Re: High RunQueueLength, yet CPU is idle

Dave

Whilst the CPU can be quiet, the rest of the server can be very busy. I would first check your disk activity with Glance or sar, then network activity.

My initial guess is the root disk is very busy followed by the data disks.

Paula
If you can spell SysAdmin then you is one - anon
Bill Hassell
Honored Contributor

Re: High RunQueueLength, yet CPU is idle

The run queue length (the same value that uptime reports) is a measurement of all the running and waiting-to-be-run processes. Since it is averaged over a long period of time and counts all processes together, it is not a very meaningful measure of the true workload. In a 4-CPU system, a run queue of 4 means all 4 processors are busy, while 8 means that 4 CPUs are busy and, on average, 4 processes are waiting for a CPU. Since the timeslice is normally 100 ms, there could be a context switch 10 times per second.
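
The arithmetic in that paragraph can be sketched in a few lines of Python. This is only an illustration of the reasoning, assuming the run queue figure counts running plus runnable processes as described; the function name is made up for the example, and the thread does not say how many CPUs Dave's box has.

```python
def split_run_queue(run_queue, cpus):
    """Split an average run queue into (running, waiting) process counts."""
    running = min(run_queue, cpus)        # at most one running process per CPU
    waiting = max(run_queue - cpus, 0.0)  # the rest are queued for a CPU
    return running, waiting

# The two cases from the 4-CPU example
for rq in (4, 8):
    running, waiting = split_run_queue(rq, cpus=4)
    print(f"run queue {rq} on 4 CPUs: {running:g} running, {waiting:g} waiting")

# A 100 ms timeslice allows up to this many timeslice expiries per second
timeslice_ms = 100
print(1000 // timeslice_ms, "possible context switches per second")
```

By this reading, a run queue well above the CPU count should come with busy CPUs, so a high run queue alongside 95% idle suggests the processes are being held up by something other than CPU time.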

During the problem periods, you're going to need a measurement tool that is significantly more sophisticated than ps or uptime. MeasureWare and Glance will help, but it's going to take some work to track down the culprit(s). My guess is that some process gets started that may cause a huge number of other short-lived processes to run. This can be verified by noting big jumps in the PID numbers for new processes.
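
One rough way to watch for those PID jumps is to start a throwaway child at fixed intervals and see how far the PID counter has moved between samples. The sketch below assumes Python is available on the box and that PIDs are handed out sequentially; the function names and the wrap-around limit of 30000 are assumptions for illustration, so check the real limit on your system.

```python
import os
import time

def sample_pid():
    """Fork a child that exits immediately and return the PID it was given."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)       # child: exit at once without running anything else
    os.waitpid(pid, 0)    # parent: reap the child so no zombie is left behind
    return pid

def pid_deltas(pids, pid_max=30000):
    """PIDs consumed between consecutive samples (approximate across wrap-around)."""
    return [(later - earlier) % pid_max for earlier, later in zip(pids, pids[1:])]

def watch(count, interval):
    """Take `count` PID samples `interval` seconds apart and return the deltas."""
    pids = []
    for i in range(count):
        pids.append(sample_pid())
        if i < count - 1:
            time.sleep(interval)
    return pid_deltas(pids)

# Example: print(watch(count=6, interval=10.0))
```

Each sample uses one PID itself, so a quiet system should show deltas close to 1. A delta in the hundreds over a few seconds would point to a burst of short-lived processes.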


Bill Hassell, sysadmin
Laurent Menase
Honored Contributor

Re: High RunQueueLength, yet CPU is idle

Hi Dave,

Which process is at the top of the top output, with high WCPU and CPU values, at that time?
For instance, if it is vhand, the system is running out of free memory and is swapping.
David Connolly
Regular Advisor

Re: High RunQueueLength, yet CPU is idle

Thanks for the assistance, folks.

Paula: The sar results below would agree with you:
HP-UX ilife332 B.11.11 U 9000/800 01/09/04

14:00:00    %usr    %sys    %wio   %idle
14:10:00       5       0       0      94
14:20:00      11       1       0      88
14:30:01      30       2       1      68
14:40:00      10       1       1      88
14:50:00       4       1       3      93

Average       12       1       1      86

14:00:00   device   %busy   avque   r+w/s  blks/s  avwait  avserv
14:10:00   c1t2d0    0.47    0.74       2      19    4.37    6.25
           c1t0d0    0.28    0.71       1      17    3.46    7.96
           c2t0d0    0.04    0.50       0       2    4.62    3.04
14:20:00   c1t2d0    0.49    1.48       2      27    5.84    7.64
           c1t0d0    0.39    1.03       1      26    4.93    8.16
           c2t0d0    0.00    0.50       0       0    3.07    3.68
14:30:01   c1t2d0    0.78    0.77       3      35    4.27    6.82
           c1t0d0    1.08    1.61       4      81    6.77    8.06
           c2t0d0    0.02    0.50       0       1    3.37    3.79
14:40:00   c1t2d0    1.02   14.64       4      78   40.24    9.42
           c1t0d0    0.62    1.05       2      42    5.34    8.14
           c2t0d0    0.02    0.50       0       0    3.84    3.72
14:50:00   c1t2d0    1.43    0.58       5      59    4.72    4.65
           c1t0d0    2.20    2.21      13     659    9.25   13.02

Average    c1t2d0    0.84    4.37       3      44   13.88    6.87
Average    c1t0d0    0.91    1.83       4     165    7.83   11.06
Average    c2t0d0    0.02    0.50       0       1    3.88    3.50

(Sorry if it doesn't format too well.) It would seem that my root disk (c1t2d0) had by far its longest queue at 14:40 (avque 14.64, avwait 40.24), which is the 10-minute period in which I had the issue.
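
For anyone wanting to pick the worst interval out of a longer sar -d capture, here is a minimal Python sketch. It assumes the column order shown in the output above (device, %busy, avque, r+w/s, blks/s, avwait, avserv) and that the timestamp appears only on the first device line of each interval; the function name is made up for the example, and only four of the rows above are embedded as sample data.

```python
import re

# A few rows copied from the sar -d output above
SAR_D = """\
14:00:00 device %busy avque r+w/s blks/s avwait avserv
14:30:01 c1t2d0 0.78 0.77 3 35 4.27 6.82
c1t0d0 1.08 1.61 4 81 6.77 8.06
14:40:00 c1t2d0 1.02 14.64 4 78 40.24 9.42
c1t0d0 0.62 1.05 2 42 5.34 8.14
"""

def parse_sar_d(text):
    """Return one dict per device line, carrying the timestamp forward."""
    rows = []
    stamp = None
    for line in text.splitlines():
        fields = line.split()
        # A leading HH:MM:SS starts a new interval
        if fields and re.fullmatch(r"\d\d:\d\d:\d\d", fields[0]):
            stamp, fields = fields[0], fields[1:]
        # Skip headers, blank lines, and "Average" summary lines
        if len(fields) != 7 or fields[0] == "device":
            continue
        busy, avque, rw, blks, avwait, avserv = map(float, fields[1:])
        rows.append({"time": stamp, "device": fields[0], "busy": busy,
                     "avque": avque, "avwait": avwait, "avserv": avserv})
    return rows

worst = max(parse_sar_d(SAR_D), key=lambda row: row["avque"])
print(worst["time"], worst["device"], worst["avque"], worst["avwait"])
```

Sorting on avque or avwait rather than %busy matters here, because %busy stays under 3% in every interval while the queue on c1t2d0 jumps at 14:40.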

Bill: I'll try to get GlancePlus on it, but there's actually only a small number (6) of very busy WebLogic processes running.

Laurent: vhand is not showing in top at all, so I don't think the box is under pressure for memory. It has 2GB, and there are 8 instances running, each with a 256MB _max_ heap size. vhand has taken 5:49 of CPU time since startup (27 days).