top command shows huge load average

Sundar G
Frequent Advisor

Hi gurus,

One of my HP-UX 11.11 servers, running on an L-Class box, shows a load average of more than 25. I am afraid the system may crash in the future. The sar average shows only 55% CPU usage. Any ideas on how to dig further and check?

BTW, I don't have any serious errors in syslog either.

Regards

Sundar
Dennis Handly
Acclaimed Contributor

Re: top command shows huge load average

The load average is a confusing metric. I would trust sar's 55% figure, or Glance's details.
But you also need some idea of I/O, networking and response time.
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1297575
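To put a number on the run queue itself rather than on CPU busy time, sar -q reports the average run-queue length. Below is a minimal sketch of a hypothetical helper, sar_runq_avg, that pulls the summary line out of sar -q output (for example from sar -q 5 12). It assumes the HP-UX column order "runq-sz %runocc swpq-sz %swpocc" and a summary line starting with "Average", so check the header on your own system first.

```sh
#!/bin/sh
# Hypothetical helper: read `sar -q` output on stdin and print the
# average run-queue size and the % of time the run queue was occupied.
# ASSUMPTION: columns are "time runq-sz %runocc swpq-sz %swpocc",
# and the summary line starts with the word "Average".
sar_runq_avg() {
    awk '$1 == "Average" { print "runq-sz=" $2, "%runocc=" $3 }'
}
```

A high runq-sz with a low %runocc points to bursts; a high value for both points to a queue that is rarely empty.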

>I am fearing about system crash in future.

I doubt there is any relation between the load average and crashing. Recently there was a thread about a system with a load average of 7500!
http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1315579
Tingli
Esteemed Contributor

Re: top command shows huge load average

What does # uptime report?
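uptime ends its line with three load averages, covering roughly the last 1, 5 and 15 minutes. A minimal sketch of a hypothetical helper, uptime_loads, that extracts just those three figures; it assumes the usual "load average:" wording (some systems print "load averages:", which the pattern also accepts).

```sh
#!/bin/sh
# Hypothetical helper: read one line of `uptime` output on stdin and
# print the 1-, 5- and 15-minute load averages separated by spaces.
uptime_loads() {
    # keep only the text after "load average:" / "load averages:",
    # strip the commas, then print the first three fields
    sed -n 's/.*load averages*: *//p' | tr -d ',' | awk '{ print $1, $2, $3 }'
}
```

Usage: uptime | uptime_loads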
Emil Velez
Honored Contributor

Re: top command shows huge load average

Load average is the average number of threads waiting to execute over the interval. I'm wondering whether these are short-term spikes or long-term averages.
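One rough way to answer the spike-or-sustained question is to compare the 1-minute and 15-minute figures that uptime prints. A sketch of a hypothetical helper, load_trend; the 1.5x threshold is an arbitrary illustrative choice, not an HP-UX rule.

```sh
#!/bin/sh
# Hypothetical helper: compare the 1-minute and 15-minute load averages
# to tell a recent short-term spike from a long-term (sustained) load.
# The 1.5x threshold is arbitrary and only for illustration.
# Usage: load_trend ONE_MIN FIVE_MIN FIFTEEN_MIN
load_trend() {
    awk -v one="$1" -v fifteen="$3" 'BEGIN {
        one += 0; fifteen += 0                          # force numeric comparison
        if (one > 1.5 * fifteen)      print "spike"     # recent jump
        else if (fifteen > 1.5 * one) print "subsiding" # earlier peak fading
        else                          print "sustained" # steady load
    }'
}
```

For example, load_trend 25.3 25.1 24.9 prints "sustained", which is the case worth investigating further.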
Olivier Masse
Honored Contributor

Re: top command shows huge load average

With 11.11 there is a "feature" I noticed where the load average can get very high, especially with Java or heavily multithreaded programs. You won't see this on 11.23 and up. I wouldn't bother with it unless you notice a performance decrease.

I ran 11.11 a few years ago, and I managed to push the load average over 110 using a modified C program that came with the HP-UX performance course. The system still held up strong. 110! That's what proved to me that the problem was the way the load average was calculated, and I stopped looking at that metric until 11.23.

Good luck
Bill Hassell
Honored Contributor

Re: top command shows huge load average

"Load average" is very misleading terminology. What it measures is the size of the runqueue over a period of time. The runqueue holds all the programs that are actually running plus all the programs that are ready to run but are waiting because no more CPUs are available. In the simplest form, a single processor running one totally compute-intensive program (no I/O) will have a CPU load of 1.00 (see uptime or top). Here is the simplest compute-intensive shell program:

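# ':' is the null command: it always succeeds and does nothing,
# so this loop burns CPU forever without doing any I/O
while :
do
    :
done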

This works on Bourne and POSIX shells such as ksh and bash as well as HP's POSIX shell /usr/bin/sh. It loops around as fast as it can until you interrupt it with CTRL-C.

Now you can increase the load on this one-CPU system by running additional copies of the program, perhaps 20 or even 200 copies at the same time. On a single-processor system, the compute-intensive processes will be time-shared, each getting an equal share of CPU time. With 4 copies running on the 1-CPU system, top will show each one using 25% of the CPU. But the load factor will be about 4.00 (after a minute or so), which means 1 process is running and 3 are ready to run. To run 4 copies of the script above, use this one-liner 4 times:

while :;do :;done&
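To repeat the experiment with any number of copies, here is a minimal sketch of a hypothetical helper, start_burners, that starts N background copies of the same null-command loop and prints their PIDs so they can be killed afterwards.

```sh
#!/bin/sh
# Hypothetical helper: start N background copies of the compute-bound
# null-command loop and print their PIDs on one line.
# Each loop's output goes to /dev/null so that calling this function
# inside $( ... ) does not hang waiting for the loops to close stdout.
start_burners() {
    n=$1
    i=0
    pids=""
    while [ "$i" -lt "$n" ]; do
        while :; do :; done >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    echo $pids
}
```

For example: pids=$(start_burners 4); sleep 60; uptime; kill $pids. After a minute, the 1-minute load average should have risen by roughly 4.00.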

Now on a 4-processor system, these 4 scripts will each consume 100% of a CPU and all 4 will run without stopping (no timesharing needed), but the load factor will still be 4.00 (4 processes running at the same time). So for the same total amount of work split among the 4 copies, the 1-CPU system will take 4 times longer to complete.
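The arithmetic of the two cases above fits a small idealized model: with P compute-bound processes on C CPUs, min(P, C) are running, the rest are waiting, and when P > C each gets 100*C/P percent of a CPU. A sketch of a hypothetical helper, describe_load, assuming pure CPU-bound processes, perfectly equal timesharing and no kernel overhead; integer arithmetic truncates the share.

```sh
#!/bin/sh
# Hypothetical helper: idealized split of P compute-bound processes on C CPUs.
# ASSUMES no I/O, perfectly equal timesharing and zero kernel overhead.
# Usage: describe_load PROCS CPUS
describe_load() {
    procs=$1
    cpus=$2
    if [ "$procs" -le "$cpus" ]; then
        running=$procs                  # every process has a CPU of its own
        share=100                       # so each gets 100% of one CPU
    else
        running=$cpus                   # one process per CPU at any instant
        share=$((100 * cpus / procs))   # CPU time split evenly (truncated)
    fi
    waiting=$((procs - running))        # ready to run but no CPU free
    echo "running=$running waiting=$waiting share=${share}%"
}
```

describe_load 4 1 prints "running=1 waiting=3 share=25%", while describe_load 4 4 prints "running=4 waiting=0 share=100%". In both cases the load is about 4.00, since it counts running plus waiting processes; only the split changes.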

But that's the easy case: 100% CPU running for a long time. Now take a process that has 1000 threads that each run for only a few milliseconds, constantly starting and completing hundreds of times each second. Now the load can easily jump to 100 or even thousands. Are there 1000 threads ready to run? It depends on the time period being measured. Over a few seconds, yes. During a one-millisecond period, the count varies radically: sometimes 50 in the queue, sometimes 100, etc. But the kernel can't spend all of its time measuring, so the runqueue size is just sampled and then averaged by uptime, top and other programs.
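To see how sampling turns a bursty runqueue into a single figure, here is a minimal sketch of a hypothetical helper, avg_runq, that takes runqueue-length samples (one per line) and prints their plain arithmetic mean. This is a simplification and not how HP-UX or top actually computes the value; many Unix kernels use an exponentially damped average that weights recent samples more heavily.

```sh
#!/bin/sh
# Hypothetical helper: average a series of runqueue-length samples,
# one number per line on stdin, and print the mean to 2 decimal places.
# Plain arithmetic mean only: a simplification of real load averaging.
avg_runq() {
    awk '
        { sum += $1; n++ }                    # accumulate each sample
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "%.2f\n", sum / n          # e.g. 50,100,150 -> 100.00
        }'
}
```

For example, samples of 0, 0, 0 and 200 average to 50.00, a value that no single sample ever showed.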

It's important to note that most kernels will degrade the priority of long-running compute-intensive processes so that I/O such as terminal keyboard input, networking and disk/tape activity will run immediately. So high CPU loads are not necessarily a bad thing. Note also that it is quite normal for kernel overhead to take a much larger percentage of CPU time (30-60%) when hundreds of short-lived threads or daemon processes are running.


Bill Hassell, sysadmin