Fred Ruffet
Honored Contributor

load average

I can't find a clear definition of load average on a multi-CPU server.

I have always considered that a load average of 2 on a dual-CPU server simply means a full load, but the per-CPU lines in top output are confusing.

Here is a sample of top output:
    Load averages: 2.30, 2.45, 2.60
    1425 processes: 1374 sleeping, 49 running, 2 zombies
    Cpu states:
    CPU  LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
    0    2.17  90.8%   0.0%   8.6%   0.6%   0.0%   0.0%   0.0%   0.0%
    1    2.30  88.0%   0.0%   8.4%   3.5%   0.0%   0.0%   0.0%   0.0%
    2    2.23  84.3%   0.0%   6.7%   9.0%   0.0%   0.0%   0.0%   0.0%
    3    2.36  72.9%   0.0%  18.1%   9.0%   0.0%   0.0%   0.0%   0.0%
    4    2.31  55.4%   0.0%  27.3%  17.3%   0.0%   0.0%   0.0%   0.0%
    5    2.24  80.2%   0.0%  12.4%   7.5%   0.0%   0.0%   0.0%   0.0%
    6    2.50  65.2%   0.0%  14.9%  19.8%   0.0%   0.0%   0.0%   0.0%
    7    2.24  73.1%   0.0%   8.4%  18.5%   0.0%   0.0%   0.0%   0.0%
    ---  ----  -----  -----  -----  -----  -----  -----  -----  -----
    avg  2.30  76.2%   0.0%  13.2%  10.6%   0.0%   0.0%   0.0%   0.0%

I used to think that a load average of 2.30 on this server means that 2.30 processes are running or ready to run on the server (i.e. using CPU, waiting for I/O or waiting for CPU scheduling). On an 8-CPU server, that does not seem excessive. But a load is also displayed per CPU, and it is also near 2.
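One way to check this observation is to add up the per-CPU LOAD column of the sample. A minimal sketch in the same Bourne-shell style as the test script later in this thread; the function name sum_cpu_loads is hypothetical, and it assumes the column layout of this particular top output (CPU number in the first field, LOAD in the second).

```sh
#!/bin/sh
# Hypothetical helper: sum and average the per-CPU LOAD column of a top
# "Cpu states" table read on stdin. ASSUMES the layout of the sample above
# (CPU number in field 1, LOAD in field 2); other top versions may differ.
sum_cpu_loads() {
    awk '
        # Only rows whose first field is a CPU number: this skips the
        # header, the dashed separator and the "avg" row.
        $1 ~ /^[0-9]+$/ { sum += $2; n++ }
        END {
            if (n == 0) exit 1   # no CPU rows found
            printf "cpus=%d sum=%.2f mean=%.2f\n", n, sum, sum / n
        }'
}
```

Fed the eight CPU rows of the sample (pasted into a here-document, for instance), this should print `cpus=8 sum=18.35 mean=2.29`. The mean of the rounded per-CPU values matches top's avg row (2.30) up to rounding, and the sum is close to 8 × 2.30 = 18.4, which is consistent with the global figure being an average over the CPUs rather than a system-wide total.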

So, is the run queue shown as "Load averages" in top or uptime a global figure, or is it divided by the CPU count? Does this load indicate waiting processes or an underloaded system?

Best regards,
Fred
--

"Reality is just a point of view." (P. K. D.)
Dennis Handly
Acclaimed Contributor

Re: load average

>I can't help finding a clear definition of load average on a multi-CPU server.

This is a pretty poor statistic for tracking CPU usage. A percentage is more useful.

>is run queue displayed as "Load averages" in top or in uptime displayed global or divided by CPU count?

It is divided by CPU count, same as percentage.

>does this load indicates waiting processes or an underloaded system?

On its own it indicates nothing; you need a tool like glance to tell you about the whole system, which includes CPU, disk, memory and networking.
Fred Ruffet
Honored Contributor

Re: load average

Dennis,

I do not plan to use load average as the only server-state metric; it's just one metric among many others. I have many servers to watch over. Glance will help find problems and bottlenecks. What I need are metrics to put in a monitoring tool, to know which server needs particular care.

A percentage is not a better metric by itself. Look at the CPU usage given by sar: 100% means nothing. It could just as well mean your server is fully loaded or overloaded.

As I understand your answer, the load average shown in uptime or top output is divided by the CPU count. A load of 2 on an 8-CPU server means 16 processes in the run queue.
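Under this reading (uptime and top have already divided by the CPU count), getting back to an average run-queue length is a single multiplication. A minimal sketch; the function name load_to_runq is hypothetical, and the division by CPU count is the assumption being discussed in this thread, not something the code can verify.

```sh
#!/bin/sh
# Hypothetical helper: estimate the average run-queue length from a load
# average, ASSUMING the figure printed by uptime/top has already been
# divided by the number of CPUs.
load_to_runq() {
    # $1 = load average as printed, $2 = number of CPUs
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.1f\n", load * ncpu }'
}

load_to_runq 2 8      # Fred's example: prints 16.0
load_to_runq 2.30 8   # the top sample above: prints 18.4
```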

Regards,

Fred
Steven E. Protter
Exalted Contributor

Re: load average

Shalom Fred,

Your understanding seems reasonable.

Bill Hassell includes a discussion of this topic when he gives classes.

He says that it's perfectly possible for a system to have a load average much higher than normal without adverse impact. It depends on the type of processes running.

Our management server here is running between 4 and 5. There are no user complaints, slow downs or ITO warnings. We don't measure load average in our ITO alarm defs.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Fred Ruffet
Honored Contributor

Re: load average

Hi SEP,

This seems to be confused in many minds. I searched a lot about this on the web, and many sources (from ITRC to Wikipedia; see how wide my search was) made me think the load average was not divided by the CPU count.

What surprises me is that even Bill says the load average isn't divided: http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1258377275072+28353475&threadId=1126219
There, someone complaining about a load of 66 is told that it would only be a full load if observed on a 64-CPU system.

Regards,

Fred
Emil Velez
Honored Contributor

Re: load average

Load average is a per-CPU average. This means that, for each processor, 2.3 processes are in the run queue.

1 is running and 1.3 are waiting. Your CPUs are always busy, with little or no idle percentage: 230% utilization of each 100% CPU.

This means that on an 8-processor system, 8 processes are running and about 10 are waiting (8 × 2.3 ≈ 18.4 in total). Load average 2.3.
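The running/waiting split above can be sketched as follows. This is a rough model only: it treats the average as a steady state in which at most one process per CPU is running, which real workloads do not strictly follow. The function name split_runq is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: split an average run-queue length into processes
# running on a CPU and processes waiting for one. Rough steady-state model:
# at most one process runs per CPU, everything else in the queue waits.
split_runq() {
    # $1 = average run-queue length, $2 = number of CPUs
    awk -v runq="$1" -v ncpu="$2" 'BEGIN {
        running = (runq + 0 < ncpu + 0) ? runq : ncpu
        printf "running=%.1f waiting=%.1f\n", running, runq - running
    }'
}

split_runq 18.4 8   # 2.3 x 8 CPUs: running=8.0 waiting=10.4
split_runq 4 2      # 4 busy processes on 2 CPUs: running=2.0 waiting=2.0
```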
Bill Hassell
Honored Contributor
(Accepted solution)

Re: load average

The "load" value reported by uptime and top is simply the average size of the run queue. The run queue counts every process that is currently running, plus every process that is ready to run (nothing is blocking it; note that I/O counts as a block) but cannot run because all the processors are busy. You can easily test this with a simple 3-line script:

while :
do :
done &

The above will run in the background and consume 100% of one CPU. top will report 100% for one CPU, and the load average will eventually go to 0.50 on a 2-CPU system (assuming no other significantly CPU-bound processes are running). The key is that this is a load average for the system as a whole: 1.00 means that all CPUs were busy during the measurement period. If you run the above script 3 more times on a 2-CPU system, the load will be 2.00, meaning that 2 processes were running and 2 were waiting to run, on average.
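The word "eventually" reflects that the figure is a moving average, not an instantaneous count. The sketch below simulates the classic exponentially damped average used by BSD-derived kernels and Linux: a sample every 5 seconds, with 1-, 5- and 15-minute time constants. Whether HP-UX uses exactly these constants is an assumption, as is feeding it the per-CPU value (runnable processes divided by CPUs) described in this thread. The function name simulate_load is hypothetical.

```sh
#!/bin/sh
# Hypothetical simulation of a damped load average, ASSUMING BSD/Linux-style
# updates every 5 s with decay factors exp(-5/60), exp(-5/300), exp(-5/900),
# and ASSUMING the sample is runnable processes divided by CPUs.
simulate_load() {
    # $1 = runnable processes, $2 = CPUs, $3 = minutes to simulate
    awk -v nproc="$1" -v ncpu="$2" -v minutes="$3" 'BEGIN {
        n = nproc / ncpu                      # per-CPU run-queue sample
        e1 = exp(-5 / 60); e5 = exp(-5 / 300); e15 = exp(-5 / 900)
        l1 = l5 = l15 = 0                     # start from an idle system
        for (t = 5; t <= minutes * 60; t += 5) {
            l1  = l1  * e1  + n * (1 - e1)
            l5  = l5  * e5  + n * (1 - e5)
            l15 = l15 * e15 + n * (1 - e15)
        }
        printf "%.2f, %.2f, %.2f\n", l1, l5, l15
    }'
}

simulate_load 1 2 5    # one busy loop, 2 CPUs, 5 minutes: 0.50, 0.32, 0.14
simulate_load 4 2 60   # four busy loops, 2 CPUs, 1 hour:  2.00, 2.00, 1.96
```

In this model the 1-minute figure is close to its final value after about five minutes, while the 15-minute figure needs roughly an hour.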

Now HP-UX treats CPU-bound processes as less important and starts to reduce their priority (increasing the C value shown by ps). So this poor 2-CPU system is apparently overloaded, yet logins, disk I/O, vi and other user processes that use the disk seem to run without any delays. I/O is treated as highly important because it takes a while to complete, so the scheduler will quickly restart a process that has completed an I/O operation.

So the load factor is indeed divided by the number of CPUs. A load factor of 66 on a 64-CPU system means that more than 4000 processes (66 × 64 = 4224) were ready to run on average. Although this might seem excessive, the metric cannot distinguish unique processes, so a very fast process servicing specialized activities might be counted in the run queue multiple times as each copy runs for a few milliseconds. This is the ambiguity in trying to measure very rapidly changing activities in a multiprocessor, multitasking OS.
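The run-queue definition given here (running plus ready-to-run, I/O-blocked processes excluded) can be illustrated with a one-off snapshot: count the processes whose state code starts with R and ignore sleeping or blocked ones. This is only a sketch. The kernel samples and averages this over time, a single snapshot also counts the ps command itself, and the function name count_runnable is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: count process state codes (one per line on stdin)
# that start with "R" (running or ready to run). Sleeping (S), blocked (D),
# zombie (Z) and other states are not counted.
count_runnable() {
    # "n + 0" prints 0 instead of an empty line when nothing matches.
    awk '$1 ~ /^R/ { n++ } END { print n + 0 }'
}
```

The pipelines that feed it are assumptions to check on your own system: on HP-UX, `ps -el` normally prints the state in its second (S) column, so `ps -el | awk 'NR > 1 { print $2 }' | count_runnable` should work; on Linux with procps, `ps -e -o stat= | count_runnable`. Note that Linux's own load average also counts processes in uninterruptible sleep (state D), so it is not directly comparable with the definition above.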


Bill Hassell, sysadmin
Michael Steele_2
Honored Contributor

Re: load average

Hi

The classic definition of load average is how many processes are in the run queue, and the run queue is blocked if more than one process is displayed.

I have never relied on this.

CPU bottleneck, high CPU percentage and a high number of processes are more reliable indicators.

No one uses uptime in the AT&T environment; it's more of a traditional Berkeley tool. But reliable? I don't rely on it.
Support Fatherhood - Stop Family Law
Dennis Handly
Acclaimed Contributor

Re: load average

>A percentage is not a better metric by itself.

Just about anything is better than the load average, especially if you have percentages for the various CPU states.

>A load of 2 on a 8 CPU server means 16 processes in run queue.

Yes.