Re: Load Average

 
SOLVED
Charles Keyser
Frequent Advisor

Load Average

Recently we had a server where the load average was above 6.20. I am looking for a simple script to run from cron that will alert all UNIX admins when the load average starts to go above 1.50, and then adjust the load average later if needed.
Processes started spawning one after another, to the point where we were not able to kill them. We ended up bouncing the system. Any help will be greatly appreciated.

CJ
6 REPLIES
Tim Nelson
Honored Contributor
Solution

Re: Load Average

If you have GlancePlus, alerts are built in and can be configured to send an email.

Otherwise, something simple like this would work.

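# 1-minute load average: first value after "load average:"
# (counting ":" fields fails when uptime shows no hh:mm, e.g. "up 5 days, 12 mins")
var1=`uptime | sed 's/.*load average: *//' | awk -F, '{print $1}'`
# scale by 100 and truncate to an integer so the shell can compare it
var2=`echo "$var1 * 100 / 1" | bc`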
if [[ $var2 -gt 150 ]]
then
echo alert
else
echo ok
fi
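
To turn the check above into something cron can run, one possible sketch follows. It assumes `mailx` is available for sending mail; the `THRESHOLD` and `ADMINS` variables and the function names are hypothetical, chosen only for illustration. The comparison is done in `awk` so no scaling to integers is needed.

```shell
#!/bin/sh
# Hypothetical settings; adjust for your site.
THRESHOLD=${THRESHOLD:-1.50}
ADMINS=${ADMINS:-}    # e.g. "admin1@example.com admin2@example.com"

# Print the 1-minute load average from an uptime-style line.
load_from_uptime() {
    echo "$1" | sed 's/.*load average[s]*: *//' | awk -F'[, ]+' '{print $1}'
}

# Succeed (exit 0) when $1 is numerically greater than $2.
exceeds() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l + 0 > t + 0) }'
}

load=$(load_from_uptime "$(uptime)")
if exceeds "$load" "$THRESHOLD"; then
    msg="Load average $load exceeds $THRESHOLD on $(hostname)"
    echo "$msg"
    # Send mail only when recipients are configured.
    if [ -n "$ADMINS" ]; then
        echo "$msg" | mailx -s "Load alert" $ADMINS
    fi
else
    echo "ok: load average $load"
fi
```

Saved as, say, /usr/local/bin/loadcheck.sh (a hypothetical path), it could be scheduled with a crontab line such as:

0,10,20,30,40,50 * * * * /usr/local/bin/loadcheck.sh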


Prashantj
Valued Contributor

Re: Load Average

Hi Charles,

Please find the attached script, which reports:
the load average,
the top 5 processes by memory usage, and
the top 5 processes by CPU usage.

Hope this helps.

Thanks
Prashant
Good judgment comes from experience and experience comes from bad judgment.
Steven E. Protter
Exalted Contributor

Re: Load Average

Shalom CJ,

Note that load average itself is not an indication of trouble.

There are legitimate reasons for the load average to shoot up, due to the way applications are written.

This utility might provide some information on process memory use, which could provide data before the system goes crazy: http://www.hpux.ws/?p=8

The most likely cause of this problem is bad application code.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Bill Hassell
Honored Contributor

Re: Load Average

There is no way to generalize this task. You have a runaway set of processes so the first task is to identify them. The problem with a monitor script is that it can easily make mistakes when things are happening rapidly and start killing HP-UX processes, thus causing a crash.

There is no method to "adjust the load average" as this is nothing more than a measure of the number of processes that are in the run queue. Since it is an average over time, it is a very poor measurement of system load. A run queue average that is much higher than the number of CPUs can indicate a very normal operation of hundreds of short-lived processes. Of course, it can also occur with runaway processes.
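
As a rough illustration of comparing the run queue average with the number of CPUs, the sketch below divides one by the other. The function name and the CPU count of 4 are assumptions for the example; the CPU count is passed in by hand because the way to query it varies by system.

```shell
#!/bin/sh
# Print the load average ($1) divided by the CPU count ($2), to two decimals.
load_per_cpu() {
    awk -v l="$1" -v n="$2" 'BEGIN {
        if (n + 0 <= 0) { print "n/a"; exit }   # avoid dividing by zero
        printf "%.2f\n", l / n
    }'
}

# Example: the 6.20 reported in this thread on a hypothetical 4-CPU server.
load_per_cpu 6.20 4
```

A result well above 1 means more runnable processes than CPUs on average, which, as noted above, can be either normal or a sign of runaway processes.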

So you have to identify the processes and then stop running them until the cause is located and fixed. Using the OS to monitor and try to kill some processes without causing more problems is asking for worse problems. Find the programmer and get the code repaired.


Bill Hassell, sysadmin
Charles Keyser
Frequent Advisor

Re: Load Average

Thanks everyone for your input and assistance. It has been a great help.

CJ
Charles Keyser
Frequent Advisor

Re: Load Average

Prashantj

I used your script and it is awesome. Could it be modified to check the number of running processes and to monitor spikes?
Any help would be great.

Thanks
CJ
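
The attached script is not reproduced in this thread, so the following is not a modification of it. It is only a minimal standalone sketch of one way to count processes and flag a sudden jump between two cron runs. The state file location, the `MAX_JUMP` threshold, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical settings; adjust for your site.
STATE=${STATE:-${TMPDIR:-/tmp}/proc_count.last}
MAX_JUMP=${MAX_JUMP:-50}

# Count processes: ps -e prints one header line, then one line per process.
count_procs() {
    ps -e | awk 'NR > 1 { n++ } END { print n + 0 }'
}

# Succeed when the count grew by more than $3 between samples $1 and $2.
is_spike() {
    [ $(( $2 - $1 )) -gt "$3" ]
}

now=$(count_procs)
# Compare with the previous sample, if one was saved.
if [ -s "$STATE" ]; then
    prev=$(cat "$STATE")
    if is_spike "$prev" "$now" "$MAX_JUMP"; then
        echo "Process count jumped from $prev to $now"
    fi
fi
echo "$now" > "$STATE"
```

Run from cron at a fixed interval, this reports only growth between consecutive samples, so it reacts to rapid spawning rather than to a steadily high process count.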