System Administration

System load is very high (Load averages: 15.28, 15.27, 15.27) and more sleeping process (217)

 
SOLVED
senthil_kumar_1
Super Advisor


Hi All,

There is one HP-UX server (10.20) running on K-580 series hardware in our environment.

For the past few days the load average has been very high and there are many sleeping processes (217), but CPU usage is normal.

Example:

# top

Load averages: 15.28, 15.27, 15.27
218 processes: 217 sleeping, 1 running
Cpu states:
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
0     3.15   0.2%   0.0%   0.4%  99.4%   0.0%   0.0%   0.0%   0.0%
1    15.19   0.0%   0.0%   2.0%  98.0%   0.0%   0.0%   0.0%   0.0%
2    20.00   0.2%   0.0%   0.2%  99.6%   0.0%   0.0%   0.0%   0.0%
3    22.76   0.0%   0.0%   7.5%  92.5%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----  -----  -----  -----  -----  -----
avg  15.28   0.0%   0.0%   2.6%  97.4%   0.0%   0.0%   0.0%   0.0%


There is also the following issue with sendmail on the same server.

ps -ef | grep -i sendmail
root 10308 1 0 Apr 28 ? 0:22 sendmail: rejecting connections on port 25: load average: 15
root 15884 1 0 Dec 13 ? 0:00 sendmail: BAA15880: from queue
root 3816 24919 1 23:00:22 pts/54 0:00 grep -i sendmail



How can I resolve this issue?
R.O.
Esteemed Contributor

Re: System load is very high (Load averages: 15.28, 15.27, 15.27) and more sleeping process (217)

Hi,

It will be useful if you provide the complete output of "top" (I mean the 1st screen), "vmstat 5 5" and "swapinfo -tam" to see what's going on.

Regards,
"When you look into an abyss, the abyss also looks into you"
Steven E. Protter
Exalted Contributor

Re: System load is very high (Load averages: 15.28, 15.27, 15.27) and more sleeping process (217)

Shalom,

Load average is the average number of processes waiting for a CPU.

This figure is kind of high.
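
To put that figure in context, the top output in the original post lists four CPUs (0 to 3), so the load can also be read per CPU. A minimal sketch of that arithmetic follows; the function name load_per_cpu is made up for illustration, and it assumes an awk that accepts -v.

```shell
# Hypothetical helper: divide a load average by the number of CPUs.
# $1 = load average, $2 = number of CPUs
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (ncpu <= 0) { print "ncpu must be positive"; exit 1 }
        printf "%.2f\n", load / ncpu
    }'
}

# Values taken from the top output in the original post.
load_per_cpu 15.28 4
```

Anything well above 1 per CPU means work is queuing, which is why the number looks high even though the CPUs themselves are idle.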

Bill Hassell provides a good example in his presentations of a system with very high load average and no other issues.

So let's go through a process to see if anything needs to be done at all.

1) Is there a response complaint? Do users report slow access or access difficulty?

If no, consider doing nothing. If yes, continue.

2) Does the system need to receive mail? 99% of all systems running the sendmail daemon don't need to, because they receive no inbound mail.

If no, consider doing nothing. If yes, identify the processes using up CPU time, identify their application and do further analysis.

Quite often bouncing a service resolves this problem.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Modris Bremze
Esteemed Contributor

Re: System load is very high (Load averages: 15.28, 15.27, 15.27) and more sleeping process (217)

As mentioned, more info from other system utilities would be useful. You could also take a look at
http://it.toolbox.com/wiki/index.php/Determining_the_Cause_of_System_Performance_Problems_for_HP-UX . It could provide something useful.
Prasanth V Aravind
Trusted Contributor

Re: System load is very high (Load averages: 15.28, 15.27, 15.27) and more sleeping process (217)


System load is 15.28 while the CPUs are almost idle.
That means processes are getting blocked somewhere else.
It can be memory, disk, or network.
Find out where the bottleneck is.

You can use these utilities for this: sar, vmstat, iostat, glance, gpm, ovpm.
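
As a rough illustration of that reasoning, the sketch below summarizes the r (runnable) and b (blocked) columns of vmstat output. The function name vmstat_blocked_summary is hypothetical, and it assumes r, b and w are the first three fields of each data row, as in the HP-UX vmstat layout.

```shell
# Hypothetical helper: average the r and b columns of vmstat output read
# from stdin and point out when blocked processes outnumber runnable ones.
vmstat_blocked_summary() {
    awk '
        # Data rows start with a number; the two header rows do not.
        $1 ~ /^[0-9]+$/ { r += $1; b += $2; n++ }
        END {
            if (n == 0) { print "no data rows"; exit 1 }
            printf "samples=%d avg_r=%.1f avg_b=%.1f\n", n, r / n, b / n
            if (b > r)
                print "more blocked than runnable: check memory, disk, or network waits"
        }
    '
}
```

It could be used as `vmstat 5 5 | vmstat_blocked_summary`. A large b with r near zero points away from the CPUs and toward whatever the blocked processes are waiting on.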


http://h71028.www7.hp.com/enterprise/w1/en/os/hpux11i-kod-v3-performance-troubleshooting.html

This is very good training on performance troubleshooting.

Good luck
Prasanth
Taifur
Respected Contributor

Re: System load is very high (Load averages: 15.28, 15.27, 15.27) and more sleeping process (217)

Hi ,

Check the output of vmstat, top, and swapinfo -tam to see what is going on, and also check the sar output.

Cheers//
Taifur
senthil_kumar_1
Super Advisor

Re: System load is very high (Load averages: 15.28, 15.27, 15.27) and more sleeping process (217)

Hi All,

#vmstat 5 5
procs           memory                          page          faults          cpu
r   b  w   avm    free   re   at  pi  po  fr  de  sr   in    sy   cs  us  sy   id
0  61  0  5498  747277   78   96   0   0   0   0   0    0   211   15   2  10   88
0  61  0  5249  747247  149  201   0   0   0   0   0  736  1535  174   1   4   95
0  61  0  5549  747247   50   64   0   0   0   0   0  720   519   92   0   1   99
0  61  0  5327  747247   16   20   0   0   0   0   0  716   183   70   0   0  100
0  61  0  3554  747247    5    5   0   0   0   0   0  721    80   64   0   0  100




#sar

HP-UX lgprime B.10.20 A 9000/800 05/10/10

00:00:00    %usr    %sys    %wio   %idle
00:10:01       0       2       0      98
00:20:00       0       1       0      98
00:30:00       0       2       0      98
00:40:01       0       2       0      98
00:50:00       0       2       0      98
01:00:00       0       1       0      98
01:10:00       0       2       0      98
01:20:00       0       2       0      98
01:30:00       0       2       0      98
01:40:00       0       1       0      98
01:50:00       0       1       0      98
02:00:00       0       2       0      98
02:10:00       0       2       0      98
02:20:00       0       2       0      98
02:30:00       0       1       0      98
02:40:00       0       2       0      98

Average        0       2       0      98



#sar -d

HP-UX lgprime B.10.20 A 9000/800 05/10/10

00:00:00 device %busy avque r+w/s blks/s avwait avserv
00:10:01 c2t6d0 1.04 1.60 1 12 10.01 16.93
c2t5d0 0.86 0.75 1 10 5.88 17.00
c5t0d0 0.01 0.50 0 0 1.42 17.48
c6t1d1 0.18 0.50 0 4 2.22 20.55
00:20:00 c2t6d0 1.12 0.55 1 15 3.51 19.99
c2t5d0 0.95 0.55 1 14 3.56 19.33
c5t0d0 0.03 0.50 0 0 1.00 15.47
c6t1d1 0.10 0.50 0 2 0.55 18.70
00:30:00 c2t6d0 1.12 0.51 1 14 3.41 13.60
c2t5d0 0.84 0.50 1 9 3.24 14.51
c5t0d0 0.11 0.50 0 0 3.20 20.83
c6t1d1 0.10 0.50 0 2 0.56 18.94
00:40:01 c2t6d0 0.85 0.52 1 9 3.07 14.85
c2t5d0 0.69 0.50 1 8 2.99 14.80
c5t0d0 0.03 0.50 0 0 3.38 18.54
c6t1d1 0.09 0.50 0 2 0.43 19.44
00:50:00 c2t6d0 1.25 0.67 1 17 4.03 19.95
c2t5d0 1.05 0.64 1 15 4.16 19.54
c5t0d0 0.03 0.50 0 0 0.53 18.93
c6t1d1 0.18 0.50 0 4 1.35 20.20
01:00:00 c2t6d0 0.77 0.52 1 8 2.85 14.55
c2t5d0 0.61 0.50 1 7 2.70 13.97
c5t0d0 0.02 0.50 0 0 1.73 16.26
c6t1d1 0.09 0.50 0 2 0.36 20.17
01:10:00 c2t6d0 1.42 1.12 2 16 7.11 14.47
c2t5d0 1.22 0.86 2 14 6.14 13.64
c5t0d0 0.06 0.50 0 0 2.79 17.06
c6t1d1 0.09 0.50 0 2 1.09 19.17
01:20:00 c2t6d0 2.45 0.57 3 22 4.63 12.41
c2t5d0 1.01 0.66 1 15 3.98 19.08
c5t0d0 0.01 0.50 0 0 0.96 16.96
c6t1d1 0.52 0.61 1 38 5.37 7.96
01:30:00 c2t6d0 0.85 0.56 1 9 3.05 15.02
c2t5d0 0.72 0.50 1 8 2.66 14.57
c5t0d0 0.03 0.50 0 0 1.33 19.54
c6t1d1 0.13 0.50 0 4 1.45 20.00
01:40:00 c2t6d0 0.73 0.56 1 8 3.03 14.82
c2t5d0 0.60 0.50 1 7 2.72 14.59
c5t0d0 0.01 0.50 0 0 1.35 15.06
c6t1d1 0.09 0.50 0 2 0.36 19.76
01:50:00 c2t6d0 1.29 0.93 2 17 5.85 18.91
c2t5d0 1.05 0.96 1 15 5.77 18.63
c5t0d0 0.02 0.50 0 0 0.97 16.66
c6t1d1 0.46 0.50 0 6 2.98 28.75
02:00:00 c2t6d0 1.11 0.51 1 15 3.14 14.00
c2t5d0 0.81 0.50 1 9 2.88 14.34
c5t0d0 0.10 0.50 0 0 2.97 21.38
c6t1d1 0.09 0.50 0 2 0.35 19.76
02:10:00 c2t6d0 1.07 1.10 1 12 7.87 18.31
c2t5d0 0.89 0.66 1 10 4.96 17.12
c5t0d0 0.03 0.50 0 0 1.69 18.45
c6t1d1 0.11 0.50 0 2 0.66 20.33
02:20:00 c2t6d0 1.19 0.58 1 16 3.49 20.38
c2t5d0 1.00 0.55 1 14 3.52 19.16
c5t0d0 0.01 0.50 0 0 1.87 17.74
c6t1d1 0.09 0.50 0 2 0.43 18.65
02:30:00 c2t6d0 0.78 0.50 1 8 2.61 15.53
c2t5d0 0.70 0.50 1 8 2.70 15.00
c5t0d0 0.02 0.50 0 0 0.69 17.93
c6t1d1 0.10 0.50 0 2 0.33 18.71
02:40:00 c2t6d0 0.80 0.52 1 8 2.87 14.53
c2t5d0 0.62 0.50 1 7 2.80 13.78
c5t0d0 0.02 0.50 0 0 1.53 18.35
c6t1d1 0.11 0.50 0 2 0.87 18.25

Average c2t6d0 1.11 0.72 1 13 4.65 15.94
Average c2t5d0 0.85 0.63 1 11 4.03 16.47
Average c5t0d0 0.04 0.50 0 0 2.17 18.78
Average c6t1d1 0.16 0.54 0 5 2.75 16.08




#top

System: lgprime Mon May 10 02:45:50 2010
Load averages: 15.27, 15.27, 15.27
221 processes: 220 sleeping, 1 running
Cpu states:
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
0     3.67   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
1    16.31   0.0%   0.0%   4.0%  96.0%   0.0%   0.0%   0.0%   0.0%
2    17.06   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
3    24.03   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----  -----  -----  -----  -----  -----
avg  15.27   0.0%   0.0%   1.0%  99.0%   0.0%   0.0%   0.0%   0.0%

Memory: 39172K (23996K) real, 48156K (33168K) virtual, 2988932K free Page# 1/19

CPU TTY       PID USERNAME PRI NI   SIZE    RES STATE     TIME %WCPU  %CPU COMMAND
3   ?         615 root     154 20   180K   232K sleep  1542:07  0.51  0.51 syncer
1   ?           3 root     128 20     0K     0K sleep   405:47  0.14  0.13 statdaemon
1   ?           7 root     -32 20     0K     0K sleep   395:11  0.13  0.13 ttisr
1   ?         941 root     154 20   396K   236K sleep    70:26  0.11  0.11 rpc.statd
0   ?          19 root     100 20     0K     0K sleep   286:04  0.07  0.07 netisr
0   ?        1166 root     154 20  8568K  1896K sleep    12:13  0.06  0.06 rpcd
0   pts/54  10981 root     178 20   984K   212K run       0:00  1.00  0.05 top



#swapinfo -tam
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        1536       0    1536    0%       0       -    1  /dev/vg00/lvol2
reserve       -      74     -74
memory     2880     349    2531   12%
total      4416     423    3993   10%       -       0    -

Steven E. Protter
Exalted Contributor

Re: System load is very high (Load averages: 15.28, 15.27, 15.27) and more sleeping process (217)

Shalom,

Let's see some vmstat output, if that command is available on HP-UX 10.20.

Average c2t6d0 1.11 0.72 1 13 4.65 15.94
Average c2t5d0 0.85 0.63 1 11 4.03 16.47
Average c5t0d0 0.04 0.50 0 0 2.17 18.78
Average c6t1d1 0.16 0.54 0 5 2.75 16.08

This shows the disks listed above are being worked a little hard.

Problem seems to be heavy writes. Combine that with a near full file system and you will get severe delays.
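
One way to check the file system side of that is to scan bdf output for mount points above a chosen fill level. This is only a sketch: the function name full_filesystems and the 90% default are made up for illustration, and it assumes the usual bdf layout where the percentage field is followed directly by the mount point.

```shell
# Hypothetical helper: read bdf output on stdin and print mount points
# whose %used is at or above the limit ($1, default 90).
full_filesystems() {
    awk -v limit="${1:-90}" '
        {
            # Search every field for NN% so that rows wrapped after a
            # long device name are still handled.
            for (i = 1; i < NF; i++) {
                if ($i ~ /^[0-9]+%$/) {
                    pct = $i
                    sub(/%/, "", pct)
                    if (pct + 0 >= limit) print $(i + 1), $i
                }
            }
        }
    '
}
```

It could be run as `bdf | full_filesystems 90`. No output means nothing is at or above the limit.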

SEP
Raj D.
Honored Contributor

Re: System load is very high (Load averages: 15.28, 15.27, 15.27) and more sleeping process (217)

How is the load average today?
Please post:

# UNIX95=1 ps -e -o pcpu,pid,ppid,args | sort -rn | head -n 20
# UNIX95=1 ps -e -o vsz,pid,ppid,args | sort -rn | head -n 20
# uptime
# ps -ef | wc -l
# who -u | wc -l ; netstat -n | grep EST | wc -l
# glance # first page.
# top # first page.
# sar -u -M 2 3
# vmstat 3 5
# sar -d 3 10



Cheers,
Raj.

" If u think u can , If u think u cannot , - You are always Right . "
Raj D.
Honored Contributor

Re: System load is very high (Load averages: 15.28, 15.27, 15.27) and more sleeping process (217)

Senthil,
Well, from the data above, the disks are showing medium-high busy, but that would not cause a load average of 15+.
>>>
c2t5d0 0.86 0.75 1 10 5.88 17.00
c5t0d0 0.01 0.50 0 0 1.42 17.48
c2t5d0 0.95 0.55 1 14 3.56 19.33
c5t0d0 0.11 0.50 0 0 3.20 20.83
c6t1d1 0.10 0.50 0 2 0.56 18.94
>>>

- avwait and service times above 100 can have some impact, but these values are only medium-high.
How are the uptime and performance today?
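
To apply that rule of thumb to a whole sar -d report, the Average lines can be filtered with awk. A minimal sketch, assuming the column order shown in the sar -d output earlier in the thread (device, %busy, avque, r+w/s, blks/s, avwait, avserv, with the last two in milliseconds); the function name sar_d_slow_disks is hypothetical.

```shell
# Hypothetical helper: read sar -d output on stdin and list devices whose
# average avwait or avserv exceeds the limit ($1, default 100).
sar_d_slow_disks() {
    awk -v limit="${1:-100}" '
        # Only the summary rows: "Average <device> + six numeric columns".
        $1 == "Average" && NF == 8 {
            seen++
            if ($7 + 0 > limit || $8 + 0 > limit) {
                printf "%s avwait=%s avserv=%s\n", $2, $7, $8
                slow++
            }
        }
        END { printf "checked=%d slow=%d\n", seen, slow }
    '
}
```

It could be run as `sar -d | sar_d_slow_disks 100`. With the four Average lines posted above, no device comes anywhere near 100, so the only output would be the checked=4 slow=0 summary.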

Hth,
Raj.
" If u think u can , If u think u cannot , - You are always Right . "