
Load rises to 50 on Redhat 7.2

 
Oliver Charni
Trusted Contributor


Hi all!

We recently got 5 new ProLiant DL380 machines running RedHat 7.2 for a project. All machines were cloned from one installation. One of them is having problems with the load: without anybody working on the machine, the load suddenly rises from 0.1 to 50. The only thing we can do then is power cycle it. In some cases, though, it seems the load begins to rise when the OmniBack (4.1) backup starts. I tried to kill the VBDAs when the load gets to 3+, but it doesn't do any good. Does anyone have a suggestion as to where the problem could be located? RedHat told us to turn off APM, but this didn't help.
if it smell's funny on the outside, it's worse on the inside
Steven E. Protter
Exalted Contributor

Re: Load rises to 50 on Redhat 7.2

This happened to me when a user on an HP-UX machine wrote a script that called itself. He managed to get the load up to 282 before I got it under control.

I'm attaching a script that might help you kill bad processes en masse.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Oliver Charni
Trusted Contributor

Re: Load rises to 50 on Redhat 7.2

I'm certain that this is not caused by any user process, because on various occasions when the machine crashed nobody was working on it. We also had a monitor running that did a ps -ef every 5 minutes, and no user process showed up there. I attach a top output from the machine when the load was at about 35 - 40. The other problem is that I cannot kill any processes once the load passes 3, because the system just doesn't execute commands anymore.
if it smell's funny on the outside, it's worse on the inside
Cristian Draghici_1
Frequent Advisor

Re: Load rises to 50 on Redhat 7.2

If the load goes up when the machine is doing an IO-intensive operation and it has a lot of memory (2GB+), look for a different kernel.

Versions prior to 2.4.18, as released by RedHat with 7.2, have issues with IO caching going out of control and taking down the machine (the VM gets confused between caching and freeing memory until 80% of kernel time is spent in the VM calls alone).

Either look for the Rik van Riel patch and apply it over the vanilla kernel sources (or RedHat's, but I'm not sure if it applies cleanly over the ac patch) or look for a 2.4.18(+) RedHat kernel.
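
To check quickly whether a box is on a kernel older than the 2.4.18 mentioned above, something like the following could be used. It is only a rough sketch: the function names are made up for illustration, and it assumes the release string (as printed by uname -r) starts with three dot-separated numbers.

```python
import re


def kernel_version_tuple(release):
    """Return the leading (major, minor, patch) numbers of a release string.

    Assumes the string starts with three dot-separated numbers, as
    `uname -r` output normally does; anything after them is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def predates_2_4_18(release):
    """True if the kernel is older than 2.4.18 (the threshold named above)."""
    return kernel_version_tuple(release) < (2, 4, 18)
```

For example, predates_2_4_18("2.4.7-10") is true and predates_2_4_18("2.4.20-19.7smp") is false; both strings are illustrative spellings, not copied from the affected machine.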

Hope this helps,
Cristi
Cristian Draghici_1
Frequent Advisor

Re: Load rises to 50 on Redhat 7.2

If you really need to monitor and kill the machine use a deadman. That would be a kernel module that counts from X down and when it reaches 0 it reboots the machine.

A userspace process keeps turning the deadman's counter back to X. If the userspace process can't set it back it means the machine load is too big for a simple operation to execute and the deadman will reboot it.
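
The countdown logic can be modelled in a few lines. This is a userspace simulation only, not a kernel module: the class and method names are invented for illustration, and the "reboot" is just a callback. On a real Linux box the kernel side typically already exists as a watchdog driver (for example the softdog module behind /dev/watchdog), so only the heartbeat process would need writing.

```python
class Deadman:
    """Countdown that must be reset before it reaches zero.

    Hypothetical model of the scheme described above: tick() plays the
    role of the kernel timer, pet() is the userspace heartbeat.
    """

    def __init__(self, start, on_expire):
        self.start = start          # the "X" the counter is reset to
        self.remaining = start
        self.on_expire = on_expire  # stands in for the reboot
        self.expired = False

    def pet(self):
        """Heartbeat: turn the counter back to X."""
        if not self.expired:
            self.remaining = self.start

    def tick(self):
        """One time unit passes; fire on_expire once when we hit zero."""
        if self.expired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.expired = True
            self.on_expire()
```

As long as pet() is called more often than every X ticks, nothing happens. Once the machine is too loaded for the heartbeat to run, the ticks catch up and on_expire fires exactly once.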

Cheers,
Cristi
Oliver Charni
Trusted Contributor

Re: Load rises to 50 on Redhat 7.2

The machine only has 1 GB of memory and we're currently running the 2.4.20.19.7 SMP kernel. We also tried some different kernels (2.4.7.10, 2.4.20.18.7).
if it smell's funny on the outside, it's worse on the inside
david_69
Advisor

Re: Load rises to 50 on Redhat 7.2

Read your notes,
Some things I noticed:

1. Swap is 0% used. That seems odd to me. Although Linux does not use swap aggressively, zero usage is abnormal IMHO.

2. rpc is running a lot.
Do you use rpc? What do you use it for? One strategy for good system administration is: "Don't install stuff you don't need". If you don't need daemons, remove them. One strategy for troubleshooting is: "Make the system simpler and simpler by removing components until either the problem goes away or you are left with a single faulty component."
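
For point 1, the swap figure can be logged over time instead of read off a single top snapshot. A minimal sketch, assuming the text comes from /proc/meminfo and contains the usual "SwapTotal:" and "SwapFree:" lines in kB; the function name is made up.

```python
def swap_used_percent(meminfo_text):
    """Return swap usage in percent, or None if no swap is configured.

    Expects the contents of /proc/meminfo. Only "Key: <number> ..." lines
    are parsed; anything else (such as a header row) is skipped.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    if total == 0:
        return None
    return 100.0 * (total - free) / total
```

Calling it with open("/proc/meminfo").read() from the same cron job that already runs ps would show whether swap really stays at zero right up to the point where the load takes off.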

Hmm that was a bit of a mouthful.
Just because no one is logged in doesn't mean it's not a process. What services does the machine provide? Is it a web server? NFS server? NFS client? DB server? Mail server?

What does netstat look like?

A hunch is that some nfs proc is locked in a run state.
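
One way to test that hunch: on Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on disk or NFS) as well as runnable ones (state R), so a load of 50 with an idle CPU tends to mean many tasks stuck in D. The sketch below filters ps output for those two states. It assumes the text comes from `ps -eo stat,pid,comm` with a header line; the function name is invented for illustration.

```python
def tasks_counted_in_load(ps_text):
    """Return (state, pid, command) for every task in state R or D.

    Expects output of `ps -eo stat,pid,comm`: one header line, then
    STAT, PID and COMMAND columns separated by whitespace.
    """
    counted = []
    for line in ps_text.splitlines()[1:]:   # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        stat, pid, command = fields
        state = stat[0]                     # first letter is the task state
        if state in ("R", "D"):
            counted.append((state, int(pid), command.strip()))
    return counted
```

If the list is full of D entries for nfsd, rpc or backup agent processes while the load climbs, that points at blocked I/O rather than at anything a logged-in user started.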

Give us more ... we're getting there.

On a side note, I have pegged a RH 7.3 server to a load of 290 and been able to recover. I have to say that compared to RH 6.2, RH 7.3 is a grand improvement. I am anxious to break stuff with 7.3 then fix it with 9.0.
fall down seven time stand up eight
dirk dierickx
Honored Contributor

Re: Load rises to 50 on Redhat 7.2

those rpcd's running on the machine might be needed by the ITO (OVO/VPO or whatever it is called now) agents. I guess he is also using ITO as he is already using OmniBack from HP too.

anyway, hp released new versions of the linux ito agents a few weeks ago, one of the entries in the patch documentation was that 'the control agent slowly consumes all available memory'. perhaps updating your agent could be the solution to your problem?