Operating System - HP-UX

Garry Ferguson
Frequent Advisor

"Load" increasing and machine locking up

Hi. We have an L2000 running HP-UX 11.00 and some Oracle. It has locked up recently and needed reboots to clear. Perfmeter shows almost no activity but the "load" steadily increasing.
It will allow commands to be run but no processes to be killed. Eventually it locks up.
sar shows one disk with a high average service time of 300 ms, way up on normal. Does this behaviour ring any bells with anyone?
Many thanks, Garry
Mel Burslan
Honored Contributor

Re: "Load" increasing and machine locking up

Of course, this is far from normal. An L2000 running Oracle is pretty much a standard configuration.

The first thing I would suggest is to check the patch levels of your machine. It looks like you are way behind.

If you have Glance installed, start the terminal version of GlancePlus after a reboot and watch where the machine starts to choke. At the least it will give you an indication of the culprit. Watch the meter bars at the top for CPU, disk, memory and swap utilization, and see which one reaches 100% or close to it just before the machine locks up completely.

A starting point at least.
________________________________
UNIX because I majored in cryptology...
Tim D Fulford
Honored Contributor

Re: "Load" increasing and machine locking up

Hi

Yes, your system has become classically I/O bound: high load average, high disk utilisation. The disk with the 300 ms service time is probably broken. Usually 5-10 ms is OK for a JBOD disk and 1-4 ms for a disk with cache and controller; either way, 300 ms is 30 times or more too large.
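A rough sketch of how that check could be scripted from saved sar -d output, in Python. Assumptions: the usual HP-UX column order (device, %busy, avque, r+w/s, blks/s, avwait, avserv), with a timestamp only on the first row of each interval; the sample text, device names, function name and the 50 ms cutoff are all made up for illustration.

```python
# Hypothetical sar -d capture; values are illustrative, not from the thread.
SAMPLE = """\
10:00:00   device   %busy   avque   r+w/s  blks/s  avwait  avserv
10:00:15   c1t2d0    3.20    0.50      12      96    4.10    7.80
           c2t0d0   98.00   14.20       9      70  120.50  301.40
10:00:30   c1t2d0    2.90    0.50      11      90    3.80    6.90
           c2t0d0   99.00   15.00       8      64  130.00  295.00
"""

SLOW_MS = 50.0  # assumed cutoff, well above the 5-10 ms quoted for JBOD


def slow_disks(sar_text, threshold_ms=SLOW_MS):
    """Return {device: worst avserv in ms} for devices above threshold_ms."""
    worst = {}
    for line in sar_text.splitlines():
        fields = line.split()
        # Rows that open an interval carry a leading timestamp; drop it.
        if len(fields) == 8:
            fields = fields[1:]
        if len(fields) != 7:
            continue  # blank line, banner, or anything else we don't know
        device = fields[0]
        try:
            avserv = float(fields[6])
        except ValueError:
            continue  # column header row
        if avserv > threshold_ms:
            worst[device] = max(avserv, worst.get(device, 0.0))
    return worst


if __name__ == "__main__":
    for dev, ms in slow_disks(SAMPLE).items():
        print(f"{dev}: avserv {ms:.1f} ms")
```

Keeping the worst value per device, rather than the latest, means an intermittent fault still shows up even if the disk looks healthy in the last interval.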

Regards

Tim
-
Garry Ferguson
Frequent Advisor

Re: "Load" increasing and machine locking up

Mel and Tim; thanks very much for your replies. I don't have glance but I run a "top" and swap-space report every 5 minutes. I also take sar "snapshots" every 15 seconds. It does not appear to me to be running out of resources. Nothing is unusual just before the lockups. Tim, when you say the disk is probably broken, do you mean it could be broken hardware-wise?
Thanks,
Garry
melvyn burnard
Honored Contributor

Re: "Load" increasing and machine locking up

If the system is indeed hanging, do a TOC, get a dump created, and log a call with your local HP Response Centre to have it analysed.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Garry Ferguson
Frequent Advisor

Re: "Load" increasing and machine locking up

Melvyn, Excuse my ignorance, but when you say do a TOC, what do you mean ??
Thanks,
Garry
Steve Steel
Honored Contributor
Solution

Re: "Load" increasing and machine locking up

Hi

TOC = Transfer of Control

see

http://www.interex.org/pubcontent/enterprise/mar99/09qa/09qa.html

If all else fails, try stopping all processes that can be stopped and umounting all file systems that can be umounted and then TOC the machine. Doing a transfer of control (TOC) will only save the contents of memory to disk if your machine has been properly configured to do so (see savecore(1M)). An analysis of the dump can be performed to determine the cause of the process hangs.


http://www.docs.hp.com/cgi-bin/fsearch/framedisplay?top=/hpux/onlinedocs/5990-8170/5990-8170_top.html&con=/hpux/onlinedocs/5990-8170/00/00/68-con.html&toc=/hpux/onlinedocs/5990-8170/00/00/68-toc.html&searchterms=toc&queryid=20040713-031651


Steve Steel
If you want truly to understand something, try to change it. (Kurt Lewin)
Garry Ferguson
Frequent Advisor

Re: "Load" increasing and machine locking up

Thanks for all your replies. Brilliant. I have learned much from them! Hopefully I can finally track down our problem when it next occurs!

Garry
Garry Ferguson
Frequent Advisor

Re: "Load" increasing and machine locking up

Hi. Just an update.
Found the machine was locking up because it was waiting on I/O on a disk that was not working properly. All processes went idle while the one waiting just sat there patiently! I identified the disk using sar.
The disks are all hot-swappable, so we pulled the offending disk out and put it back, and the machine recovered and carried on! I've now applied a firmware patch for the disks, taking them from HP01 to HP04. Looks OK at the moment, but if it occurs again we'll get a new disk.
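The symptom in this thread (load climbing while the CPU sits almost idle) could be spotted automatically from periodic top/sar samples. A rough sketch in Python, assuming the samples have already been reduced to (load average, CPU idle %) pairs; the function name, the 90% idle cutoff and the three-sample minimum are illustrative choices, not anything from an HP-UX tool.

```python
def looks_io_hung(samples, min_idle_pct=90.0):
    """samples: list of (load_average, cpu_idle_percent), oldest first.

    True when load rises at every step while the CPU stays mostly idle,
    which suggests processes are stuck waiting rather than computing.
    """
    if len(samples) < 3:
        return False  # too few points to call it a trend
    loads = [load for load, _ in samples]
    rising = all(later > earlier for earlier, later in zip(loads, loads[1:]))
    mostly_idle = all(idle >= min_idle_pct for _, idle in samples)
    return rising and mostly_idle


if __name__ == "__main__":
    # Made-up readings: load climbs from 1.2 to 6.0 with the CPU 95%+ idle.
    history = [(1.2, 95.0), (2.5, 96.0), (4.1, 97.0), (6.0, 98.0)]
    if looks_io_hung(history):
        print("Load rising with idle CPU: check disk service times in sar -d")
```

A True result only says where to look next; it does not identify the disk, which still has to come from the per-device service times.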
Tim D Fulford
Honored Contributor

Re: "Load" increasing and machine locking up

Garry

I was on holiday, so did not reply to your question, but it seems the question was fully resolved by circumstance.

A disk doing 300 ms service times is VERY likely to be broken, either because of poor or old firmware or, most likely, an intermittent hardware failure, as you found out.

Tim
-