DL360 weekly crash

I have a DL360 running Red Hat 4 with an MSA20 RAID array attached to it. The system started crashing every Saturday morning about two months ago.

I've installed the latest PSP (although, unlike older versions, it didn't rebuild the kernel), and the array has the latest firmware.

I thought I had found the problem last week. The server was plugged into a portable UPS, a type we've had problems with in the past (not this particular unit, but others like it). I moved it to the lab UPS, and the system crashed about 5 hours later than it usually does.

Does anyone have any idea what may be causing this? The system has to be power-cycled to bring it back online, which causes everyone a lot of grief.

Any help is appreciated.

Phil
Steven E. Protter
Exalted Contributor

Re: DL360 weekly crash

Shalom Phil,

Take a look at /var/log/messages.

Not getting a new kernel built could be the problem. When a box crashes, it's often the kernel that is doing the crashing.

It could also be a hardware flaw, but that is not likely to follow a schedule.
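One way to test whether the crashes really follow a schedule is to tally the weekday of each boot-up time found in the logs. Here is a minimal sketch, assuming you have already collected the syslog-style timestamps (for example "Mar 10 09:02:11") of each boot; because syslog omits the year, the function takes it as a parameter. The function name `crash_weekdays` is hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def crash_weekdays(boot_stamps: Iterable[str], year: int) -> Counter:
    """Count the weekday of each syslog-style timestamp, e.g. 'Mar 10 09:02:11'.

    Assumption: syslog timestamps carry no year, so the caller supplies it.
    Whitespace in the format matches syslog's padded days ('Mar  3').
    """
    counts: Counter = Counter()
    for stamp in boot_stamps:
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        counts[when.strftime("%A")] += 1  # weekday name in the C locale, e.g. 'Saturday'
    return counts
```

If nearly every count lands on one day, it's worth looking for something that runs on that day rather than at the hardware.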

A poorly written application or a bad OS patch could also cause this. It's a wide field with lots of possible causes.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Re: DL360 weekly crash

Hi,

Thanks for the response.

I've combed through all the logs, and all they show is the last message prior to the crash (usually an NFS mount request) and then the boot-up sequence.
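The pattern described above (last normal message, then the boot-up sequence) can be pulled out of /var/log/messages automatically, which makes it easier to compare several crashes side by side. This is a sketch under stated assumptions: it treats the kernel banner line containing "kernel: Linux version" as the start of each boot, and it skips the syslogd "restart" and klogd "started" lines that typically precede that banner. Both patterns should be checked against your own log; the function name is hypothetical.

```python
import re
from collections import deque
from typing import Deque, Iterable, List, Tuple

# Assumption: the kernel banner is logged once per boot.
BOOT_MARKER = re.compile(r"kernel: Linux version")
# Assumption: these start-up lines are logged just before the banner and are not crash clues.
STARTUP_NOISE = re.compile(r"syslogd .*restart|klogd .*started")


def last_lines_before_boots(lines: Iterable[str], history: int = 10) -> List[Tuple[str, str]]:
    """For each boot banner, return (last line logged before the boot, banner line)."""
    results: List[Tuple[str, str]] = []
    recent: Deque[str] = deque(maxlen=history)  # the most recent lines before the current one
    for raw in lines:
        line = raw.rstrip("\n")
        if BOOT_MARKER.search(line):
            # Walk backwards past start-up noise to the last line written before the crash.
            before = next(
                (candidate for candidate in reversed(recent)
                 if not STARTUP_NOISE.search(candidate)),
                None,
            )
            if before is not None:
                results.append((before, line))
            recent.clear()
        recent.append(line)
    return results
```

Usage: `with open("/var/log/messages", errors="replace") as log: pairs = last_lines_before_boots(log)`, then print each pair. If the rotated logs are kept, running it over those files too shows whether the NFS mount request really is the last thing logged every time.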

When installing the PSP, should the kernel get rebuilt? I know that after installing previous patches we usually had initrds named HP-xxxx. Now all I have are the generic files. What needs to be done to get the HP versions? When the system was booted on them last year, it was stable.

Thanks again,

Phil
dirk dierickx
Honored Contributor

Re: DL360 weekly crash

Are you using any non-standard kernel modules? If you're loading extra stuff into the kernel, try leaving it out and see if it helps.
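To see what is actually loaded, you can read /proc/modules (the module name is the first field on each line) and compare it with a list of modules you know the box needs. On 2.6 kernels /proc/sys/kernel/tainted also tells you whether a non-GPL module was loaded (bit 0) or a module was force-loaded (bit 1). A minimal sketch; the baseline set and the function names are hypothetical, and only those two taint bits are decoded.

```python
from typing import Iterable, List, Set

TAINT_PROPRIETARY_MODULE = 1 << 0  # a module without a GPL-compatible license is loaded
TAINT_FORCED_MODULE = 1 << 1       # a module was force-loaded


def loaded_modules(proc_modules_text: str) -> List[str]:
    """Module names: the first whitespace-separated field of each /proc/modules line."""
    return [line.split()[0] for line in proc_modules_text.splitlines() if line.strip()]


def unexpected_modules(loaded: Iterable[str], baseline: Set[str]) -> List[str]:
    """Modules loaded now that are not in your (hypothetical) known-good baseline."""
    return sorted(set(loaded) - baseline)


def describe_taint(value: int) -> List[str]:
    """Decode only the two module-related bits of /proc/sys/kernel/tainted."""
    reasons = []
    if value & TAINT_PROPRIETARY_MODULE:
        reasons.append("proprietary module loaded")
    if value & TAINT_FORCED_MODULE:
        reasons.append("module force-loaded")
    return reasons
```

Usage: `loaded_modules(open("/proc/modules").read())` and `describe_taint(int(open("/proc/sys/kernel/tainted").read()))`. Anything in the unexpected list is a candidate to leave out for a week or two.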

You should capture a kernel dump of the crash; configure this so you can do better troubleshooting.

Also, make sure you're running the latest kernel version for your release.
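A quick sanity check is whether a newer kernel is already installed than the one currently running, for example after an update that nobody rebooted into. The sketch below compares the running release (`platform.release()` returns the same string as `uname -r`) with a list of installed release strings you gather yourself; how you build that list is an assumption. The ordering is a rough simplification that compares the numeric parts only. It is not rpm's own version comparison, and the function names are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple


def version_key(release: str) -> Tuple[int, ...]:
    """Rough ordering key: the numeric runs in a string such as '2.6.9-42.ELsmp'.

    Simplification: letters (EL, smp, ...) are ignored; this is not rpm's vercmp.
    """
    return tuple(int(part) for part in re.findall(r"\d+", release))


def newer_kernel_available(running: str, installed: Iterable[str]) -> Optional[str]:
    """Return the newest installed release if it sorts after the running one, else None."""
    newest = max(installed, key=version_key, default=None)
    if newest is not None and version_key(newest) > version_key(running):
        return newest
    return None
```

Usage: `newer_kernel_available(platform.release(), installed_releases)` after `import platform`. A non-None result means the box is not booted on the newest kernel it has.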