ProLiant Servers (ML,DL,SL)

Ayman Altounji
Valued Contributor

Hardware Crashing

Here's an interesting issue:

One of our HA servers in a production website has become extremely unstable. Since I took over the administration of these machines, this box has crashed three times. The crashing seems to be hardware related, since the OS does not log the issue in Event Viewer, and the currently installed Compaq hardware monitoring utilities have been of little help.

What happens is basically this: the machine will appear to "power down". It does not output video (the monitor light is amber when the server is selected in the KVM server list) and does not respond to any form of network signal, including pinging. However, the power supply fans & cooling fans are all running, and the lights in front of the machine are on and green. There are no smells of burning plastic coming from the machine. When the power is cycled the server boots up normally, and Win2K merely reports in the System Event Viewer that at a given time "the previous shut down was unexpected", the time corresponding to when the server crashed, not when the power is cycled.

Anyway, I am stumped. Obviously there is some sort of failure going on in a "single point of failure" sub-section, maybe a bad RAM module? Any help/advice is extremely appreciated!

Thanks!
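
Since the server stops answering on the network while still appearing powered on, one way to pin down exactly when each hang starts is to poll it from a second machine and record the gaps. The following is only a minimal sketch under stated assumptions: Python is available on the monitoring machine, the server normally accepts TCP connections on some port (port 80 for the web site is a guess), and the names `is_reachable`, `find_outages`, and `monitor` are hypothetical helpers, not part of any Compaq tool.

```python
import socket
import time


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def find_outages(samples):
    """Collapse time-ordered (timestamp, reachable) samples into outages.

    Returns a list of (first_failed_timestamp, first_recovered_timestamp)
    pairs; the second element is None if the host was still down at the
    last sample.
    """
    outages = []
    down_since = None
    for timestamp, reachable in samples:
        if not reachable and down_since is None:
            down_since = timestamp
        elif reachable and down_since is not None:
            outages.append((down_since, timestamp))
            down_since = None
    if down_since is not None:
        outages.append((down_since, None))
    return outages


def monitor(host, port, interval=30.0, rounds=10):
    """Poll host:port `rounds` times, `interval` seconds apart."""
    samples = []
    for _ in range(rounds):
        samples.append((time.time(), is_reachable(host, port)))
        time.sleep(interval)
    return find_outages(samples)


# Example (hypothetical host name, assumed port):
# print(monitor("ha-web-01", 80, interval=30.0, rounds=2880))
```

The start of each recorded outage can then be compared with the "previous shutdown was unexpected" time in the System Event Viewer to see whether the two agree.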
5 REPLIES
Ayman Altounji
Valued Contributor

Re: Hardware Crashing

Remove ALL the Compaq Management Agents, and post back your results.
Ayman Altounji
Valued Contributor

Re: Hardware Crashing

I am experiencing a similar problem with my DL580 coming back online after a reboot. I have been trying to set this server up for about two weeks and have had no luck. The server is a DL580 with dual PIII 700 2MB procs, 4GB of RAM, two 36.4GB HDDs, a RIB-LO in slot 6, the dual port NIC in slot 5, and a FC-HBA in slot 4. What has been happening is that the video fails to come back after a warm reboot while in SmartStart. I have checked all my wires, checked the Virtual Power Button, updated the system ROM, and updated the ROM on the RIB-LO. I'm kind of lost and was wondering if anybody had any ideas. Thanks in advance.

KEO
Ayman Altounji
Valued Contributor

Re: Hardware Crashing

1: The stupid NT crash dump hasn't used up all the disk space, has it? Have a look at your C: drive and see if you have any space left. Also, and this is a wild guess, install the HAL recovery option, as the machine may be failing a processor and then trying to come up on the standard HAL, which won't be there unless you install it. And run some diagnostics for a day or two.
2: Re the RIB board: I can't remember which PCI bus is primary on a 580, but make sure the RIB board is in the first slot on the primary bus.
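
The free-space check in point 1 can also be scripted so it runs on a schedule rather than by eye. This is only a sketch, assuming Python is installed on the server; the 10% threshold is an arbitrary assumption, and `free_fraction` and `is_low_on_space` are hypothetical helper names.

```python
import os
import shutil


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of the volume holding `path` that is free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def is_low_on_space(path, min_free_fraction=0.10):
    """Return True if the free fraction is below the given threshold."""
    return free_fraction(path) < min_free_fraction


# Check the system drive: C:\ on Windows, / elsewhere.
drive = "C:\\" if os.name == "nt" else "/"
print(drive, "free fraction:", round(free_fraction(drive), 3),
      "low on space:", is_low_on_space(drive))
```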
Ayman Altounji
Valued Contributor

Re: Hardware Crashing

I can't look at the C: drive because I can't get SmartStart to come back after a warm reboot. The server hasn't even finished installing yet.

The RIB-LO is in the right place. I double checked that after reading your note. Thanks.

KEO
Ayman Altounji
Valued Contributor

Re: Hardware Crashing

The problem turned out to be a bad RIB-LO. Thanks for the help.

KEO