ProLiant Servers (ML,DL,SL)

DL385 G5: Unexplained Reboots

 
SOLVED
SeanQ
Occasional Visitor

DL385 G5: Unexplained Reboots

Hi All,

I'm dealing with an unusual case here and I'm not sure how to proceed. I have a DL385 G5 running Solaris 10 10/08. We have dozens of the exact same setup, without any problems. This particular machine however reboots constantly, without any rhyme or reason—sometimes once a month, sometimes four times a day. The OS has been patched, the firmware has been updated. There are no logs detailing what may have happened (actually it looks like the OS didn’t even know it was happening, which makes me think it’s hardware-related).

I’ve played this game before where HP support blames Sun and Sun blames HP, so I thought I’d ask you folks before subjecting myself to that again. I’m not sure what to do now, so any advice would be great. Thanks!
9 REPLIES
Saberus
Valued Contributor
Solution

Re: DL385 G5: Unexplained Reboots

Sean,

Just a stab in the dark, but have you checked the iLO logs as well? If it's hardware related, the iLO will state why the reboot was triggered. You should be able to save (or at least copy-paste) the iLO log entries so we can better help you diagnose the server.
SeanQ
Occasional Visitor

Re: DL385 G5: Unexplained Reboots

Hi Saberus,

For whatever reason we've never used iLO on these machines. I suppose I can set it up on this particular machine to see if the logs capture anything.

Any other ideas?
SeanQ
Occasional Visitor

Re: DL385 G5: Unexplained Reboots

I set up iLO on the server... I didn't realize it had been logging this whole time. Regardless, there's not much info, I'm afraid:

Informational iLO 2 06/18/2010 17:24 06/18/2010 17:24 1 Server power restored.
Caution iLO 2 06/18/2010 17:24 06/18/2010 17:24 1 Server reset.
Informational iLO 2 06/18/2010 02:22 06/18/2010 02:22 1 Server power restored.
Caution iLO 2 06/18/2010 02:22 06/18/2010 02:22 1 Server reset.
Informational iLO 2 06/17/2010 15:19 06/17/2010 15:19 1 Server power restored.
Caution iLO 2 06/17/2010 15:19 06/17/2010 15:19 1 Server reset.
Informational iLO 2 06/16/2010 12:53 06/16/2010 12:53 1 Server power restored.
Caution iLO 2 06/16/2010 12:53 06/16/2010 12:53 1 Server reset.
Informational iLO 2 06/11/2010 23:41 06/11/2010 23:41 1 Server power restored.
Caution iLO 2 06/11/2010 23:41 06/11/2010 23:41 1 Server reset.
Informational iLO 2 06/09/2010 08:17 06/09/2010 08:17 1 Server power restored.
Caution iLO 2 06/09/2010 08:17 06/09/2010 08:17 1 Server reset.
Informational iLO 2 06/03/2010 22:17 06/03/2010 22:17 1 Server power restored.
Caution iLO 2 06/03/2010 22:17 06/03/2010 22:17 1 Server reset.
Informational iLO 2 05/31/2010 05:20 05/31/2010 05:20 1 Server power restored.
Caution iLO 2 05/31/2010 05:20 05/31/2010 05:20 1 Server reset.
Informational iLO 2 05/29/2010 11:50 05/29/2010 11:50 1 Server power restored.
Caution iLO 2 05/29/2010 11:50 05/29/2010 11:50 1 Server reset.
Informational iLO 2 05/27/2010 16:47 05/27/2010 16:47 1 Server power restored.
Caution iLO 2 05/27/2010 16:47 05/27/2010 16:47 1 Server reset.
Informational iLO 2 05/24/2010 12:26 05/24/2010 12:26 1 Server power restored.
Caution iLO 2 05/24/2010 12:26 05/24/2010 12:26 1 Server reset.
Informational iLO 2 05/22/2010 12:23 05/22/2010 12:23 1 Server power restored.
Caution iLO 2 05/22/2010 12:23 05/22/2010 12:23 1 Server reset.
Informational iLO 2 05/21/2010 10:20 05/21/2010 10:20 1 Server power restored.
Caution iLO 2 05/21/2010 10:20 05/21/2010 10:20 1 Server reset.
Informational iLO 2 05/13/2010 15:51 05/13/2010 15:51 1 Server power restored.
Caution iLO 2 05/13/2010 15:51 05/13/2010 15:51 1 Server reset.
Informational iLO 2 05/12/2010 13:48 05/12/2010 13:48 1 Server power restored.
Caution iLO 2 05/12/2010 13:48 05/12/2010 13:48 1 Server reset.
Informational iLO 2 05/09/2010 07:58 05/09/2010 07:58 1 Server power restored.
Caution iLO 2 05/09/2010 07:58 05/09/2010 07:58 1 Server reset.
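
In case it helps anyone reading along, here is a rough Python sketch for pulling the "Server reset." timestamps out of a pasted iLO 2 event log and listing the gaps between them, which makes it easier to see whether the resets follow any schedule. The field names in the regex (severity, last, initial, count, description) are my own guesses at what the columns mean, and the sample only contains the five most recent reset entries from the log above.

```python
import re
from datetime import datetime

# Five reset entries copied from the log above; paste the full log to analyse all of it.
ILO_LOG = """\
Caution iLO 2 06/18/2010 17:24 06/18/2010 17:24 1 Server reset.
Caution iLO 2 06/18/2010 02:22 06/18/2010 02:22 1 Server reset.
Caution iLO 2 06/17/2010 15:19 06/17/2010 15:19 1 Server reset.
Caution iLO 2 06/16/2010 12:53 06/16/2010 12:53 1 Server reset.
Caution iLO 2 06/11/2010 23:41 06/11/2010 23:41 1 Server reset.
"""

# Group names are assumed labels for the columns, not taken from iLO documentation.
ENTRY = re.compile(
    r"^(?P<severity>\w+) iLO 2 "
    r"(?P<last>\d{2}/\d{2}/\d{4} \d{2}:\d{2}) "
    r"(?P<initial>\d{2}/\d{2}/\d{4} \d{2}:\d{2}) "
    r"(?P<count>\d+) (?P<description>.+)$"
)


def reset_times(log_text):
    """Return the timestamps of 'Server reset.' entries, oldest first."""
    times = []
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if match and match["description"] == "Server reset.":
            # Dates in the log are MM/DD/YYYY (06/18/2010 cannot be day 6 of month 18).
            times.append(datetime.strptime(match["last"], "%m/%d/%Y %H:%M"))
    return sorted(times)


def gaps_in_hours(times):
    """Hours elapsed between each pair of consecutive timestamps."""
    return [
        (later - earlier).total_seconds() / 3600
        for earlier, later in zip(times, times[1:])
    ]


if __name__ == "__main__":
    for hours in gaps_in_hours(reset_times(ILO_LOG)):
        print(f"{hours:7.1f} h between resets")
```

Gaps that vary widely with no fixed period would point away from anything scheduled (a cron job, a backup window) and toward an intermittent fault.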
Saberus
Valued Contributor

Re: DL385 G5: Unexplained Reboots

With no other alerts, I think you should be looking at your power supplies, or the power to the cabinet.

It could also be a heavy load on the server triggering ASR (Automatic Server Recovery) time-outs.
Jan Soska
Honored Contributor

Re: DL385 G5: Unexplained Reboots

Hello,
1) Is there anything in the IML log?
2) Is the latest PSP installed?

Jan
SeanQ
Occasional Visitor

Re: DL385 G5: Unexplained Reboots

Hello,

Well, I noticed some POST notifications regarding power supply 2, so I removed that unit, hoping it was the problem. Since I can't replicate the restarts, I had to wait to see if the server would reboot again. Unfortunately, it did reboot today. There is nothing new in the iLO logs and nothing at the OS level (core, crash dump, logs, etc.). So I think I'm back where I started.

IML logs do not reveal anything helpful. Not sure what you mean by PSP.

Thanks for the help.

Re: DL385 G5: Unexplained Reboots

Make sure you're on iLO firmware 1.82. Older versions can cause random reboots. I don't think the FW utility will load this version by default.
Ken Krubsack
Trusted Contributor

Re: DL385 G5: Unexplained Reboots

Sean,

The iLO reboot issues were as much down to the Windows drivers as the firmware itself, though v1.82 appears to be pretty stable.

I'd be looking at memory. I've had a number of instances where a DIMM started going bad and caused unexplained reboots until it finally got bad enough to trigger an alert in SIM.
SeanQ
Occasional Visitor

Re: DL385 G5: Unexplained Reboots

Thank you everyone for your advice.

I opened a ticket with HP and they had me do the following:

1. Insert the SmartStart CD into the DVD/CD-ROM drive.
2. Shut down the operating system, and turn off the server.
3. Turn on the server. The system boots from the SmartStart CD.
4. From the Maintenance Utilities menu, select Server Diagnostics.
5. Click the Test tab.
6. Select Complete as the type of test to perform.
7. Select the mode of testing to perform: Unattended or Interactive.
8. Set at least 15 loops to test the server thoroughly.
9. Click Begin Testing to start the test.

I had done this test before and did not find any issues. However, Support had me run the test through 15 loops, and that did the trick: the test found a failing memory DIMM. I replaced the memory, ran the test again, and the problem appears to be resolved.
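
For anyone wondering why 15 loops caught what a single pass missed, here is a back-of-the-envelope sketch in Python. It assumes each loop independently has the same chance of tripping over the bad DIMM, which is a simplification (real intermittent faults may depend on temperature or access pattern), and the 10% per-loop hit rate is a made-up number for illustration, not something I measured.

```python
def detection_probability(per_loop_hit_rate, loops):
    """Chance that at least one of `loops` independent passes catches the fault."""
    if not 0.0 <= per_loop_hit_rate <= 1.0:
        raise ValueError("per_loop_hit_rate must be between 0 and 1")
    if loops < 0:
        raise ValueError("loops must not be negative")
    # P(at least one hit) = 1 - P(every loop misses)
    return 1.0 - (1.0 - per_loop_hit_rate) ** loops


if __name__ == "__main__":
    hit_rate = 0.10  # hypothetical per-loop chance of exposing the fault
    for loops in (1, 5, 15):
        chance = detection_probability(hit_rate, loops)
        print(f"{loops:2d} loop(s): {chance:.0%} chance of catching it")
```

Under those assumptions a single pass would miss the fault nine times out of ten, while 15 loops would catch it most of the time, which fits what I saw.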