ProLiant Servers (ML,DL,SL)

Re: IPMI reboots DL360

 
rafaelito
Advisor

Re: IPMI reboots DL360

@ Pac: Thanks, I was aware of that advisory and there is another thread about it. It seems it's not RHEL 5 specific.
I will check the server BIOS and E200i controller firmware, since the iLO 2 is up to date.
rafaelito
Advisor

Re: IPMI reboots DL360

After upgrading the BIOS and E200i firmware:

May 9 12:12:22 farallon kernel: ipmi_si(SI_CHECK_BMC): Failed to get Global Enables 0xc6.
May 9 12:12:32 farallon hpasmxld[4901]: OsKcsExecCmd: IPMI NetFN 0x6 CMD: 0x25 has timed out!
May 9 12:12:42 farallon hpasmxld[4901]: OsKcsExecCmd: IPMI NetFN 0x6 CMD: 0x25 has timed out!
May 9 12:12:52 farallon hpasmxld[4901]: OsKcsExecCmd: IPMI NetFN 0x6 CMD: 0x25 has timed out!
May 9 12:13:02 farallon hpasmxld[4901]: OsKcsExecCmd: IPMI NetFN 0x6 CMD: 0x25 has timed out!
May 9 12:13:02 farallon hpasmxld[4901]: iLO 2 Communications Error - Attempting synchronization!
May 9 12:13:47 farallon hpasmxld[4901]: iLO 2 has responded to reset request . . .
May 9 12:13:47 farallon hpasmxld[4901]: Stopping the Watchdog Timer . . .
May 9 12:13:47 farallon hpasmxld[4901]: Resetting Internal Data structures . . .
May 9 12:13:47 farallon hpasmxld[4901]: Initializing Internal Data structures from iLO 2. . .
May 9 12:13:47 farallon hpasmxld[4901]: The iLO 2 reset / synchronization has completed successfully

No reboot by ASR. This has happened twice since the upgrade.
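For anyone who wants to tally these events rather than read through the log by eye, here is a minimal Python sketch. It assumes classic syslog lines in the format shown above; the function name `summarize` and the counter keys are made up for illustration and are not part of any HP tool.

```python
import re
from collections import Counter

# Matches classic syslog lines such as:
#   May 9 12:12:32 farallon hpasmxld[4901]: OsKcsExecCmd: ...
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<proc>[^:\[\s]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)

def summarize(lines):
    """Count the IPMI/iLO 2 events seen in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a syslog line in the assumed format
        msg = m.group("msg")
        if "has timed out" in msg:
            counts["timeouts"] += 1
        elif "Communications Error" in msg:
            counts["comm_errors"] += 1
        elif "synchronization has completed successfully" in msg:
            counts["resyncs_ok"] += 1
        elif "Failed to get Global Enables" in msg:
            counts["global_enables_failures"] += 1
    return dict(counts)
```

Feeding it the lines of a log file, for example `summarize(open(path))`, gives a quick view of how many timeouts precede each iLO 2 resynchronization.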

Blazhev_1
Honored Contributor

Re: IPMI reboots DL360

I see you have the latest Value Add software for Debian; it seems the problem is solved only for RHEL.

Did you remove the OpenIPMI driver?

This seems to be the only way to get rid of the error under Debian. As far as I know, the 8.0 release for RHEL solved the problem, but there is no such release for Debian.

rafaelito
Advisor

Re: IPMI reboots DL360

No. I didn't remove it.
Joshua Pott
Occasional Advisor

Re: IPMI reboots DL360

So my question is: has this issue magically resolved itself for anyone else as well? It started in early April, and my server was rebooting every week. Now it's been reboot-free for just over two weeks.
rafaelito
Advisor

Re: IPMI reboots DL360

The rebooting stopped for me when I upgraded the firmware.
Now I get the following error 3 to 4 times a day:
May 25 15:14:46 farallon kernel: ipmi_si(SI_CHECK_BMC): Failed to get Global Enables 0xc6.
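To check whether the frequency of this kernel message changes over time, a small sketch like the one below can count occurrences per day. It assumes the syslog line format shown above, where the first two fields are month and day; the function name `failures_per_day` is hypothetical.

```python
from collections import Counter

# Substring taken from the kernel message quoted above.
NEEDLE = "Failed to get Global Enables"

def failures_per_day(lines):
    """Return a Counter keyed by 'Mon D' for lines containing NEEDLE."""
    per_day = Counter()
    for line in lines:
        if NEEDLE not in line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # too short to carry a syslog date
        per_day[f"{parts[0]} {parts[1]}"] += 1
    return per_day
```

It can be run over whatever file the kernel messages are logged to on your system; the location varies by distribution and syslog configuration.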
Fabio_S
Advisor

Re: IPMI reboots DL360

After realizing that the MS patch did not apply to my case (it fixes PAE-enabled systems, which my x64 system of course is not), and after updating all drivers and firmware, I keep getting reboots: sometimes once every couple of days, then nothing for a week, then again...
We will probably replace that server (for other reasons); if not, I'll open a ticket with HP and let you know!

rafaelito
Advisor

Re: IPMI reboots DL360

The problem continues. After some weeks without reboots, it occurred once again.
sSaraga
New Member

Re: IPMI reboots DL360

Hi, we have had a similar issue with a DL380 G5.

We disabled the ASR to get the bluescreen. It stated it was "***Hardware Malfunction*** Call your Vendor System Halted"

HP's (standard) requests were to update the PSP from 7.9 to 8.0 and to update all the firmware using the firmware CD version 8.10.

Because we are unable to force this failure, I requested that we just replace the P400 controller rather than wait for the next failure. The response was that they felt this was a memory issue, so they wanted to replace the memory instead of the P400 controller. We agreed to update the PSP and the firmware and to replace the memory.

The SmartStart CD (offline) diagnostics did not report any issues with the hardware/memory. The ADU (online and offline) did not report any issues.

I will post back today or tomorrow with an update. This server has been crashing daily for us.
sSaraga
New Member

Re: IPMI reboots DL360

After we replaced the memory and updated the PSP and firmware, the system still crashed again the following night.

The remaining hardware that could be causing the issue has been ordered, and we will replace the parts in an order that hopefully allows us to pinpoint the issue.