ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

 
Iain Binnie
Advisor

Hi,

This issue involves several DL380 and DL360 G5 servers.

I recently updated the iLO 2 firmware to version 1.79 and the HP System Management Homepage to v3.0.2.77 on several DL380 and DL360 servers.

Since then I have had sporadic power cycles. Each time, the iLO 2 log shows this message just before the power cycle: "BMC IPMI Watchdog Timer Timeout: Action=System Power Reset"

I have dug around online and found various threads, none of which reach a specific conclusion. They all indicate that HP is working on the problem, and the only interim fix offered is to disable ASR. I have done that, and it appears to have stopped the reboots.

My questions are:

1. What is the issue I am experiencing, and how do I fix it?

2. What exactly are ASR and the BMC IPMI Watchdog Timer, and what are they used for?

Your help as ever is greatly appreciated!

Best Regards
Iain
21 REPLIES
Michael Leu
Honored Contributor

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

I'm no expert in this but I think the recommendation (for Linux systems) is to disable ASR.

The DL585 G5 User Guide on ASR:
ASR is a feature that causes the system to restart when a catastrophic operating system error occurs, such as a blue screen, ABEND, or panic. A system fail-safe timer, the ASR timer, starts when the System Management driver, also known as the Health Driver, is loaded. When the operating system is functioning properly, the system periodically resets the timer. However, when the operating system fails, the timer expires and restarts the server.
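
To make the quoted description concrete, here is a minimal Python sketch of a fail-safe (watchdog) timer. It is a toy model only, not HP's ASR code or the IPMI implementation: the names `WatchdogTimer` and `simulate` and all timing values are made up for illustration.

```python
class WatchdogTimer:
    """Toy model of an ASR-style fail-safe timer (illustrative, not HP's code)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.remaining_s = timeout_s
        self.expired = False

    def pet(self):
        """Called by a healthy OS/driver: restart the countdown."""
        if not self.expired:
            self.remaining_s = self.timeout_s

    def tick(self, elapsed_s):
        """Advance time; return 'power_reset' once the countdown has run out."""
        if self.expired:
            return "power_reset"
        self.remaining_s -= elapsed_s
        if self.remaining_s <= 0:
            self.expired = True
            return "power_reset"
        return None


def simulate(timeout_s, pet_interval_s, os_hangs_at_s, run_for_s):
    """Return the second at which the timer fires, or None if it never does.

    The OS pets the timer every pet_interval_s seconds until it hangs
    at os_hangs_at_s; after that nothing resets the countdown.
    """
    wd = WatchdogTimer(timeout_s)
    for now in range(1, run_for_s + 1):
        os_alive = now < os_hangs_at_s
        if os_alive and now % pet_interval_s == 0:
            wd.pet()
        if wd.tick(1) == "power_reset":
            return now
    return None
```

In this model, `simulate(10, 5, 1000, 100)` never fires because the timer is reset in time, while an OS that stops responding lets the countdown run out and triggers the reset. The model cannot tell a crashed OS from a healthy OS whose resets simply never reach the timer; in both cases the resets stop and the timer expires.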
Michael Leu
Honored Contributor

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

By the way, someone else has answered your second question much better than I ever could :-)

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1374922
acartes
Honored Contributor

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

The fix requires both of the following:
- update to iLO 2 v1.78 or later
- update the Windows Management Controller Driver to 1.11.2.0 or later

Updating one or the other is not a complete fix.

Discussed in this Customer Advisory:
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?locale=en_US&objectID=c01802766
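
The two-part requirement above can be checked with a short Python sketch. It assumes you have already read the installed versions by some other means and have them as dotted strings; the names `parse_version` and `has_both_updates` are hypothetical, and the only values taken from this reply are the two minimum versions.

```python
def parse_version(text):
    """Turn '1.11.2.0' into (1, 11, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum versions as stated in the reply above.
MIN_ILO2_FIRMWARE = parse_version("1.78")
MIN_DRIVER = parse_version("1.11.2.0")


def has_both_updates(ilo2_firmware, driver):
    """True only if BOTH the firmware and the driver meet the minimums."""
    firmware_ok = parse_version(ilo2_firmware) >= MIN_ILO2_FIRMWARE
    driver_ok = parse_version(driver) >= MIN_DRIVER
    return firmware_ok and driver_ok
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.8.0.0" sorts after "1.11.2.0", which would wrongly treat the older driver as newer. A `True` result only means the stated minimums are met, not that the reboots are resolved.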
Support Microsoft Serv
Occasional Visitor

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

Have you tried upgrading the iLO 2 Management Controller Driver version?
Billy Barule
New Member

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

acartes, the Advisory you cite talks about this issue occurring when the iLO 2 firmware is at or below 1.70. Iain's issue (and mine, coincidentally) did not manifest until the iLO firmware was upgraded to 1.79.
I had a colleague reference the same Advisory article, but I'm not convinced the problem is solved.
Mark Ottaway
New Member

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

I'm having the same issue. Mine didn't start until upgrading to 1.79!! I have updated the Instance driver, the controller driver and the system ROM. Still getting random reboots and reports of BMC IPMI Watchdog Timer Timeout.
John Cunningham_4
Occasional Contributor

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

I have the same issue here on a number of DL360 G5s. They reboot at a random time after upgrading to iLO 1.79.

Has anyone tried reflashing back to 1.78?

It is suggested elsewhere that the reboot will only happen once - can anyone confirm this?

John
Jimmy Tn
Occasional Advisor

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

Hi,

My experience is that iLO2 Management Driver 1.11.2.0 AND iLO2 firmware 1.78 or later will fix the issue.

BR / Jimmy
acartes
Honored Contributor

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

>> acartes, the Advisory you cite talks about this issue occurring when the iLO 2 firmware is at or below 1.70. Iain's issue (and mine, coincidentally) did not manifest until the iLO firmware was upgraded to 1.79.
>> I had a colleague reference the same Advisory article, but I'm not convinced the problem is solved.

A complete solution requires that both iLO and the OS driver are updated.
Details for each are in the CA.
The iLO change addresses a "duplicate records in the SEL" bug (the duplicates are generated by the OS driver); the OS driver change prevents a communication hang-up that can result in an ASR.

The iLO change was introduced in iLO 2 v1.78, and released at the same time as the OS driver update. It is a difficult "update two pieces of software to address the issue" fix.

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

I had this problem on a first generation DL320s.

I have successfully reflashed the iLO firmware back to 1.78 from 1.79.

You can do this by using HP Smart Update Manager (HP SUM).
1. Download the latest HP Firmware Maintenance CD ISO.
2. Edit the ISO and add the iLO firmware 1.78 .scexe file if it is not already present.
3. Write the image to a USB stick for faster booting and you're away!
You may have to use the force downgrade option and then select version 1.78.

Also use the latest iLO management driver.

I have had loads of grief on DL370s, but this went away with 1.78 and the Windows iLO driver.
Matt Sebel
Advisor

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

We just had the same problem last night after upgrading iLO to 1.79 in a couple of BL460 blades running Windows Server 2003 SP2. They both rebooted early this morning with the same BMC message. Upgraded the driver to 1.12 (latest version available). Now we wait to see if the problem manifests itself again.

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

Same issue with our 460c G1 blades. The iLO firmware is 1.79 and the HP ProLiant iLO 2 Management Controller Driver is at 1.12. Still a problem.
Dennis D
New Member

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

DL360 G5
iLO 1.79 - BMC IPMI Watchdog Timer Timeout: Action=System Power Reset.

Had to revert to 1.78 to stabilize.
Dennis D
New Member

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

Forgot to mention: I am running Windows Server 2003 R2 with SQL Server 2005 SP3.
Klause
New Member

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

We are having the same issue on four DL360 G5 servers.
It started with NO UPDATES. It just started at the beginning of the month. I've updated EVERYTHING as per HP (using their software) and it's still happening. I disabled ASR and it's still happening, but now I don't get the watchdog error. We are on 1.79 now, but HP suggested I drop back to 1.78. It's not cool to have production servers reboot.
Klause
New Member

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

Does anyone know if you have to reboot the server after an iLO firmware upgrade/downgrade?
Mike Murphy_5
New Member

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

Same issue on an ML370 G5; however, I didn't notice it until now. The issue started the day AFTER updating iLO 2 to FW 1.79 (the logs support this). I have since updated to FW 1.81. We'll see.
Scary: the logs show many instances of BMC IPMI Watchdog Timer Timeout and ASRs. I wonder how that affects the W2K8 64-bit file systems. Fingers crossed.
There was a "temp solution" for RHEL of disabling ASR, but that defeats the purpose of ASR.
8i5
Advisor

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

We have the same issue on iLO 2 firmware 1.79 and 1.81, plus the latest PSP 8.30 and post-8.30 drivers.

Waiting for HP's answer, and I suspect it will be "disable ASR".
Carol Northcut
New Member

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

We've been having the same issue with a DL380 G5 for several months. We have been on the phone with HP multiple times, with three different case numbers! The occurrence is random, but the logs always record the same errors. ASRs have occurred three times in the last three days. With ASR disabled, the blue screen only says "*** Hardware Malfunction Call your hardware vendor for support *** The system has halted ***"
Each time it ASRs, the iLO 2 log shows "BMC IPMI Watchdog Timer", "Server power removed" and "server power restored." The Integrated Management Log Viewer always shows "PCI Bus Error (Slot0, Bus0, Device 0, Function 0)" and "ASR Detected by system ROM." The only thing disabling ASR does is prevent the server from resetting after a failure, which is not a good thing. We've replaced the system board and updated all the software/firmware multiple times (currently on iLO 1.81), and it's getting worse. This is a Win2K3 R2 SP2 x64 server running Tivoli, connected to a library and multiple MSAs with P800 and P400 controllers. HP has asked that I next re-run the SmartStart diagnostics and run a repair of Windows. Aaargh!
Patrick Metcalfe
New Member

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

Had similar issues. Look at this link for blade servers; it probably applies to DL380 servers as well.

http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01723453/c01723453.pdf

http://h18004.www1.hp.com/products/blades/components/c-class.html

See also attached document


8i5
Advisor

Re: ILO2 firmware update causes BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

Those compatibility matrix docs are standard, and if you don't follow them you will always be looking for trouble.

The latest I've been told by HP, for my up-to-date blade servers which are experiencing this issue, is to downgrade the iLO management controller driver to 1.8.0.0 and then upgrade back up to 1.13.0.0 again. I cannot imagine how this will help, but I will try it if the issue persists.