ProLiant Servers (ML,DL,SL)

Re: ASR error on DL G5 series BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

 
Iain Binnie
Advisor

ASR error on DL G5 series BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

Hi,

This issue involves several DL380 and DL360 G5 servers.

I recently updated the iLO 2 firmware to version 1.79 and the HP System Management Homepage to v3.0.2.77 on several DL380 and DL360 servers.

Since doing so I have had sporadic power-cycling issues, and the iLO 2 log shows this message before each power cycle: "BMC IPMI Watchdog Timer Timeout: Action=System Power Reset".

I have dug around online and found various threads that lead nowhere in particular. They all indicate that HP is working on this and end with disabling ASR as an interim fix. I have done that, and it appears to have stopped the reboots.

My questions are:

1. What is the issue I am experiencing, and how do I fix it?

2. What exactly are ASR and the BMC IPMI Watchdog Timer, and what are they used for?

Your help as ever is greatly appreciated!

Best Regards
Iain
6 REPLIES
Matti_Kurkela
Honored Contributor

Re: ASR error on DL G5 series BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

ASR = Automatic Server Recovery. It is a watchdog timer, just like the BMC IPMI watchdog timer.

When a watchdog timer is started, it is initialized with some time value (for ASR, the factory default is 10 minutes). The timer starts ticking down from 10 minutes. If it reaches zero, the timer hardware will trigger recovery actions, such as a system reset.

Normally, some piece of software (in HP Proliants, it's usually a part of hardware monitoring drivers) periodically resets the watchdog timer to the initial value so that it never reaches zero. But if something prevents the software from running, the timer will eventually reach zero, and the system will reboot.
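To make the countdown-and-refresh mechanism concrete, here is a minimal simulation in Python. It is only a toy model of a generic watchdog, not HP's ASR or the BMC implementation. The class and function names, the 60-second refresh interval, and the one-second tick are assumptions chosen for illustration; only the 10-minute (600-second) default comes from the description above.

```python
class WatchdogTimer:
    """Toy countdown watchdog (illustrative only, not HP's implementation)."""

    def __init__(self, timeout_s, on_expire):
        self.timeout_s = timeout_s      # initial value, e.g. 600 s for ASR
        self.remaining_s = timeout_s    # current countdown value
        self.on_expire = on_expire      # recovery action, e.g. a system reset
        self.expired = False

    def refresh(self):
        """Called by the monitoring software to restart the countdown."""
        if not self.expired:
            self.remaining_s = self.timeout_s

    def tick(self, elapsed_s=1):
        """Advance simulated time; trigger the recovery action at zero."""
        if self.expired:
            return
        self.remaining_s -= elapsed_s
        if self.remaining_s <= 0:
            self.expired = True
            self.on_expire()


def simulate(total_s, refresh_every_s, driver_hangs_at_s=None, timeout_s=600):
    """Return the simulated second at which the reset fired, or None."""
    fired_at = []
    clock = {"now": 0}
    watchdog = WatchdogTimer(timeout_s, lambda: fired_at.append(clock["now"]))
    for now in range(1, total_s + 1):
        clock["now"] = now
        watchdog.tick(1)
        # The hypothetical driver refreshes the timer only while it still runs.
        driver_alive = driver_hangs_at_s is None or now < driver_hangs_at_s
        if driver_alive and now % refresh_every_s == 0:
            watchdog.refresh()
    return fired_at[0] if fired_at else None


if __name__ == "__main__":
    print(simulate(3600, 60))                          # healthy driver
    print(simulate(3600, 60, driver_hangs_at_s=1000))  # driver stops at 1000 s
```

In the healthy run the timer never reaches zero. In the second run the last refresh happens at 960 seconds, so the simulated reset fires one full timeout later. Note that the watchdog cannot tell a hung OS from a driver that merely failed to refresh it in time; both look the same to the timer.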

This is very useful if the server is in a remote location: if the OS is hung, the watchdog will reboot it. If the reboot allows the sysadmin to log in remotely, there is no need to send a junior sysadmin (or any technical personnel) onsite just to press the reset button.

Of course, rebooting the server removes any error messages that might have been visible on the screen when the system was hung. If there is nothing about the crash in the system logs and the problem seems to be repeating, you have the option of disabling the watchdog timer(s) so that you will have a chance to examine the machine in its hung state.

Alternatively, I seem to recall that some of the newer ILO2 versions have ways to capture the system display when the watchdog is triggered. If your systems have that functionality, consider setting it up for use.

MK
Iain Binnie
Advisor

Re: ASR error on DL G5 series BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

Hi, I appreciate your explanation of the watchdog timers, and it is now a bit clearer in my head! But it still does not explain the issues that I have been having since updating the firmware and the homepage.

I feel it is more than coincidence that, since I updated those 2 items on 3 servers, all of them have power cycled due to the BMC IPMI watchdog timer. All of these servers run Windows Server 2003; 2 of them are Enterprise edition and clustered, with no unscheduled downtime in over a year until I applied these updates, and now they both behave sporadically.
acartes
Honored Contributor

Re: ASR error on DL G5 series BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

The problem solution requires both:
- update to iLO 2 v1.78 or later
- update the Windows Management Controller Driver to 1.11.2.0 or later

Updating one or the other is not a complete fix.

Discussed in this Customer Advisory:
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?locale=en_US&objectID=c01802766
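
As a rough illustration of checking both conditions together, the sketch below compares installed versions against the two minimums listed above. The function names and the idea of passing versions in as strings are assumptions for illustration; how you actually read the installed versions (iLO web interface, Device Manager, inventory tool) is up to you.

```python
# Minimum versions taken from the list above.
MIN_ILO2_FIRMWARE = "1.78"
MIN_MGMT_CONTROLLER_DRIVER = "1.11.2.0"


def parse_version(text):
    """Turn a dotted version such as '1.11.2.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def meets_advisory(ilo2_firmware, mgmt_controller_driver):
    """True only if BOTH components are at or above the listed minimums."""
    firmware_ok = parse_version(ilo2_firmware) >= parse_version(MIN_ILO2_FIRMWARE)
    driver_ok = (parse_version(mgmt_controller_driver)
                 >= parse_version(MIN_MGMT_CONTROLLER_DRIVER))
    return firmware_ok and driver_ok


if __name__ == "__main__":
    # New firmware with an older driver does not satisfy the advisory.
    print(meets_advisory("1.79", "1.8.0.0"))
    print(meets_advisory("1.79", "1.11.2.0"))
```

Comparing the parts as integers matters here: a driver at 1.8.0.0 is older than 1.11.2.0, even though a plain string comparison would rank it higher.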
Björn Kober
New Member

Re: ASR error on DL G5 series BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

Hi,


Maybe I'm a bit too quick with my suspicion.

I installed the iLO 2 firmware v1.79 on a DL360 G5 to resolve an iLO 2 interface error.
After this, the server rebooted every second day with an ASR.

To find the cause and a quick solution, I contacted HP support, but the number of updates I was told to install on this server didn't help.

Yesterday I installed this iLO firmware version on a DL365 G1, also to solve an interface error. A few hours later, the server got an ASR with the same symptoms as the DL360 G5.

I think the problem is in this firmware version.

I have now sent this suspicion to HP and am waiting for a response.

Is there anybody who can confirm my suspicion?

Björn.
MKemp_1
New Member

Re: ASR error on DL G5 series BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

I have also begun experiencing this on a BL480c G1. I have iLO 2 firmware version 1.80, the iLO 2 Management Controller Driver for Windows Server 2003 x64 Editions at version 1.8.0.0, and the Integrated Lights-Out Management Interface Driver for Windows Server 2003/2008 x64 Editions at version 1.13.0.0.
It appears that even after almost 2 years, HP still does not have this fixed. I have seen recommendations to downgrade the Integrated Lights-Out Management Interface Driver for Windows Server 2003/2008 x64 Editions to version 1.8.3790.0, which is from 2006. I am not ready to do that yet. As far as I am concerned, disabling ASR is neither an interim nor a permanent fix; it's just a band-aid applied in the hope that this will all go away.
8i5
Advisor

Re: ASR error on DL G5 series BMC IPMI Watchdog Timer Timeout: Action=System Power Reset

We're seeing the same on BL465c G5 blades running 64-bit Windows 2003, all with the latest firmware, drivers, and software.

This is very frustrating. The last time I opened a ticket for this, HP's stock answer was to update firmware and drivers.

The strange thing is that this is very intermittent and unpredictable: we have many identically configured systems, all experiencing the same issue at different rates and times.