
DL380 G5 / Linux RHEL 5 / Reboot by ASR

 
Yanick Quirion
Regular Advisor


Dear all,

For the past couple of weeks I've been experiencing a strange problem on my brand new HP server: it restarts by itself at random times. This has happened three times so far (of course, every time it happened I was away from the office).

I'm running RHEL 5 on this system. Here are some lines I found in my /var/log/messages file:

Jun 11 16:26:54 callisto hpasmxld[5200]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jun 11 16:27:04 callisto hpasmxld[5200]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jun 11 16:27:14 callisto hpasmxld[5200]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!

Jun 11 16:27:24 callisto hpasmxld[5200]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jun 11 16:27:24 callisto hpasmxld[5200]: iLO 2 Communications Error - Attempting synchronization!

Jun 11 16:28:09 callisto hpasmxld[5200]: iLO 2 has responded to reset request . . .

Jun 11 16:28:09 callisto hpasmxld[5200]: Stopping the Watchdog Timer . . .
Jun 11 16:28:09 callisto hpasmxld[5200]: Resetting Internal Data structures . . .
Jun 11 16:28:09 callisto hpasmxld[5200]: Initializing Internal Data structures from iLO 2. . .
Jun 11 16:28:09 callisto hpasmxld[5200]: The iLO 2 reset / synchronization has completed successfully
Jun 11 16:28:09 callisto kernel: hpasmxld[5200]: segfault at 0000000000000031 rip 0000000000000031 rsp 00007fffce427ab8 error 4

A few minutes later the server restarts itself, even though the operating system is not frozen. I then receive an alert by e-mail:

Trap-ID=6025

An 'ASR Recover Complete' trap signifies that the system has been shutdown by the ASR feature and has just become operational again.

When I go to the "System Management Homepage", I see this in the logs:
ASR Detected by System ROM 5/27/2007 7:06AM 5/27/2007 7:06AM 1

and finally, when I go to the iLO 2 log, I have:
Informational iLO 2 06/11/2007 16:38 06/11/2007 16:38 1 Server power restored.
Informational iLO 2 06/11/2007 16:38 06/11/2007 16:38 1 BMC IPMI Watchdog Timer Timeout: Action=System Power Reset.
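
As a side note for readers comparing the two logs: the gap between the hpasmxld segfault in /var/log/messages and the watchdog reset in the iLO 2 log can be worked out from the timestamps quoted above. The sketch below is only an illustration in Python. It assumes the OS clock and the iLO 2 clock agree, and it assumes the syslog lines are from 2007 (syslog does not record the year).

```python
from datetime import datetime

# Timestamp of the hpasmxld segfault, copied from /var/log/messages.
# Assumption: the year is 2007, since syslog lines carry no year.
segfault = datetime.strptime("2007 Jun 11 16:28:09", "%Y %b %d %H:%M:%S")

# Timestamp of the watchdog entry, copied from the iLO 2 log
# (the iLO 2 log only has minute resolution).
watchdog_reset = datetime.strptime("06/11/2007 16:38", "%m/%d/%Y %H:%M")

# Elapsed time between the daemon crash and the power reset.
gap = watchdog_reset - segfault
print(f"Gap between segfault and watchdog reset: {gap}")
```

With these values the gap comes out a little under ten minutes. That would be consistent with the watchdog expiring some time after the daemon stopped, although the logs alone do not prove it.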

So, based on this information, can somebody tell me what's wrong with my system? All system states are "OK" or "GREEN", and HP support (yes, I have a service contract) doesn't seem to be aware of this issue. They want me to run diagnostics from the SmartStart CD that may take up to three hours, and right now I'm 2000 miles away from my server.

If somebody can provide some information about this, it would be really appreciated.

Best Regards,
Yanick
91 REPLIES
cosminidis
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Hi,

I have the same problem and have found no answer yet. Everything looks right, no yellow / red LEDs, yet the server reboots randomly at least once a week. This is the only message in the iLO log. Have you found anything wrong with your server? Any answer would be much appreciated.

Thanks,
Cosmin
Yanick Quirion
Regular Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Hi,

I'm happy to see that I'm not alone in having this problem. For now I have only disabled the ASR auto-reboot feature from the System Management web page. I opened a case with HP, and I need to run the diagnostic tools from the SmartStart CD. I have not had time to run them yet, because this server is online 24/7. I plan to run them this weekend.

If I find something, I will let you know. Also, if you find something before me, please let me know.

Regards,
Yanick

Dynamic80
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

I have been getting the same message and reboots on our new ProLiant DL380 G5. I have been searching for updates and found a few tech papers, but no solid answer. I hope you guys get it resolved.

Cheers
Fernando
Yanick Quirion
Regular Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Dear Fernando,

Unfortunately I did not find anything solid. I just disabled the ASR feature from the System Management web page. Since then, the server has never rebooted itself and the OS has never hung. I also ran all the diagnostics from the SmartStart CD and no error was reported.

If you get something else, please let me know.

Regards,
Yanick
Demond Reed
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

I have been having the same issue, only with a blade server. We recently installed a BL465C G1, and I built the server running Windows Server 2003 Std R2. A few days afterwards I was informed that the server sporadically reboots itself. Upon investigating the issue I came upon "BMC IPMI Watchdog Timer Timeout: Action=System Power Reset". I have been searching for a solution and have not found one yet. Has anyone else?
Erik Weber
Occasional Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

I've just noticed the same thing on a ProLiant DL360 G5.

I have disabled ASR, like the others, while waiting for a fix.

Log entries:
Jul 31 08:09:22 fri-ww01 kernel: ipmi_si(SI_CHECK_BMC): Failed to get Global Enables 0xc6.
Jul 31 08:09:32 fri-ww01 hpasmxld[4645]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jul 31 08:09:42 fri-ww01 hpasmxld[4645]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jul 31 08:09:52 fri-ww01 hpasmxld[4645]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jul 31 08:10:02 fri-ww01 hpasmxld[4645]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jul 31 08:10:02 fri-ww01 hpasmxld[4645]: iLO 2 Communications Error - Attempting synchronization!
Jul 31 08:10:47 fri-ww01 hpasmxld[4645]: iLO 2 has responded to reset request . . .
Jul 31 08:10:47 fri-ww01 hpasmxld[4645]: Stopping the Watchdog Timer . . .
Jul 31 08:10:47 fri-ww01 hpasmxld[4645]: Resetting Internal Data structures . . .
Jul 31 08:10:47 fri-ww01 hpasmxld[4645]: Initializing Internal Data structures from iLO 2. . .
Jul 31 08:10:47 fri-ww01 hpasmxld[4645]: The iLO 2 reset / synchronization has completed successfully
Jul 31 08:10:47 fri-ww01 kernel: hpasmxld[4645]: segfault at 0000000000000031 rip 0000000000000031 rsp 00007fff530279c8 error 4
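
For anyone who wants to check their own /var/log/messages for the same sequence, here is a minimal sketch in Python. It only looks for the pattern seen in the logs posted in this thread (repeated IPMI timeouts from hpasmxld, followed by a segfault of the same process). The function name find_incidents and the regular expressions are my own illustration, not part of any HP tool, and other hpasm versions may word these messages differently.

```python
import re

# Patterns taken from the log lines quoted in this thread (assumed wording).
TIMEOUT = re.compile(
    r"hpasmxld\[(\d+)\]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!"
)
SEGFAULT = re.compile(r"kernel: hpasmxld\[(\d+)\]: segfault")


def find_incidents(lines):
    """Return (pid, timeout_count, segfault_line) for every hpasmxld
    segfault that was preceded by IPMI timeouts from the same PID."""
    timeouts = {}   # pid -> number of timeouts seen so far
    incidents = []
    for line in lines:
        match = TIMEOUT.search(line)
        if match:
            pid = match.group(1)
            timeouts[pid] = timeouts.get(pid, 0) + 1
            continue
        match = SEGFAULT.search(line)
        if match:
            pid = match.group(1)
            if timeouts.get(pid, 0) > 0:
                incidents.append((pid, timeouts.pop(pid), line.strip()))
    return incidents


# Sample input: lines copied from the log above.
SAMPLE = [
    "Jul 31 08:09:32 fri-ww01 hpasmxld[4645]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!",
    "Jul 31 08:09:42 fri-ww01 hpasmxld[4645]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!",
    "Jul 31 08:09:52 fri-ww01 hpasmxld[4645]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!",
    "Jul 31 08:10:02 fri-ww01 hpasmxld[4645]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!",
    "Jul 31 08:10:47 fri-ww01 kernel: hpasmxld[4645]: segfault at 0000000000000031 rip 0000000000000031 rsp 00007fff530279c8 error 4",
]

incidents = find_incidents(SAMPLE)
for pid, count, line in incidents:
    print(f"hpasmxld pid {pid}: {count} IPMI timeouts before segfault")
```

To run it against a real log, pass an open file object instead of SAMPLE, since the function accepts any iterable of lines.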
Don Wycherley
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Just to say we have had the same problem. We have had six DL380s running RHEL 5 since June; one crashed and rebooted in September and another in October, which is too unreliable for important databases. The key message is "Failed to get Global Enables" in the messages file. HP asked us to run the SmartStart diagnostics (5 loops), but it was only 12% finished after 1 hour and we had to stop it. They now advise us to disable ASR by rebooting and getting into the BIOS, but we can't get downtime. Can it be done via iLO? We need a permanent fix which does not require ASR to be disabled.
Yanick Quirion
Regular Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Dear Don,

The ASR feature can be disabled from the System Management Homepage without having to reboot. I did that, and since then the server has never rebooted.

After logging on to System Management, select the "Autorecovery" feature under the "Recovery" section, then change the status to "disable" and click "set".

That should stop the unexpected reboots.

Regards,
Yanick
David Claypool
Honored Contributor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Yanick, can you send me the case number you have open with HP Support? Email to simguru at hp dot com.