
Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

 
Yanick Quirion
Regular Advisor

DL380 G5 / Linux RHEL 5 / Reboot by ASR

Dear all,

For the past couple of weeks I have been experiencing a strange problem on my brand new HP server: it restarts by itself at random times. This has happened 3 times so far (and of course, each time it happened I was away from the office).

I'm running RHEL 5 on this system. Here are some lines I found in my /var/log/messages file:

Jun 11 16:26:54 callisto hpasmxld[5200]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jun 11 16:27:04 callisto hpasmxld[5200]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jun 11 16:27:14 callisto hpasmxld[5200]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!

Jun 11 16:27:24 callisto hpasmxld[5200]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jun 11 16:27:24 callisto hpasmxld[5200]: iLO 2 Communications Error - Attempting synchronization!

Jun 11 16:28:09 callisto hpasmxld[5200]: iLO 2 has responded to reset request . . .

Jun 11 16:28:09 callisto hpasmxld[5200]: Stopping the Watchdog Timer . . .
Jun 11 16:28:09 callisto hpasmxld[5200]: Resetting Internal Data structures . . .
Jun 11 16:28:09 callisto hpasmxld[5200]: Initializing Internal Data structures from iLO 2. . .
Jun 11 16:28:09 callisto hpasmxld[5200]: The iLO 2 reset / synchronization has completed successfully
Jun 11 16:28:09 callisto kernel: hpasmxld[5200]: segfault at 0000000000000031 rip 0000000000000031 rsp 00007fffce427ab8 error 4
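For anyone who wants to check their own logs for the same pattern, here is a minimal sketch in Python. It is not an HP tool; the function name `summarize_hpasm_errors` is made up for illustration, and the patterns are simply copied from the lines above, so they may need adjusting for other agent versions.

```python
import re

# Patterns copied from the log excerpt above (assumption: other hpasm
# versions may word these messages differently).
TIMEOUT_RE = re.compile(
    r"hpasmxld\[\d+\]: OsKcsExecCmd: IPMI NetFN \S+ CMD: \S+ has timed out!"
)
SEGFAULT_RE = re.compile(r"kernel: hpasmxld\[\d+\]: segfault")


def summarize_hpasm_errors(lines):
    """Count IPMI timeouts and hpasmxld segfaults in an iterable of syslog lines."""
    timeouts = 0
    segfaults = 0
    for line in lines:
        if TIMEOUT_RE.search(line):
            timeouts += 1
        elif SEGFAULT_RE.search(line):
            segfaults += 1
    return {"timeouts": timeouts, "segfaults": segfaults}
```

To use it, pass an open file, for example `summarize_hpasm_errors(open("/var/log/messages", errors="replace"))`. Reading that file usually requires root.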

A couple of minutes later the server restarts itself, even though the operating system is not frozen. I then receive an alert by e-mail:

Trap-ID=6025

An 'ASR Recover Complete' trap signifies that the system has been shutdown by the ASR feature and has just become operational again.

When I go to the "System Management Homepage", I see this in the logs:
ASR Detected by System ROM 5/27/2007 7:06AM 5/27/2007 7:06AM 1

And finally, in the iLO 2 log, I have:
Informational iLO 2 06/11/2007 16:38 06/11/2007 16:38 1 Server power restored.
Informational iLO 2 06/11/2007 16:38 06/11/2007 16:38 1 BMC IPMI Watchdog Timer Timeout: Action=System Power Reset.

So, based on this information, can somebody tell me what's wrong with my system? All system states are "OK" or "GREEN", and HP support (yes, I have a service contract) doesn't seem to be aware of this issue. They want me to run diagnostics from the SmartStart CD that may take up to three hours, and right now I'm 2000 miles away from my server.

If somebody can provide me with some information about this, it would be really appreciated.

Best Regards,
Yanick
91 REPLIES 91
cosminidis
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Hi,

I have the same problem and have found no answer yet. Everything looks right, with no yellow or red LEDs, yet the server reboots randomly at least once a week. This is the only message in the iLO log. Have you found anything wrong with your server? Any answer would be much appreciated.

10x
Cosmin
Yanick Quirion
Regular Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Hi,

I'm happy to see that I'm not alone in having this problem. For now I have only disabled the ASR auto-reboot feature from the System Management web page. I opened a case with HP, and I need to run the diagnostic tools from the SmartStart CD. I have not had time to run them yet, because this server is online 24/7. I plan to run them this weekend.

If I find something, I will let you know. Also, if you find something before me, please let me know.

Regards,
Yanick

Dynamic80
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

I have been getting the same message and reboots on our new ProLiant DL380 G5. I am searching for any updates and found a few tech papers but no solid answer. I hope you guys get it resolved.

Cheers
Fernando
Yanick Quirion
Regular Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Dear Fernando,

Unfortunately I did not find anything solid. I just disabled the ASR feature from the System Management web page. Since then, the server has never rebooted itself and the OS has never hung. I also ran all the diagnostics from the SmartStart CD and no errors were reported.

If you get something else, please let me know.

Regards,
Yanick
Demond Reed
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

I have been having the same issue, only with a blade server. We recently installed a BL465C G1, which I built running Windows Server 2003 Std R2. A few days afterwards I was informed that the server sporadically reboots itself. While investigating the issue I came across "BMC IPMI Watchdog Timer Timeout: Action=System Power Reset". I have been searching for a solution and have not found one yet. Has anyone else?
Erik Weber
Occasional Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

I've just noticed the same thing on a ProLiant DL360 G5

Disabled ASR like the others while waiting for a fix.

Log entries:
Jul 31 08:09:22 fri-ww01 kernel: ipmi_si(SI_CHECK_BMC): Failed to get Global Enables 0xc6.
Jul 31 08:09:32 fri-ww01 hpasmxld[4645]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jul 31 08:09:42 fri-ww01 hpasmxld[4645]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jul 31 08:09:52 fri-ww01 hpasmxld[4645]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jul 31 08:10:02 fri-ww01 hpasmxld[4645]: OsKcsExecCmd: IPMI NetFN 0x36 CMD: 0x2 has timed out!
Jul 31 08:10:02 fri-ww01 hpasmxld[4645]: iLO 2 Communications Error - Attempting synchronization!
Jul 31 08:10:47 fri-ww01 hpasmxld[4645]: iLO 2 has responded to reset request . . .
Jul 31 08:10:47 fri-ww01 hpasmxld[4645]: Stopping the Watchdog Timer . . .
Jul 31 08:10:47 fri-ww01 hpasmxld[4645]: Resetting Internal Data structures . . .
Jul 31 08:10:47 fri-ww01 hpasmxld[4645]: Initializing Internal Data structures from iLO 2. . .
Jul 31 08:10:47 fri-ww01 hpasmxld[4645]: The iLO 2 reset / synchronization has completed successfully
Jul 31 08:10:47 fri-ww01 kernel: hpasmxld[4645]: segfault at 0000000000000031 rip 0000000000000031 rsp 00007fff530279c8 error 4
Don Wycherley
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Just to say we have had the same. We have six DL380s running RHEL5 since June and one crashed and rebooted in September and another in October which is too unreliable for important databases. The key message is "Failed to get Global Enables" in the messages file. HP asked us to run the SmartStart diagnostics (5 loops) but it was only 12% finished after 1 hour and we had to stop it. They now advise us to disable ASR by rebooting and getting into the BIOS but we can't get downtime. Can it be done via ILO? We need a permanent fix which does not require ASR to be disabled.
Yanick Quirion
Regular Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Dear Don,

The ASR feature can be disabled from the System Management Homepage without having to reboot. I did that, and the server has not rebooted since.

After logging on to the System Management Homepage, select the "Autorecovery" feature under the "Recovery" section, then change the status to "disable" and click "set".

Your problem should now be solved.

Regards,
Yanick
David Claypool
Honored Contributor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Yanick, can you send me the case number you have open with HP Support? Email to simguru at hp dot com.
Nath Camelot
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Hello all,

I have exactly the same problem with 4 brand new DL360 G5 and 2 DL380 G5 servers running RHEL5 x86_64.
Unexpected reboots have occurred on some of these servers (the last one was last Friday, on one of the 360s): 3 of the 4 360s have shown this behaviour, and 1 of the 2 380s too.
They all passed 72 hours of memtest86+ (v1.70) and 48 hours of HP diagnostics (from SmartStart CD 7.90) without problems before going into production, and firmware and packages are all up to date.
The following lines appeared in /var/log/messages about 10 minutes before ASR rebooted the server (last reboot, on a 360):
Oct 12 12:07:08 plam0043 kernel: ipmi_si(SI_CHECK_BMC): Failed to get Global Enables 0xc6.
Oct 12 12:07:18 plam0043 hpasmxld[5082]: OsKcsExecCmd: IPMI NetFN 0x6 CMD: 0x25 has timed out!
Oct 12 12:07:28 plam0043 hpasmxld[5082]: OsKcsExecCmd: IPMI NetFN 0x6 CMD: 0x25 has timed out!
Oct 12 12:07:38 plam0043 hpasmxld[5082]: OsKcsExecCmd: IPMI NetFN 0x6 CMD: 0x25 has timed out!
Oct 12 12:07:48 plam0043 hpasmxld[5082]: OsKcsExecCmd: IPMI NetFN 0x6 CMD: 0x25 has timed out!
Oct 12 12:07:48 plam0043 hpasmxld[5082]: iLO 2 Communications Error - Attempting synchronization!
Oct 12 12:08:33 plam0043 hpasmxld[5082]: iLO 2 has responded to reset request . . .
Oct 12 12:08:33 plam0043 hpasmxld[5082]: Stopping the Watchdog Timer . . .
Oct 12 12:08:33 plam0043 hpasmxld[5082]: Resetting Internal Data structures . . .
Oct 12 12:08:33 plam0043 hpasmxld[5082]: Initializing Internal Data structures from iLO 2. . .
Oct 12 12:08:33 plam0043 hpasmxld[5082]: The iLO 2 reset / synchronization has completed successfully
Oct 12 12:08:33 plam0043 kernel: hpasmxld[5082]: segfault at 0000000000010000 rip 0000000000010000 rsp 00007fff75dea648 error 4
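The NetFN/CMD pair that times out is not the same in every log posted in this thread (0x36/0x2 in the earlier posts, 0x6/0x25 here), so it may be worth tallying which pairs show up on a given machine. A rough Python sketch, assuming the message format shown above; `tally_ipmi_timeouts` is a hypothetical helper name, not part of any HP package:

```python
import re
from collections import Counter

# Captures the NetFN and CMD values from an hpasmxld timeout line.
# Assumption: the message format matches the excerpts in this thread.
IPMI_TIMEOUT_RE = re.compile(
    r"IPMI NetFN (0x[0-9a-fA-F]+) CMD: (0x[0-9a-fA-F]+) has timed out!"
)


def tally_ipmi_timeouts(lines):
    """Return a Counter keyed by (NetFN, CMD) for every timeout line seen."""
    counts = Counter()
    for line in lines:
        match = IPMI_TIMEOUT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Calling `tally_ipmi_timeouts(open("/var/log/messages", errors="replace")).most_common()` lists the pairs from most to least frequent, which might be useful detail to attach to a support case.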

A call has been opened with HP Europe.


Regards,
Nathanaël
Dynamic80
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

After reading this particular forum, I went ahead and disabled it. The reboots have stopped altogether. Until HP or others can figure out what is happening, I'd rather keep it off, since this server of ours is already in production.

I ran the same tests and came up with nothing substantial. Everything is normal it seems.

Good luck.

Fernando
TODO Raymond
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Hi everybody,

I think HP needs to take the time to look into this issue seriously, because it has become very frequent.

I have the same issue on my 2 DL380 G5 servers, which run Windows 2003.

Let us share our experience with this issue, in case anybody has a solution.

Cheers

Raymond
netway
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

I now have the same issue with a BL460c. A case is open.
Bruce Bott
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Same problem... new DL380 G5 spontaneously rebooting.

Disabled ASR to see what happens. Interestingly though, in the management console the ASR 'log' showed no ASR events.
Avtar_1
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

I'm experiencing the same issue with a brand new DL360 G5 running Red Hat Enterprise 5. Same errors in /var/log/messages as well. The server isn't in production but we were taxing it quite a bit. Contacting HP Support today.
kmallea
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Interesting. I am experiencing the same issue on two brand new DL380 G5 servers. After reading this post, I've gone ahead and disabled ASR.

Has HP been able to give you guys a complete fix or do you all still have open cases?
Yanick Quirion
Regular Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

I have had a case open since June and have not heard anything back from HP. I don't know if the case is still in their queue.

Yanick

Ugo Bellavance (ATQ)
Frequent Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Hello all,

I have the very same issue on 4 brand new DL380 G5 in production.

All servers are running Red Hat Enterprise Linux 5 with the latest patches and updates (RHEL 5.1 as of a couple of days ago).

I also disabled ASR, because all 4 servers are production Oracle databases and they kept crashing every 18 hours or so.

HP definitely needs to fix this ASAP. Has anybody got a fix for this yet?

Here's what you can see in the iLO 2 log (from most recent to oldest; that is, you get the BMC error first, then the server resets itself):

---
Informational iLO 2 11/11/2007 13:16 Server power restored.

Informational iLO 2 11/11/2007 13:15 Server power removed.

Informational iLO 2 11/11/2007 13:15 BMC IPMI Watchdog Timer Timeout: Action=System Power Reset.
---

Patrick Monfette
Ugo Bellavance (ATQ)
Frequent Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Here's the kind of information you can get in the /var/log/messages once you disable the ASR (if you don't, the system resets itself before it has time to write those)

Nov 12 10:00:55 bdprod-1 kernel: ipmi_si(SI_CHECK_BMC): Failed to get Global Enables 0xc6.
Nov 12 10:01:05 bdprod-1 hpasmxld[7373]: OsKcsExecCmd: IPMI NetFN 0x4 CMD: 0x2d has timed out!
Nov 12 10:01:15 bdprod-1 hpasmxld[7373]: OsKcsExecCmd: IPMI NetFN 0x4 CMD: 0x2d has timed out!
Nov 12 10:01:25 bdprod-1 hpasmxld[7373]: OsKcsExecCmd: IPMI NetFN 0x4 CMD: 0x2d has timed out!
Nov 12 10:01:35 bdprod-1 hpasmxld[7373]: OsKcsExecCmd: IPMI NetFN 0x4 CMD: 0x2d has timed out!
Nov 12 10:01:35 bdprod-1 hpasmxld[7373]: iLO 2 Communications Error - Attempting synchronization!
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: iLO 2 has responded to reset request . . .
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: Stopping the Watchdog Timer . . .
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: Resetting Internal Data structures . . .
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: Initializing Internal Data structures from iLO 2. . .
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: The iLO 2 reset / synchronization has completed successfully
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: Failed GET SENSOR READING, sensor 9
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: iLO 2 Communications Error - Attempting synchronization!
Nov 12 10:03:05 bdprod-1 hpasmxld[7373]: iLO 2 has responded to reset request . . .
Nov 12 10:03:05 bdprod-1 hpasmxld[7373]: Stopping the Watchdog Timer . . .
Nov 12 10:03:05 bdprod-1 hpasmxld[7373]: Resetting Internal Data structures . . .
Nov 12 10:03:05 bdprod-1 hpasmxld[7373]: Initializing Internal Data structures from iLO 2. . .
Nov 12 10:03:05 bdprod-1 hpasmxld[7373]: The iLO 2 reset / synchronization has completed successfully
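To see how often hpasmxld goes through this reset / synchronization cycle and how long each one takes, something like the following Python sketch could be run over the messages file. The helper names are invented for illustration, and because syslog timestamps carry no year, the `year` argument is an assumption you have to supply yourself.

```python
from datetime import datetime

# Marker strings copied from the log excerpt above.
START = "iLO 2 Communications Error - Attempting synchronization!"
DONE = "The iLO 2 reset / synchronization has completed successfully"


def _stamp(line, year):
    """Parse the 15-character syslog timestamp at the start of a line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def resync_cycles(lines, year=2007):
    """Return (start, end) datetimes for each completed iLO 2 resync cycle."""
    cycles = []
    started = None
    for line in lines:
        if START in line:
            started = _stamp(line, year)
        elif DONE in line and started is not None:
            cycles.append((started, _stamp(line, year)))
            started = None
    return cycles
```

On the excerpt above this finds two cycles, each lasting 45 seconds (10:01:35 to 10:02:20, then 10:02:20 to 10:03:05). A machine that cycles like this repeatedly is probably hitting the same problem even if ASR is disabled and no reboot follows.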


Patrick Monfette
kmallea
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Here's an interesting bit of information: people suggested disabling ASR via the System Management Console, which I did. However, when I went into the BIOS I noticed that ASR was already disabled by default (at least on my systems), yet the logs report "ASR detected by System ROM", and sometimes "Blue Screen Traps."

And this is happening on two new servers, one running MS SQL 2005 and another running Sharepoint 2007.
Ugo Bellavance (ATQ)
Frequent Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

That is weird. ASR is usually enabled by default, if I remember correctly.

For my part, I disabled ASR using the System Management Homepage and I haven't had a reboot since. I can't check in the BIOS, though, because those are live servers and I can't restart them.

However, IPMI has failed many times since then and wanted the server to reboot because of that (I can see that in the iLO logs). Since ASR is disabled, it only seems to restart IPMI and carry on. I wonder why it doesn't do that by default instead of rebooting.

So the whole thing is still buggy, but at least I can use my servers without them rebooting every now and then.

It is very bad though that I had to disable the ASR, I really need this feature working perfectly, especially for my disaster recovery site.

I know I am repeating myself but HP really needs to fix this rapidly.

Patrick Monfette
Rabie Van der Merwe
Occasional Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

I am also having spurious ASRs and have since disabled ASR on some of my servers.

I am running DL380 G5s with:
CentOS 5 x86_64 (Xen kernel)
RHEL 5 x86_64
CentOS 4 i386 (Asterisk server)

All servers are on HP PSP 7.90 with the latest firmware versions (as of 7.9.0 Firmware update CD)

I will also be logging a call shortly.

SYS_MONEXT
Occasional Contributor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Hi,

I had the same problem on a DL360G5, running RHEL5, with hp-OpenIPMI-7.8.0-83.rhel5.

HP support Level 2 has recommended these actions (in my case, after gathering information on my systems with cfg2html) :

1) rpm -e hp-OpenIPMI (this can take some time to stop the service and uninstall)
2) chkconfig --level 35 ipmi on (this sets Red Hat's native ipmi daemon to start in run levels 3 and 5)
3) service hpasm restart (this will stop all snmp agents and restart hpasm with the hpasmlited service to use Red Hat's native IPMI)

Then you can reactivate ASR if you disabled it to prevent reboot.
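If you have several servers to fix, the three steps above can be wrapped in a small script. The sketch below is only an illustration in Python: the commands are the ones quoted in this post, while the `apply_workaround` name and the dry-run default are my own assumptions. It has to run as root to actually change anything, and it is worth trying on a non-production box first.

```python
import subprocess

# Commands exactly as quoted in the post above; the order matters.
STEPS = [
    ["rpm", "-e", "hp-OpenIPMI"],                   # remove HP's OpenIPMI package
    ["chkconfig", "--level", "35", "ipmi", "on"],   # enable the native ipmi service
    ["service", "hpasm", "restart"],                # restart hpasm on the native IPMI
]


def apply_workaround(dry_run=True):
    """Print each step; execute it only when dry_run is False (needs root)."""
    handled = []
    for cmd in STEPS:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first step that fails.
            subprocess.run(cmd, check=True)
        handled.append(cmd)
    return handled
```

With the default `dry_run=True` the function only prints the commands, so you can review them before calling `apply_workaround(dry_run=False)`.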
TODO Raymond
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Hi everybody,

Bernard's solution applies to RHEL. Does anyone have a solution for a Windows environment?

Regards