ProLiant Servers (ML,DL,SL)
1752429 Members
5661 Online
108788 Solutions
New Discussion юеВ

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

 
Nath Camelot
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Hello all,

I have the same exact problem with 4 brand new DL360G5 and 2 DL380G5 running RHEL5 x86_64.
Unexepected reboots occured (last one on last friday for one of the 360) on some of these servers: 3 of the 4 360 had this behaviour, 1 of the 2 380 too.
They all passed 72 hours of memtest86+ (v1.70) and 48 hours of hp diags (from smartstart CD 7.90) without problem before going to production, firmwares and packages are all up to date.
The following lines showed in /var/log/messages about 10 minutes before the ASR reboots the servers (last reboot for a 360) :
Oct 12 12:07:08 plam0043 kernel: ipmi_si(SI_CHECK_BMC): Failed to get Global Enables 0xc6.
Oct 12 12:07:18 plam0043 hpasmxld[5082]: OsKcsExecCmd: IPMI NetFN 0x6 CMD: 0x25 has timed out!
Oct 12 12:07:28 plam0043 hpasmxld[5082]: OsKcsExecCmd: IPMI NetFN 0x6 CMD: 0x25 has timed out!
Oct 12 12:07:38 plam0043 hpasmxld[5082]: OsKcsExecCmd: IPMI NetFN 0x6 CMD: 0x25 has timed out!
Oct 12 12:07:48 plam0043 hpasmxld[5082]: OsKcsExecCmd: IPMI NetFN 0x6 CMD: 0x25 has timed out!
Oct 12 12:07:48 plam0043 hpasmxld[5082]: iLO 2 Communications Error - Attempting synchronization!
Oct 12 12:08:33 plam0043 hpasmxld[5082]: iLO 2 has responded to reset request . . .
Oct 12 12:08:33 plam0043 hpasmxld[5082]: Stopping the Watchdog Timer . . .
Oct 12 12:08:33 plam0043 hpasmxld[5082]: Resetting Internal Data structures . . .
Oct 12 12:08:33 plam0043 hpasmxld[5082]: Initializing Internal Data structures from iLO 2. . .
Oct 12 12:08:33 plam0043 hpasmxld[5082]: The iLO 2 reset / synchronization has completed successfully
Oct 12 12:08:33 plam0043 kernel: hpasmxld[5082]: segfault at 0000000000010000 rip 0000000000010000 rsp 00007fff75dea648 error 4

A call is opened at hp europe.


Regards,
Nathana├Г┬лl
Dynamic80
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

After reading this particular forum, I went ahead and disabled it. The reboots have all together stopped. Until HP or others can figure out what is happening, I'd rather keep it off since this server of ours is already in production.

I ran the same tests and came up with nothing substantial. Everything is normal it seems.

Good luck.

Fernando
TODO Raymond
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Hi evererybody,

I think HP must take time seriously to learn about this issue because it came to be very frequent.

I have the same issue about my 2 servers DL380G5 wich run Windows 2003.

Let us share our experience about this issue if anybody have the solution.

Cheers

Raymond
netway
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

I have now the same issue with an BL460c. case is open.
Bruce Bott
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Same problem... new DL380 G5 spontaneously rebooting.

Disabled ASR to see what happens. Interestingly though, in the management console the ASR 'log' showed no ASR events.
Avtar_1
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

I'm experiencing the same issue with a brand new DL360 G5 running Red Hat Enterprise 5. Same errors in /var/log/messages as well. The server isn't in production but we were taxing it quite a bit. Contacting HP Support today.
kmallea
New Member

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Interesting. I am experience the same issue on two, brand new DL380 G5 servers. After reading this post, I've gone ahead and disabled ASR.

Has HP been able to give you guys a complete fix or do you all still have open cases?
Yanick Quirion
Regular Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

I have an open case since June and I did not hear anything back from HP. I don't know if the case is still on their queue.

Yanick

Ugo Bellavance (ATQ)
Frequent Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Hello all,

I have the very same issue on 4 brand new DL380 G5 in production.

All servers are running Red Hat Enterprise Linux 5, latest patches and updates (RHEL 5.1 now since a couple of days).

I also disabled the ASR because all 4 servers are production Oracle Databases and they keep crashing every 18 hours or so.

HP definitely needs to fix this ASAP, has anybody got a fix yet on this yet ?

Here's what you can see in the ILO2 log (from most recent to oldest, that is you get the BMC error first then it reset itself):

---
Informational iLO 2 11/11/2007 13:16 Server power restored.

Informational iLO 2 11/11/2007 13:15 Server power removed.

Informational iLO 2 11/11/2007 13:15 BMC IPMI Watchdog Timer Timeout: Action=System Power Reset.
---

Patrick Monfette
Ugo Bellavance (ATQ)
Frequent Advisor

Re: DL380 G5 / Linux RHEL 5 / Reboot by ASR

Here's the kind of information you can get in the /var/log/messages once you disable the ASR (if you don't, the system resets itself before it has time to write those)

Nov 12 10:00:55 bdprod-1 kernel: ipmi_si(SI_CHECK_BMC): Failed to get Global Enables 0xc6.
Nov 12 10:01:05 bdprod-1 hpasmxld[7373]: OsKcsExecCmd: IPMI NetFN 0x4 CMD: 0x2d has timed out!
Nov 12 10:01:15 bdprod-1 hpasmxld[7373]: OsKcsExecCmd: IPMI NetFN 0x4 CMD: 0x2d has timed out!
Nov 12 10:01:25 bdprod-1 hpasmxld[7373]: OsKcsExecCmd: IPMI NetFN 0x4 CMD: 0x2d has timed out!
Nov 12 10:01:35 bdprod-1 hpasmxld[7373]: OsKcsExecCmd: IPMI NetFN 0x4 CMD: 0x2d has timed out!
Nov 12 10:01:35 bdprod-1 hpasmxld[7373]: iLO 2 Communications Error - Attempting synchronization!
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: iLO 2 has responded to reset request . . .
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: Stopping the Watchdog Timer . . .
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: Resetting Internal Data structures . . .
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: Initializing Internal Data structures from iLO 2. . .
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: The iLO 2 reset / synchronization has completed successfully
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: Failed GET SENSOR READING, sensor 9
Nov 12 10:02:20 bdprod-1 hpasmxld[7373]: iLO 2 Communications Error - Attempting synchronization!
Nov 12 10:03:05 bdprod-1 hpasmxld[7373]: iLO 2 has responded to reset request . . .
Nov 12 10:03:05 bdprod-1 hpasmxld[7373]: Stopping the Watchdog Timer . . .
Nov 12 10:03:05 bdprod-1 hpasmxld[7373]: Resetting Internal Data structures . . .
Nov 12 10:03:05 bdprod-1 hpasmxld[7373]: Initializing Internal Data structures from iLO 2. . .
Nov 12 10:03:05 bdprod-1 hpasmxld[7373]: The iLO 2 reset / synchronization has completed successfully


Patrick Monfette