HPE 9000 and HPE e3000 Servers
Error log in event.log for partition resetting

CIS Ethiopia Ziad
Please find below an error message we got on one of our HP-UX servers. Could you please advise what exactly the problem is and what the recommended solution would be? Thanks for your urgent reply.

# /opt/resmon/bin/resdata -R 299696130 -r /system/events/ipmi_fpl/ipmi_fpl -n 299696134 -a


Event Time..........: Mon Dec 17 02:45:54 2007
Severity............: CRITICAL
Monitor.............: fpl_em
Event #.............: 646
System..............: scp5

Partition being reset due to watchdog timeout expiring

Description of Error:

The partition is being reset because its watchdog timer expired and automatic
restart is enabled.

Probable Cause / Recommended Action:

Cause: There are 2 watchdog mechanisms, both of which trigger the MP to reset a
partition if its OS becomes unresponsive. An unresponsive OS is detected when
the OS fails to refresh the watchdog timer before it expires. PA systems
refresh the watchdog timer by emitting an event with data field set to activity
level/timeout, and the timeout fields specifies the desired timeout. This timer
can be disabled with the MP AR command. IPF systems refresh the watchdog timer
using the IPMI clear watchdog command. The AR command does not affect the IPMI
watchdog timer. Regardless of which timer was in use, the MP emits this event
when timer expiration triggers resetting the partition. Action: Find out why
the partition's OS had hung. The cause could be bad HW that crashed the
partition, or in rare cases, a combination of events that caused the OS to be
unable to refresh the watchdog timer. Look for other events preceeding the
timeout for clues to the root cause of the partition being unresponsive.

Additional Event Data:
System IP Address...:
Event Id............: 0x4765b8b200000000
Monitor Version.....: A.01.00
Event Class.........: System
Client Configuration File...........:
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
Additional System Data:
System Model Number.............: 9000/800/rp7420
EMS Version.....................: A.04.00
STM Version.....................: A.43.00
System Serial Number............: SGH4504794
Latest information on this event:

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v

IPMI event hex: f6800ad500e00000 0
Time Stamp: Sun Dec 16 22:57:51 2007
Alert level name: Fatal
Reporting vers: 1
Data field type: Implementation dependent data field
Decoded data field:
Reporting entity ID: 1
Reporting entity Full Name: Service Processor
IPMI Event ID : 2773
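
When following the advice to look for events preceding the timeout, it helps to put all timestamps on one clock. As a rough aid, the sketch below assumes that the upper 32 bits of the EMS "Event Id" field hold a Unix timestamp in seconds. That is an observation from this one log entry, not documented behaviour, and the helper name event_id_to_utc is made up for illustration.

```python
from datetime import datetime, timezone


def event_id_to_utc(event_id_hex: str) -> datetime:
    """Decode the upper 32 bits of an EMS Event Id as Unix seconds (assumption)."""
    value = int(event_id_hex, 16)   # accepts the leading "0x"
    seconds = value >> 32           # drop the lower 32 bits
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Event Id taken from the log entry above.
    print(event_id_to_utc("0x4765b8b200000000").isoformat())
    # prints 2007-12-16T23:45:54+00:00
```

For this entry the decoded value is exactly three hours behind the reported "Event Time" (Mon Dec 17 02:45:54 2007), which would fit a server clock set to UTC+3. Treat that as a hint only and confirm it against other entries before relying on it.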

Andrew Merritt_2
I believe this problem can be caused by old firmware. Make sure the firmware on your system is up-to-date.

ServiceNote A7025A-01
PDHC Revision A.003.014 contains the following fixes:

PDHC could hang resulting in OS heartbeats not being passed to the MP.
After 10 minutes of not receiving heartbeats, the MP would emit an IPMI
Event (WATCHDOG_RESET_PARTITION). This event causes the EMS Event
Monitoring System to send e-mail stating the system has been reset
(EMS Event #646). PDHC firmware has been modified to resolve this issue.
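
To make the mechanism concrete, here is a minimal simulation of a heartbeat watchdog. It is only a sketch of the general idea, not the MP or PDHC firmware logic: the class and function names are hypothetical, time is modelled in whole seconds, and the 600-second timeout simply mirrors the 10 minutes quoted in the service note.

```python
class Watchdog:
    """Toy watchdog: expires if not refreshed within timeout_s seconds."""

    def __init__(self, timeout_s: int):
        self.timeout_s = timeout_s
        self.last_refresh = 0

    def refresh(self, now: int) -> None:
        self.last_refresh = now

    def expired(self, now: int) -> bool:
        return now - self.last_refresh >= self.timeout_s


def simulate(heartbeat_times, timeout_s, end_time):
    """Return the second at which the watchdog fires, or None if it never does."""
    wd = Watchdog(timeout_s)
    pending = sorted(heartbeat_times)
    i = 0
    for now in range(end_time + 1):
        # Deliver every heartbeat that has arrived by this second.
        while i < len(pending) and pending[i] <= now:
            wd.refresh(pending[i])
            i += 1
        if wd.expired(now):
            return now  # the point where the partition would be reset
    return None


if __name__ == "__main__":
    # Heartbeats stop reaching the watchdog after t=120 s (e.g. a hung relay).
    print(simulate([0, 60, 120], timeout_s=600, end_time=1800))              # 720
    # Heartbeats keep arriving every 60 s, so the watchdog never fires.
    print(simulate(list(range(0, 1801, 60)), timeout_s=600, end_time=1800))  # None
```

The first run shows why a healthy OS can still be reset: if whatever relays the heartbeats stops passing them on, the watchdog sees silence and fires once the timeout has elapsed, even though the OS itself never hung.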

Also, you have OnlineDiags A.43.00, which was the December 2003 release. You should update to a supported version as soon as possible.

Versions here - http://www.docs.hp.com/en/diag/stm/stm_upd.htm#table

Download from here - http://h20293.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=B6191AAE

Hi Andrew,

Thanks for your reply. I think you gave me the site for the STM software, not for a firmware update. Could you please describe, step by step, how to upgrade the system firmware and how to find the latest firmware for this server? Thanks in advance for your support.

Andrew Merritt_2
Yes, there's a reason for that - I'm not an expert on HW or FW :-)

Searching www.hp.com suggests this may be the place to start - http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&swItem=pf-53620-2&jumpid=reg_R1002_USEN

If you don't know how to upgrade the firmware, I would strongly recommend contacting HP support.

Be very, very careful with firmware updates on cell-based servers!

If you take one wrong step you can easily cause serious damage to your system!

In case of any doubt, open a support case and ask for help!

Hope this helps!

