HPE 9000 and HPE e3000 Servers

Error log in event.log for partition resetting

 
CIS Ethiopia Ziad
Occasional Visitor


Please find below an error message we got on one of our HP-UX servers. Can you advise what exactly the problem is and what the recommended solution is? Thanks in advance for a quick reply.


# /opt/resmon/bin/resdata -R 299696130 -r /system/events/ipmi_fpl/ipmi_fpl -n 299696134 -a

CURRENT MONITOR DATA:

Event Time..........: Mon Dec 17 02:45:54 2007
Severity............: CRITICAL
Monitor.............: fpl_em
Event #.............: 646
System..............: scp5

Summary:
Partition being reset due to watchdog timeout expiring


Description of Error:

The partition is being reset because its watchdog timer expired and automatic
restart is enabled.

Probable Cause / Recommended Action:

Cause: There are 2 watchdog mechanisms, both of which trigger the MP to reset a
partition if its OS becomes unresponsive. An unresponsive OS is detected when
the OS fails to refresh the watchdog timer before it expires. PA systems
refresh the watchdog timer by emitting an event with the data field set to
activity level/timeout, and the timeout field specifies the desired timeout.
This timer can be disabled with the MP AR command. IPF systems refresh the
watchdog timer using the IPMI clear watchdog command. The AR command does not
affect the IPMI watchdog timer. Regardless of which timer was in use, the MP
emits this event when timer expiration triggers resetting the partition.
Action: Find out why the partition's OS hung. The cause could be bad HW that
crashed the partition, or in rare cases, a combination of events that caused
the OS to be unable to refresh the watchdog timer. Look for other events
preceding the timeout for clues to the root cause of the partition being
unresponsive.
-


Additional Event Data:
System IP Address...: 129.10.168.61
Event Id............: 0x4765b8b200000000
Monitor Version.....: A.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_fpl_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/rp7420
EMS Version.....................: A.04.00
STM Version.....................: A.43.00
System Serial Number............: SGH4504794
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#646

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v


IPMI event hex: f6800ad500e00000 0
Time Stamp: Sun Dec 16 22:57:51 2007
Event keyword: WATCHDOG_RESET_PARTITION
Alert level name: Fatal
Reporting vers: 1
Data field type: Implementation dependent data field
Decoded data field:
Reporting entity ID: 1
Reporting entity Full Name: Service Processor
IPMI Event ID : 2773
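The Action text above asks you to look for other events preceding the timeout. A minimal sketch for doing that, assuming the EMS text log (commonly /var/opt/resmon/log/event.log on HP-UX; please check the path on your own system) stores each event in the same "Key.....: value" and "Summary:" layout as the resdata output above. The function name summarize_ems_events is hypothetical, and it uses only POSIX sh and awk, so it should run on HP-UX as well as on a Linux box where you copied the log.

```shell
# Hypothetical helper: prints one line per event found in an EMS-style text log.
# ASSUMPTION: each event uses the layout of the resdata output above:
#   "Event Time..........: <date>", "Severity............: <level>",
#   "Event #.............: <n>", then "Summary:" followed by the summary text.
# Reads the files given as arguments, or stdin if none are given.
summarize_ems_events() {
    awk '
        # Strip everything up to the first colon and the spaces after it.
        function val(line) { sub(/^[^:]*:[ \t]*/, "", line); return line }
        /^[ \t]*Event Time\.*:/ { t = val($0) }
        /^[ \t]*Severity\.*:/   { s = val($0) }
        /^[ \t]*Event #\.*:/    { n = val($0) }
        /^[ \t]*Summary:/       { want = 1; next }
        want && NF {
            # The first non-blank line after "Summary:" is the one-line summary.
            sub(/^[ \t]+/, "")
            printf "%s | %s | #%s | %s\n", t, s, n, $0
            # Reset per-event state so a field missing from the next event
            # is printed empty instead of being carried over.
            want = 0; t = s = n = ""
        }
    ' "$@"
}
```

For example, `summarize_ems_events /var/opt/resmon/log/event.log` lists the events in log order; the entries just above the #646 line are the ones worth reading for a root cause.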

Andrew Merritt_2
Honored Contributor


Hi,
I believe this problem can be caused by old firmware. Make sure the firmware on your system is up-to-date.

...
ServiceNote A7025A-01
PDHC Revision A.003.014 contains the following fixes:

PDHC could hang resulting in OS heartbeats not being passed to the MP.
After 10 minutes of not receiving heartbeats, the MP would emit an IPMI
Event (WATCHDOG_RESET_PARTITION). This event causes the EMS Event
Monitoring System to send e-mail stating the system has been reset
(EMS Event #646). PDHC firmware has been modified to resolve this issue.
...
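The service note boils down to a simple watchdog rule: if the MP receives no OS heartbeat for 10 minutes, it emits WATCHDOG_RESET_PARTITION. The sketch below only models that rule to make it concrete; it is not MP or PDHC code. It assumes heartbeat times are given as seconds, one per line, in ascending order on stdin, and find_watchdog_gaps is a hypothetical name.

```shell
# Hypothetical helper: models the watchdog rule described in the service note.
# Reads heartbeat times (seconds, ascending, one per line) on stdin and prints
# "EXPIRED <previous> <next> <gap>" for every gap longer than the timeout.
# $1 = timeout in seconds; defaults to 600 s, the 10 minutes quoted above.
find_watchdog_gaps() {
    timeout=${1:-600}
    awk -v t="$timeout" '
        NR > 1 && $1 - prev > t {
            # No heartbeat arrived before the timer ran out: in the real
            # system this is the point where the MP would reset the partition.
            printf "EXPIRED %d %d %d\n", prev, $1, $1 - prev
        }
        { prev = $1 }
    '
}
```

For example, `printf '%s\n' 0 300 900 1000 1700 | find_watchdog_gaps 600` reports only the 700 s gap between 1000 and 1700. Treating a gap of exactly 600 s as still in time (strict `>`) is an assumption; the real MP's boundary behaviour is not stated in the note.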

Also, you have OnlineDiags A.43.00, which was the December 2003 release. You should update to a supported version as soon as possible.

Versions here - http://www.docs.hp.com/en/diag/stm/stm_upd.htm#table

Download from here - http://h20293.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=B6191AAE
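Both checks above come down to comparing HP-style revision strings: is the installed PDHC revision at least A.003.014, and is OnlineDiags newer than A.43.00? A small sketch of such a comparison, with the hypothetical name rev_at_least. It assumes the `<letter>.<number>.<number>` format seen in this thread and refuses to compare different letter prefixes, because their ordering is not documented here. You still have to read the installed revisions yourself, for example from the MP for firmware or from swlist for OnlineDiags.

```shell
# Hypothetical helper: exit 0 if revision $1 >= revision $2, 1 if it is older,
# 2 if the letter prefixes differ (ordering across prefixes is not assumed).
# ASSUMPTION: strings look like A.43.00 or A.003.014; numeric fields are
# compared as numbers, so A.003.014 and A.3.14 count as equal.
rev_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        nh = split(have, h, "."); nw = split(want, w, ".")
        if (h[1] != w[1]) exit 2              # e.g. A.x.y vs B.x.y
        n = (nh > nw) ? nh : nw
        for (i = 2; i <= n; i++) {
            a = (i <= nh) ? h[i] + 0 : 0      # missing fields count as 0
            b = (i <= nw) ? w[i] + 0 : 0
            if (a != b) exit !(a > b)         # 0 if newer, 1 if older
        }
        exit 0                                # identical revisions
    }'
}
```

For example, `rev_at_least A.003.010 A.003.014 || echo "PDHC is older than A.003.014"` prints the warning, while `rev_at_least A.003.014 A.003.014` succeeds silently.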

Andrew
ziad_1
Frequent Advisor


Hi Andrew,

Thanks for your reply. I think the link you gave me is for the STM software, not for a firmware update. Could you please explain, step by step, how to upgrade the system firmware and how to find the latest firmware for this server? Thanks in advance for your support.
ziad
Andrew Merritt_2
Honored Contributor


Yes, there's a reason for that - I'm not an expert on HW or FW :-)

Searching www.hp.com suggests this may be the place to start - http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&swItem=pf-53620-2&jumpid=reg_R1002_USEN

If you don't know how to upgrade the firmware, I would strongly recommend contacting HP support.

Andrew
Torsten.
Acclaimed Contributor


Be very, very careful with firmware updates on cell-based servers!

One wrong step can easily cause serious damage to your system!

In case of any doubt, open a support case and ask for help!

Hope this helps!
Regards
Torsten.
