
Re: rx6600: EMS notifications

 
Fayez
Trusted Contributor

rx6600: EMS notifications

Hi all,

OS: HP-UX 11i v3. The system keeps restarting about every 2 days, and this is what we got from the mail system:

From root@db2serv.ideco.local Mon Jun 16 07:06:50 GMT 2008
Received: (from root@localhost)
st August,2006/8.13.3) id m5GA6owG007969;
Mon, 16 Jun 2008 07:06:50 -0300 (GMT)
Date: Mon, 16 Jun 2008 07:06:50 -0300 (GMT)
Message-Id: <200806161006.m5GA6owG007969@db2serv.ideco.local>
To: root@db2serv.ideco.local
From: root@db2serv.ideco.local
Subject: db2serv: Event Monitor Notification
Content-Length: 2364
Status: RO

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Mon Jun 16 07:06:50 2008

db2serv sent Event Monitor notification information:

/system/events/ipmi_fpl/ipmi_fpl is >= 3.
Its current value is CRITICAL(5).



Event data from monitor:

Event Time..........: Mon Jun 16 07:06:50 2008
Severity............: CRITICAL
Monitor.............: fpl_em
Event #.............: 6772
System..............: db2serv.ideco.local

Summary:
HP-UX OS shutdown due to an MCA or INIT


Description of Error:

An OS is shutting down due to an MCA (Machine Check Abort) or INIT.

Probable Cause / Recommended Action:


An MCA or INIT occurred.
Analyze the dump & logs for cause. If necessary contact HP Support for
assistance.



Additional Event Data:
System IP Address...: 128.127.1.32
Event Id............: 0x48563b3a00000003
Monitor Version.....: A.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_fpl_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: ia64 hp server rx6600
EMS Version.....................: A.04.20
STM Version.....................: D.02.00
System Serial Number............: DEH4748378
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#6772

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v


IPMI event hex: 0xf4801c3100e00090 0x000000000019100c
Time Stamp: Mon Jun 16 09:52:00 2008
Event keyword: HP-UX_OS_CRITICAL_SHUTDOWN
Alert level name: Fatal
Reporting vers:

Data field type: Major change in system state
Decoded data field: System State = 12(State Change)
State Change Event = 25(Reserved)
LED Command Valid = 0(LED state is not updated)
LED Run = 0(off (default))
LED Attention = 0(reserved)
LED Stopped = 0(off (default))
Reporting entity ID: 0 ( Cab 0 Cell 0 CPU 0 )
Reporting entity Full Name: HP-UX Kernel
IPMI Event ID : 7217 (0x1c31)

Any help understanding what is happening here?
Dennis Handly
Acclaimed Contributor

Re: rx6600: EMS notifications

An MCA is a hardware error:
An OS is shutting down due to an MCA (Machine Check Abort) or INIT.

You need to contact the Response Center.
Steven E. Protter
Exalted Contributor

Re: rx6600: EMS notifications

Shalom,

This is not the famous HPMC, High Priority Machine Check which crashes a system upon loss of a CPU.

It is however a pretty nasty hardware failure situation and I agree the hardware team needs to be brought in.

This is not an error you can ignore.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Dennis Handly
Acclaimed Contributor

Re: rx6600: EMS notifications

>SEP: This is not the famous HPMC, High Priority Machine Check which crashes a system upon loss of a CPU.

An MCA is the IPF (Itanium) equivalent of an HPMC, though it may be more recoverable. For 11.31 0803, they talked about maybe only aborting the current process.
Fayez
Trusted Contributor

Re: rx6600: EMS notifications

Thanks for your reply,

I tried to run STM to exercise the CPU and memory, but it ended successfully without any errors.

is there any way to check exactly what may cause this problem????
Dennis Handly
Acclaimed Contributor

Re: rx6600: EMS notifications

>is there any way to check exactly what may cause this problem????

Did you see this?
Analyze the dump & logs for cause. If necessary contact HP Support for assistance.

Unfortunately the fpl_em.htm#6772 link is broken and you have to search for 6772. It doesn't tell you any more than what you posted.

Was there a crash dump?
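
A quick way to answer that is to look in the crash directory. This is only a minimal sketch: it assumes savecrash is using its usual default of /var/adm/crash and that each dump sits in a crash.N subdirectory. The function name list_crash_dumps is made up for illustration, and the directory can be passed as an argument if your configuration differs.

```shell
# Sketch: list saved crash dumps, newest first.
# Assumption: savecrash writes to /var/adm/crash (the usual HP-UX default);
# pass another directory as the first argument if yours is different.
list_crash_dumps() {
    crash_dir="${1:-/var/adm/crash}"
    if [ ! -d "$crash_dir" ]; then
        echo "no crash directory at $crash_dir"
        return 0
    fi
    # Each saved dump normally lives in its own crash.N subdirectory.
    found=$(ls -dt "$crash_dir"/crash.* 2>/dev/null) || true
    if [ -z "$found" ]; then
        echo "no crash dumps in $crash_dir"
    else
        echo "$found"
    fi
    return 0
}

list_crash_dumps
```

If a crash.N directory shows up with a timestamp matching one of the reboots, that is the dump HP Support will want to see.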
Michael Steele_2
Honored Contributor

Re: rx6600: EMS notifications

Hi:

MCA logs on HP-UX 11.31 are kept in /var/tombstones. You can check there.

Also check /etc/shutdownlog and paste in any messages.
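
Something like the following would collect both in one go. Treat it as a rough sketch rather than a supported tool: the function name show_mca_evidence and the "top 5 / last 10" limits are arbitrary choices for illustration, and the two paths are simply the ones mentioned above.

```shell
# Sketch: show the newest files in the tombstones directory and the tail
# of the shutdown log. Both paths can be overridden via arguments.
show_mca_evidence() {
    tomb_dir="${1:-/var/tombstones}"
    shutdown_log="${2:-/etc/shutdownlog}"

    echo "== newest files in $tomb_dir =="
    if [ -d "$tomb_dir" ]; then
        # -t sorts by modification time, newest first; keep the top 5.
        ls -t "$tomb_dir" | head -n 5
    else
        echo "(directory not found)"
    fi

    echo "== last entries in $shutdown_log =="
    if [ -r "$shutdown_log" ]; then
        tail -n 10 "$shutdown_log"
    else
        echo "(file not readable)"
    fi
    return 0
}

show_mca_evidence
```

Comparing the timestamps of the newest tombstone files with the reboot entries in the shutdown log should show whether every restart lines up with an MCA.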

You can try logtool, but this is mostly for peripherals and not CPUs. An MCA usually points to a bad CPU. Save this for a later problem.

cstm << EOF
runutil logtool
rs
EOF



The URL provided by EMS just restates what's already been posted; there is no update. Here's an MCA description. It's most likely a bad CPU.

Description
An MCA is a CPU interrupt that occurs when the CPU discovers that it can not continue reliable operation. An MCA can result from either a hardware problem (such as an uncorrectable data error in memory or on a system bus) or from a software error (typically, in a driver). In most cases when an MCA occurs, the system stops normal processing and takes an OS memory dump if possible. The firmware also automatically logs data that can be used by HP tools to analyze the cause of the MCA. On reboot, this data is read from firmware and saved in "MCA logs".
Support Fatherhood - Stop Family Law
Fayez
Trusted Contributor

Re: rx6600: EMS notifications

Hi all,

Please find the attached logs. I did not find anything useful there, but I will try to replace the CPU.
Michael Steele_2
Honored Contributor

Re: rx6600: EMS notifications

You're going to have to provide this panic string to HP for decoding. However, an MCA usually indicates a bad CPU. Can you call HP out for HW analysis? What's the holdup?
Support Fatherhood - Stop Family Law