
Marius Pana_1
Regular Advisor

HPMC

Could anyone help me decipher the following? The system crashes but reboots successfully. This is the third time this has happened.

Thanks!

Log Entry 52: 24 Nov 2006 10:55:04
Alert Level 7: Fatal
Keyword: MC_HPMC_MONARCH_SELECTED
MC_HPMC_MONARCH_SELECTED
Logged by: System Firmware 0
Data: Implementation dependent data field
0xF680105E00E00660 FFFFFFF0F0C00000


Log Entry 50: 24 Nov 2006 10:55:04
Alert Level 7: Fatal
Keyword: ERR_CHECK_HPMC
An HPMC has been encountered.
Logged by: System Firmware 0
Data: Code address
0xE880035C00E00620 000000F0F0D7A12C


Log Entry 44: 24 Nov 2006 10:52:21
Alert Level 7: Fatal
Keyword: MC_HPMC_MONARCH_SELECTED
MC_HPMC_MONARCH_SELECTED
Logged by: System Firmware 0
Data: Implementation dependent data field
0xF680105E00E00570 FFFFFFF0F0C00000


Log Entry 42: 24 Nov 2006 10:52:21
Alert Level 7: Fatal
Keyword: ERR_CHECK_HPMC
An HPMC has been encountered.
Logged by: System Firmware 0
Data: Code address
0xE880035C00E00530 000000F0F0D7A12C


Log Entry 39: 24 Nov 2006 10:50:43
Alert Level 7: Fatal
Keyword: MC_HPMC_MONARCH_SELECTED
MC_HPMC_MONARCH_SELECTED
Logged by: System Firmware 0
Data: Implementation dependent data field
0xF680105E00E004E0 000000003E900000


Log Entry 37: 24 Nov 2006 10:50:43
Alert Level 7: Fatal
Keyword: ERR_CHECK_HPMC
An HPMC has been encountered.
Logged by: System Firmware 0
Data: Code address
0xE880035C00E004A0 000000003EA7A12C


Log Entry 35: 24 Nov 2006 10:50:30
Alert Level 7: Fatal
Keyword: MC_HPMC_MONARCH_SELECTED
MC_HPMC_MONARCH_SELECTED
Logged by: System Firmware 0
Data: Implementation dependent data field
0xF680105E00E00460 000000003E900000


Log Entry 33: 24 Nov 2006 10:50:30
Alert Level 7: Fatal
Keyword: ERR_CHECK_HPMC
An HPMC has been encountered.
Logged by: System Firmware 0
Data: Code address
0xE880035C00E00420 000000003EA7A12C
"The Linux philosophy is 'Laugh in the face of danger'. Oops. Wrong One. 'Do it yourself'. Yes, that's it." --Linus Torvalds
13 REPLIES 13
Sameer_Nirmal
Honored Contributor

Re: HPMC

The most probable cause of an HPMC is a hardware malfunction or failure, but that is not always the case. An HPMC can also be caused by software.

You haven't mentioned the server model, the HP-UX version, or the patch level of the system, though.

A proper analysis of the system crash can be done by examining the crash dump in /var/adm/crash and looking at the /var/tombstones/ts99 file.
You can also run the STM logtool.

Those entries show the sequence of events that occurred and were logged by various entities of the system as a result of the HPMC.

You can place a hardware call with HP for early resolution.
Michael Steele_2
Honored Contributor
Solution

Re: HPMC

Hi Marius:
What you've listed here are alert level 7 entries from your GSP or MP error logs. These are nothing to worry about since they don't exceed 11 or 12. You can verify this by pressing Ctrl-B to get into your GSP or MP and redisplaying the error log.

Can you provide what's in your /etc/shutdownlog and what's in your /var/tombstones ts99, ts98 and ts97 files? This is where you'll find HPMCs for PA-RISC servers, but not for Itanium / Integrity servers. Then refer to the /var/adm/syslog/OLD_Logs directory for past syslog.log files. These may provide the most information.
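
A minimal sketch of gathering those files in one pass, assuming a POSIX shell. The function name show_crash_files and its optional path-prefix argument are hypothetical, added only so the loop can be tried against a copy of the files instead of the live system.

```sh
#!/bin/sh
# show_crash_files [ROOT]
# Print the last 20 lines of each file requested above.
# ROOT is a hypothetical optional prefix (for example a directory holding
# copies of the files); with no argument the real paths are used.
show_crash_files() {
    root="${1:-}"
    for f in /etc/shutdownlog \
             /var/tombstones/ts99 \
             /var/tombstones/ts98 \
             /var/tombstones/ts97
    do
        if [ -r "$root$f" ]; then
            echo "== $f =="
            tail -n 20 "$root$f"
        else
            # A missing file is reported rather than treated as an error.
            echo "== $f (not readable) =="
        fi
    done
}
```

Running show_crash_files > /tmp/crash_files.txt as root would give a single file to attach. The 20-line limit is arbitrary; the tombstone files can be long, so raise it if the interesting part is cut off.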


If you copy, paste, and execute this script, it will generate a logtool HW report. Please attach it.

/usr/sbin/cstm<<-EOF
runutil logtool
rs
EOF

Support Fatherhood - Stop Family Law
Julian Hall
Occasional Advisor

Re: HPMC

A small correction to an earlier comment.

On the earlier PA machines, events in the log were assigned alert levels from 0 to 15.

On the rp3400 and rp4440 systems, IPMI events are used to log activities, and these have alert levels 0 to 7, with alert level 7 being the most serious (fatal).

These events are something to worry about.
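
To pick the fatal entries out of a longer event log, here is a rough sketch, assuming the log text has been saved to a file in the same layout as the entries pasted at the top of this thread ("Log Entry", "Alert Level" and "Keyword" lines, no leading whitespace). The function name fatal_events is hypothetical, and the default level of 7 only makes sense on systems that use the 0 to 7 IPMI scale.

```sh
#!/bin/sh
# fatal_events [LEVEL]
# Read event-log text on stdin and print "date time keyword" for every
# entry whose alert level equals LEVEL (default 7).
fatal_events() {
    awk -v want="${1:-7}" '
        /^Log Entry [0-9]+:/ {
            # Example: "Log Entry 52: 24 Nov 2006 10:55:04"
            stamp = $4 " " $5 " " $6 " " $7
            level = ""
        }
        /^Alert Level [0-9]+:/ {
            # Example: "Alert Level 7: Fatal" -> level "7"
            level = $3
            sub(/:$/, "", level)
        }
        /^Keyword:/ {
            if (level == want) print stamp, $2
        }
    '
}
```

Usage would be something like fatal_events < eventlog.txt, where eventlog.txt is whatever file the log was pasted into. On the older machines with the 0 to 15 scale, pass the level of interest as the argument.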
Michael Steele_2
Honored Contributor

Re: HPMC

Thanks Julian!
Marius Pana_1
Regular Advisor

Re: HPMC

Here is the result of cstm log:

Summary of: /var/stm/logs/os/log1.raw.cur

Date/time of first entry: Tue Oct 17 18:52:46 2006

Date/time of last entry: Fri Nov 24 12:59:54 2006



Number of LPMC entries: 0
Number of System Overtemp entries: 0
Number of LVM entries: 0
Number of Logger Event entries: 0

Number of I/O Error entries: 36


Device paths for which entries exist:

(36) 0/2/1/0/4/0

Products for which entries exist:

(36) RAID Interface Controller

Product Qualifiers for which entries exist:

(36) SmartArray RAID

Logger Events for which entries exist:

(36) ciss
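
For longer summaries, a small sketch that lists only the categories with a nonzero count, assuming the "Number of ... entries: N" layout shown above. The function name nonzero_counts is hypothetical; it only reads saved summary text and does not call cstm itself.

```sh
#!/bin/sh
# nonzero_counts
# Read a logtool summary on stdin and print "count category" for every
# "Number of <category> entries: <count>" line whose count is above zero.
nonzero_counts() {
    awk -F': *' '
        /^Number of .* entries:/ && $2 + 0 > 0 {
            name = $1
            sub(/^Number of /, "", name)
            sub(/ entries$/, "", name)
            print $2 + 0, name
        }
    '
}
```

Fed the summary above, this should leave only the I/O Error line, which is the one pointing at the SmartArray controller path.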
Michael Steele_2
Honored Contributor

Re: HPMC

Nothing in logtool, Marius. What's in /etc/shutdownlog and the OLDsyslog.log files? Anything else in /var/tombstones, like mca (Machine Check Abort) files?
Marius Pana_1
Regular Advisor

Re: HPMC

Sorry for the late response. I have a copy of shutdownlog, but it does not say anything of use. I do not have access to the old shutdownlog. I have also attached the ts9x files; ts97 is the one from right after the crash. What can I use to analyze them and, more to the point, where can I get some good documentation on this type of troubleshooting?

Thank you all.
Marius Pana_1
Regular Advisor

Re: HPMC

I forgot to mention that there appear to be 36 I/O errors. Could these be the cause of my machine rebooting?

Summary of: /var/stm/logs/os/log1.raw.cur
Number of I/O Error entries: 36
Device paths for which entries exist:

(36) 0/2/1/0/4/0

Products for which entries exist:

(36) RAID Interface Controller

Product Qualifiers for which entries exist:

(36) SmartArray RAID

Logger Events for which entries exist:

(36) ciss
Michael Steele_2
Honored Contributor

Re: HPMC

There is no OLDshutdownlog. It's OLDsyslog.log.

ts99 is reporting PCI bus errors on October 24th. Shutdownlog is reporting a reboot after panic at 13:34 on Mon Oct 23 2006.

Reboot after panic: , isr.ior = 0'ae27fffb.c0000000'4b92e048

An HPMC is usually related to a bad processor.

The point is that this originally happened on October 23rd.

You have a HW problem, not a SW problem. Place a call to HP.