RP7410 HPMC errors

littleelvis
Occasional Visitor

We had an RP7410 crash yesterday. The server seems to be running fine after the reboot. We have a tombstone file and were wondering if anyone could tell us what the problem might be. Any help would be appreciated. I have attached the tombstone file, and I can post the logs from the MP as well.
3 REPLIES
Andrew Rutter
Honored Contributor

Re: RP7410 HPMC errors

hi,

It's difficult to read tombstones like that without the tools that HP has.

You may get more useful information from the latest MP error logs, and the system probably generated a crash dump in /var/adm/crash.

If it did, you may be able to run the crash tools against that dump.

Andy
littleelvis
Occasional Visitor

Re: RP7410 HPMC errors

Thanks for the reply. I have attached the MP logs from the HPMC. It looks like Cell 1 had an issue first, so that cell is my main suspect. Everything has been running fine for 2 days now. We don't have a maintenance contract on this box, but we have some spare parts, so we would like to repair it ourselves if possible. Thanks for any help.
Michael Steele_2
Honored Contributor

Re: RP7410 HPMC errors

Hi

The first thing to look at in the ts99 file is the timestamp.

Timestamp = 07:00:10 GMT Oct 07 2009

Since you posted this on the 8th and the crash was the day before, the timestamp matches, so I'd say you had a legit HPMC.
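If you want to check that timestamp against the crash date in a script rather than by eye, here is a minimal Python sketch. The function names are made up for illustration, and it assumes the line always has the `Timestamp = HH:MM:SS GMT Mon DD YYYY` shape shown above, with English month abbreviations.

```python
from datetime import date, datetime, timezone


def parse_tombstone_timestamp(line: str) -> datetime:
    """Parse a line such as 'Timestamp = 07:00:10 GMT Oct 07 2009'.

    Assumption: the layout matches the tombstone excerpt quoted in this thread.
    """
    # Keep only the text to the right of the first '='.
    _, _, value = line.partition("=")
    # 'GMT' is matched literally; %b expects an English month abbreviation.
    parsed = datetime.strptime(value.strip(), "%H:%M:%S GMT %b %d %Y")
    return parsed.replace(tzinfo=timezone.utc)


def matches_crash_date(line: str, crash_day: date) -> bool:
    """Return True if the tombstone timestamp falls on the given calendar day (GMT)."""
    return parse_tombstone_timestamp(line).date() == crash_day


if __name__ == "__main__":
    sample = "Timestamp = 07:00:10 GMT Oct 07 2009"
    print(parse_tombstone_timestamp(sample).isoformat())
    print(matches_crash_date(sample, date(2009, 10, 7)))
```

A tombstone whose timestamp does not match the day of the crash is probably left over from an older event.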

The second thing you're looking for is messages like this:
CPU Information = 71d5d7
Cell: 0 CPU: 0
CPU is present and not deconfigured.

Your CPUs are fine.

Next in line is the memory:
Memory Errorlog Information Cell 0x00
DNA MPD Block

Timestamp = 07:00:16 GMT Oct 07 2009

IPD C2C OV RQ RS ESTAT A C D corr unc fe cw ns acc
--- --- -- -- -- --------------- - - - ---- --- -- -- -- ---
X X ERR_ERROR X

And I'll need help with this. Perhaps another forum member knows whether this is a legit DIMM problem with Cell 0?


248 PDC 0,1,2 *12 PDH_HWSM4_ALREADY_LOCKED 10/07/2009 06:07:47

Log Entry 248: 10/07/2009 06:07:47

Alert Level 12: Software failure; Keyword: PDH_HWSM4_ALREADY_LOCKED

Cell PDH firmware 0 evntDet 0; Status: 0

Logged by system firmware 0 during subActivity 9

Cell board location: cabinet 0 cell 1

0x200067c03f000090 0x00ffff01ffffff94

0x58006f0000000090 0x00006d090706072f
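When there are a few hundred MP log entries to go through, it can help to pull the one-line summaries apart and sort them by alert level. The following is only a sketch in Python: the field names are my own labels, the meaning of the third column (`0,1,2`) is not confirmed here so it is kept as a raw string, and the date is assumed to be MM/DD/YYYY.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class LogSummary(NamedTuple):
    entry: int           # log entry number, e.g. 248
    source: str          # reporting entity, e.g. "PDC"
    location: str        # raw third column, e.g. "0,1,2" (meaning not assumed)
    alert_level: int     # number after the optional '*', e.g. 12
    keyword: str         # e.g. "PDH_HWSM4_ALREADY_LOCKED"
    logged_at: datetime  # date and time as printed, no timezone attached


# Layout assumed from the single summary line quoted in this thread.
_SUMMARY = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+\*?(\d+)\s+(\S+)\s+"
    r"(\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}:\d{2})\s*$"
)


def parse_summary(line: str) -> Optional[LogSummary]:
    """Parse one MP log summary line; return None if it does not match."""
    match = _SUMMARY.match(line)
    if match is None:
        return None
    entry, source, location, level, keyword, stamp = match.groups()
    # Collapse any run of whitespace between date and time before parsing.
    logged_at = datetime.strptime(" ".join(stamp.split()), "%m/%d/%Y %H:%M:%S")
    return LogSummary(int(entry), source, location, int(level), keyword, logged_at)


if __name__ == "__main__":
    line = "248 PDC 0,1,2 *12 PDH_HWSM4_ALREADY_LOCKED 10/07/2009 06:07:47"
    print(parse_summary(line))
```

Lines that are not summaries (the detail text, the hex words, blank lines) simply come back as `None`, so the whole log can be fed through line by line.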

###############################

Well, it's obvious you've had an HPMC, and you'll have to contact HP for assistance. My best guess: a DIMM in cell 0.

Note: You should be getting events logged by EMS in /etc/opt/resmon/logs/

Review the logs here and paste in the data.

Also, paste in /etc/shutdownlog and review syslog.log for events.
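To cut those files down to the lines worth pasting, a small Python filter like the one below may help. It is a rough sketch: the keyword list is just my guess at useful search terms, and the full syslog path in the usage note is the usual HP-UX default rather than something confirmed for this box.

```python
from typing import Iterable, List, Tuple

# Hypothetical search terms; adjust to whatever shows up in your own logs.
KEYWORDS = ("hpmc", "panic", "machine check")


def find_suspect_lines(
    lines: Iterable[str], keywords: Tuple[str, ...] = KEYWORDS
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing a keyword (case-insensitive)."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path: str) -> List[Tuple[int, str]]:
    """Scan one log file, replacing undecodable bytes instead of failing."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

For example, `scan_file("/etc/shutdownlog")` or `scan_file("/var/adm/syslog/syslog.log")` returns the matching lines with their line numbers, which makes it easier to post only the relevant part.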