RP7410 HPMC errors

littleelvis
Occasional Visitor

We had an RP7410 crash yesterday. The server seems to be running fine after the reboot. We have a tombstone file and were wondering if anyone could tell us what the problem might be. Any help would be appreciated. I have attached the tombstone file, and I can post the logs from the MP as well.
3 REPLIES
Andrew Rutter
Honored Contributor

Re: RP7410 HPMC errors

Hi,

It's difficult to read tombstones like that without the tools that HP has.

We may get more useful information from the last MP error logs, and the system probably generated a crash dump in /var/adm/crash.

If it has, you may be able to run the crash tools against that dump.

Andy
littleelvis
Occasional Visitor

Re: RP7410 HPMC errors

Thanks for the reply. I have attached the MP logs from the HPMC. It looks like Cell 1 had an issue first, so that cell is my suspect. Everything has been running fine for 2 days now. We don't have maintenance on this box and have some extra parts, so we would like to repair it ourselves if possible. Thanks for any help.
Michael Steele_2
Honored Contributor

Re: RP7410 HPMC errors

Hi

The first thing to check in the ts99 file is the timestamp.

Timestamp = 07:00:10 GMT Oct 07 2009

Since you posted this on the 8th I'd say you had a legit HPMC.
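
The timestamp check above can also be done mechanically. A minimal Python sketch, assuming the tombstone line has exactly the form quoted above and that month names are in English; the function name `parse_tombstone_timestamp` and the sample posting time are illustrative and not part of any HP tool.

```python
import re
from datetime import datetime, timezone

# Matches lines such as "Timestamp = 07:00:10 GMT Oct 07 2009".
TIMESTAMP_RE = re.compile(
    r"Timestamp\s*=\s*(\d{2}:\d{2}:\d{2}) GMT (\w{3} \d{2} \d{4})"
)

def parse_tombstone_timestamp(line):
    """Return the timestamp as an aware UTC datetime, or None if no match."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    stamp = datetime.strptime(
        f"{match.group(1)} {match.group(2)}", "%H:%M:%S %b %d %Y"
    )
    return stamp.replace(tzinfo=timezone.utc)

hpmc_time = parse_tombstone_timestamp("Timestamp = 07:00:10 GMT Oct 07 2009")
# Assumed posting time, for illustration only: midday on Oct 08 2009.
posted = datetime(2009, 10, 8, 12, 0, 0, tzinfo=timezone.utc)
print("HPMC logged at", hpmc_time.isoformat(), "age at posting:", posted - hpmc_time)
```

If the tombstone timestamp is far older than the crash you are chasing, the file is probably left over from an earlier event.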

The second thing you're looking for is messages like this:
CPU Information = 71d5d7Cell: 0 CPU: 0
CPU is present and not deconfigured.

Your CPUs are fine.

Next in line is the Memory
Memory Errorlog Information Cell 0x00
DNA MPD Block

Timestamp = 07:00:16 GMT Oct 07 2009

IPD C2C OV RQ RS ESTAT A C D corr unc fe cw ns acc
--- --- -- -- -- --------------- - - - ---- --- -- -- -- ---
X X ERR_ERROR X

And I'll need help with this. Perhaps another forum member knows if this is a legit DIMM problem with Cell 0?


248 PDC 0,1,2 *12 PDH_HWSM4_ALREADY_LOCKED 10/07/2009 06:07:47

Log Entry 248: 10/07/2009 06:07:47

Alert Level 12: Software failure; Keyword: PDH_HWSM4_ALREADY_LOCKED

Cell PDH firmware 0 evntDet 0; Status: 0

Logged by system firmware 0 during subActivity 9

Cell board location: cabinet 0 cell 1

0x200067c03f000090 0x00ffff01ffffff94

0x58006f0000000090 0x00006d090706072f
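
If there are many MP log entries to go through, the one-line summaries can be split into fields and sorted or filtered. A rough Python sketch, assuming every summary line looks like the one for entry 248 above; the field names (`source`, `location` and so on) are my own labels, and the third column is kept as a raw string because its meaning is not spelled out here.

```python
import re
from datetime import datetime

# Pattern modelled on the single sample line:
# "248 PDC 0,1,2 *12 PDH_HWSM4_ALREADY_LOCKED 10/07/2009 06:07:47"
SUMMARY_RE = re.compile(
    r"^\s*(?P<entry>\d+)\s+(?P<source>\S+)\s+(?P<location>\S+)\s+"
    r"\*?(?P<alert>\d+)\s+(?P<keyword>\S+)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s*$"
)

def parse_summary_line(line):
    """Split one MP log summary line into a dict, or return None."""
    match = SUMMARY_RE.match(line)
    if match is None:
        return None
    return {
        "entry": int(match.group("entry")),
        "source": match.group("source"),
        "location": match.group("location"),  # raw text, meaning assumed unknown
        "alert": int(match.group("alert")),
        "keyword": match.group("keyword"),
        # Month/day/year order, consistent with the Oct 07 tombstone date.
        "when": datetime.strptime(
            f"{match.group('date')} {match.group('time')}", "%m/%d/%Y %H:%M:%S"
        ),
    }

sample = "248 PDC 0,1,2 *12 PDH_HWSM4_ALREADY_LOCKED 10/07/2009 06:07:47"
print(parse_summary_line(sample))
```

Sorting the parsed entries by `when` and filtering on a high `alert` value is one way to see which event came first.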

###############################

Well, it's obvious you've had an HPMC, and you'll have to contact HP for assistance. My best guess: a DIMM in cell 0.

Note: You should be getting events logged by EMS in /etc/opt/resmon/logs/

Review the logs here and paste in the data.

Also, paste in /etc/shutdownlog and review syslog.log for events.
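
To gather the pieces mentioned above for pasting, something like the following Python sketch could be used. It assumes a Python 3 interpreter is available on the box (stock HP-UX may not have one) and that syslog.log lives at /var/adm/syslog/syslog.log; the helper names `tail_lines` and `list_dir` are made up for this example.

```python
from collections import deque
from pathlib import Path

# Locations named in the post; the full syslog.log path is an assumption.
LOG_FILES = ["/etc/shutdownlog", "/var/adm/syslog/syslog.log"]
EMS_LOG_DIR = "/etc/opt/resmon/logs"

def tail_lines(path, count=20):
    """Return the last `count` lines of a text file, or [] if it is missing."""
    target = Path(path)
    if not target.is_file():
        return []
    with target.open(errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]

def list_dir(path):
    """Return the sorted entry names in a directory, or [] if it is missing."""
    target = Path(path)
    if not target.is_dir():
        return []
    return sorted(entry.name for entry in target.iterdir())

for log_file in LOG_FILES:
    print(f"==== {log_file} ====")
    for line in tail_lines(log_file):
        print(line)

print(f"==== files in {EMS_LOG_DIR} ====")
for name in list_dir(EMS_LOG_DIR):
    print(name)
```

Without Python, copying the same files off the server and reading them elsewhere gives the same information.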