HP 9000

N4000 - Unexpected HPMC

Dan Bolton
Frequent Advisor

N4000 - Unexpected HPMC

One of our test servers, an N4000-55, crashes and will not reboot after running OK for several days. If I power it down for a day or two, it boots up and runs for several more days. A forum search for "unexpected HPMC" seems to indicate a hardware problem, but I don't know where to go from here.

I have attached the console log of the HPMC message, taken through the GSP. The chassis logs didn't seem very informative to me, but I can post them if it would help.

Thanks,
Dan
...skid in sideways, chocolate in one hand, martini in the other, totally worn out and screaming, "WOO HOO what a ride!"
15 REPLIES
Sameer_Nirmal
Honored Contributor

Re: N4000 - Unexpected HPMC

The file /var/tombstones/ts99, if it has a valid timestamp in it, will help.
Prashanth.D.S
Honored Contributor

Re: N4000 - Unexpected HPMC

Hi Dan,

As suggested above, the ts99 file under /var/tombstones will help us analyze the reason for this HPMC.

Can you attach this file here?

Best Regards,
Prashanth
Dan Bolton
Frequent Advisor

Re: N4000 - Unexpected HPMC

Hi,

Here is the ts99 file.
Thank you for any insight you can give.

-Dan
...skid in sideways, chocolate in one hand, martini in the other, totally worn out and screaming, "WOO HOO what a ride!"
Michael Steele_2
Honored Contributor

Re: N4000 - Unexpected HPMC

Hi

Well you definitely got a HW failure but I can't pin it down. I suggest you have HP look at it. I'm sure they'll spot it in a sec.

Good luck.
Support Fatherhood - Stop Family Law
Michael Steele_2
Honored Contributor

Re: N4000 - Unexpected HPMC

Hi Again:

Please attach the output of the following:

cstm<<-EOF
runutil logtool
rs
EOF

ioscan -fnk

tail /etc/shutdownlog

Thanks!
Support Fatherhood - Stop Family Law
cnb
Honored Contributor

Re: N4000 - Unexpected HPMC

Hi Dan,



In addition, the PDC is too old and should be upgraded at some point to pick up critical fixes.



hth,
Dan Bolton
Frequent Advisor

Re: N4000 - Unexpected HPMC

Here is the output you requested.
...skid in sideways, chocolate in one hand, martini in the other, totally worn out and screaming, "WOO HOO what a ride!"
Michael Steele_2
Honored Contributor

Re: N4000 - Unexpected HPMC

10:02 Tue Jun 02 2009. Reboot after panic: , isr.ior = 0'10340007.0'b42c0f8
21:13 Mon Jun 08 2009. Reboot after panic: , isr.ior = 0'10340007.0'b42c0f8
22:19 Mon Jun 08 2009. Reboot after panic: , isr.ior = 0'4340003.0'338ef6b0
12:40 Tue Jun 09 2009. Reboot after panic: , isr.ior = 0'10340743.0'91e46148
22:21 Mon Jun 15 2009. Reboot after panic: , isr.ior = 0'10340003.0'6bdae918
23:28 Mon Jun 15 2009. Reboot after panic: , isr.ior = 0'340757.0'98a637b8
08:36 Wed Jun 17 2009. Reboot after panic: , isr.ior = 0'2401ec.0'26c9a178

You've had 7 panics since June 2nd.
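For anyone who wants to tally these without counting by hand, here is a minimal sketch, assuming the entries look exactly like the lines quoted above. The function name panic_summary is made up for illustration; on the server the input would be /etc/shutdownlog rather than the pasted sample.

```shell
# panic_summary: count "Reboot after panic" entries per calendar day.
# Assumes each entry starts with "HH:MM Dow Mon DD YYYY." as in the lines above.
# Reads the files given as arguments, or standard input if none are given.
panic_summary() {
    awk '
        /Reboot after panic/ {
            day = $3 " " $4 " " $5          # e.g. "Jun 02 2009."
            sub(/\.$/, "", day)             # drop the trailing period
            if (!(day in count)) order[++n] = day   # remember first-seen order
            count[day]++
            total++
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s: %d\n", order[i], count[order[i]]
            printf "total: %d\n", total
        }
    ' "$@"
}

# Sample run against the seven entries quoted in this post.
panic_summary <<'EOF'
10:02 Tue Jun 02 2009. Reboot after panic: , isr.ior = 0'10340007.0'b42c0f8
21:13 Mon Jun 08 2009. Reboot after panic: , isr.ior = 0'10340007.0'b42c0f8
22:19 Mon Jun 08 2009. Reboot after panic: , isr.ior = 0'4340003.0'338ef6b0
12:40 Tue Jun 09 2009. Reboot after panic: , isr.ior = 0'10340743.0'91e46148
22:21 Mon Jun 15 2009. Reboot after panic: , isr.ior = 0'10340003.0'6bdae918
23:28 Mon Jun 15 2009. Reboot after panic: , isr.ior = 0'340757.0'98a637b8
08:36 Wed Jun 17 2009. Reboot after panic: , isr.ior = 0'2401ec.0'26c9a178
EOF
```

On the sample this prints one line per day (two panics each on Jun 08 and Jun 15, one on each of the other three days) followed by "total: 7". To run it against the real log, call it as: panic_summary /etc/shutdownlog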
Support Fatherhood - Stop Family Law
Dan Bolton
Frequent Advisor

Re: N4000 - Unexpected HPMC

Yes, 7 panics (on three occasions) in June. The very first panic was in December; then it ran fine until April, when there were two more. The repeated panics started in May, with 4 occurrences of repeating panics (16 panics in May).

As I mentioned, the server will be fine for a few days before failing. After crashing, it will try to reboot, but appears to crash again while booting or shortly thereafter. After several cycles (as many as 10 over one weekend) it hangs during a boot and requires a reset, TOC, or power cycle to restart.

This is a test box, and not currently under HW support, so if I cannot figure it out, we will have to call somebody in.

I assume the next step would be to strip it down to a minimal config (i/o cards, RAM, and CPUs) and gradually add components to find the faulty device. Had this been a 'hard' (or at least a predictable) failure I would have begun that process already, but as it may run for several days before failing, it could be winter by the time I isolate the problem... ;)

I am open to any suggestions.
...skid in sideways, chocolate in one hand, martini in the other, totally worn out and screaming, "WOO HOO what a ride!"
Michael Steele_2
Honored Contributor

Re: N4000 - Unexpected HPMC

Hi

Look in

/etc/opt/resmon/log/

There are five or so logs with daily event information. Some of the logs will be useless, but one for sure will have more information, probably the *.html log. When you have something you'd like me to look at, post it.

Regarding stripping down the machine of I/O devices, this is not where your problem is occurring. If it were, it would show up in CSTM. You have something like a bad system board, CPU, or DIMM, something associated with a High Priority Machine Check (HPMC). And I/O devices will usually go NO_HW before causing a panic like the one you're experiencing.

There should be events in your GSP logs as well. From the console, press Ctrl-B, then SL, and look at the time stamps and alert levels. Post these when you have them.
Support Fatherhood - Stop Family Law
Dan Bolton
Frequent Advisor

Re: N4000 - Unexpected HPMC

Thanks for the info on I/O devices, that will save some time. When the problem started, I ran STM and it didn't show any problem devices, but I wasn't sure I could eliminate them.

Attached are the last 20 chassis codes from the Error and Activity buffers. I have not had a chance to look at the resmon logs, but hope to yet today.

Thanks
...skid in sideways, chocolate in one hand, martini in the other, totally worn out and screaming, "WOO HOO what a ride!"
Dan Bolton
Frequent Advisor

Re: N4000 - Unexpected HPMC

I looked at the logs in /etc/opt/resmon/log/ and the last entry in reslog.html was in May 2008. The other 4 logs had timestamps within the last two days, but I didn't see much (except possibly some resmon configuration issues) in them. However, I have attached the last few entries of each, in case I am missing something.
...skid in sideways, chocolate in one hand, martini in the other, totally worn out and screaming, "WOO HOO what a ride!"
Dan Bolton
Frequent Advisor

Re: N4000 - Unexpected HPMC

I did find the attached entries in /etc/opt/resmon/log/api.log.old
...skid in sideways, chocolate in one hand, martini in the other, totally worn out and screaming, "WOO HOO what a ride!"
Michael Steele_2
Honored Contributor

Re: N4000 - Unexpected HPMC

Hi:
Question: What do you do with c3t7d0 and c2t7d0?

From your 6/9/2009 Resmon / api.logs
0/0/2/1.7.0
ctl 2 0/0/2/1.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c2t7d0
0/5/0/0.7.0
ctl 3 0/5/0/0.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c3t7d0

Check your /etc/passwd file for any rscsi entries like:

rscsi:x:1999:1000:Tape:/export/home/rscsi:/opt/schily/sbin/rscsi

This is a tape entry from another server, for remote SCSI over IP to the tape.
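A quick way to run that check, as a sketch only: the function name find_rscsi_accounts is made up, and matching on either the login name or a login shell ending in /rscsi is my assumption about what counts as an "rscsi entry", based on the example line above.

```shell
# find_rscsi_accounts: list passwd entries that look like remote-SCSI accounts.
# Matches when the login name is "rscsi" or the login shell path ends in /rscsi.
# Takes an optional passwd-format file; defaults to /etc/passwd.
find_rscsi_accounts() {
    awk -F: '$1 == "rscsi" || $7 ~ /\/rscsi$/ { print $1 ": shell=" $7 }' \
        "${1:-/etc/passwd}"
}
```

Running find_rscsi_accounts with no argument checks /etc/passwd; no output means no matching entry was found.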

Re: GSP logs. I don't see any high GSP alert messages in your GSP logs, only one alert 7 and 19 alert 2's, which are all informational. If you look at the source of the alerts, though, they're all system bus and local bus.

Looks like there's nothing wrong with your 0/10 bus and Hitachi disk array, where I suppose most of your data is kept.

Wild guess: you've got a bus issue. Start with the remote SCSI devices, which I know nothing about, and remove them if you don't need them.

What do you do with c3t7d0 and c2t7d0?

Otherwise, I see nothing being recorded that is informative.

Please double check syslog.log for these times as well as ALL THE RESMON LOGS.

STM is fine.
Support Fatherhood - Stop Family Law
Dan Bolton
Frequent Advisor

Re: N4000 - Unexpected HPMC

Just an update on this...

In early Sept one of our Brocade Fibre-Channel switches began causing connectivity problems on a different server and was replaced. The N4000 server with the random crash/reboot problem was also connected to SAN disk thru this same FC switch.

Since the switch has been replaced the server has not crashed once! It has gone over 80 days without a crash.

As a comparison, during the Jan to Aug period it had crashed over 35 times, with about 15 of the reboots lasting less than 1 day. Average uptime was 9.2 days, and the longest was about 20 days.

I am calling this one fixed... Thanks to all for your input, and to you Michael for your time reviewing all the log files!

-Dan
...skid in sideways, chocolate in one hand, martini in the other, totally worn out and screaming, "WOO HOO what a ride!"