HPE 9000 and HPE e3000 Servers

Re: N4000 - Unexpected HPMC

 
Michael Steele_2
Honored Contributor

Re: N4000 - Unexpected HPMC

Hi

Look in

/etc/opt/resmon/log/

There are five or so logs there with daily event information. Some of them will be useless, but one will almost certainly have more information, probably the *.html log. When you have something you'd like me to look at, post it.
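
For anyone following along, a quick way to see which of those logs changed most recently is to list the directory by modification time. This is only a sketch: the helper name `list_recent_logs` is made up for illustration, and the default path is simply the directory given above.

```shell
#!/bin/sh
# Hypothetical helper: list the files in a log directory, newest first.
# With no argument it falls back to the resmon log directory named above.
list_recent_logs() {
    dir="${1:-/etc/opt/resmon/log}"
    # -l shows size and modification time, -t sorts newest first
    ls -lt "$dir"
}
```

Call `list_recent_logs` with no argument on the affected server, or pass a different directory to check another set of logs.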

Regarding stripping the machine of I/O devices: this is not where your problem is occurring. If it were, it would show up in CSTM. You have something like a bad system board, CPU, or DIMM, something associated with a High Priority Machine Check (HPMC). I/O devices will usually go NO_HW before causing a panic like the one you're experiencing.

There should be events in your GSP logs as well. From the console, press Control-B, then SL, and look at the time stamps and alert levels. Post these when you have them.
Support Fatherhood - Stop Family Law
Dan Bolton
Frequent Advisor

Re: N4000 - Unexpected HPMC

Thanks for the info on I/O devices, that will save some time. When the problem started, I ran STM and it didn't show any problem devices, but I wasn't sure I could eliminate them.

Attached are the last 20 chassis codes from the Error and Activity buffers. I have not had a chance to look at the resmon logs, but hope to yet today.

Thanks
...skid in sideways, chocolate in one hand, martini in the other, totally worn out and screaming, "WOO HOO what a ride!"
Dan Bolton
Frequent Advisor

Re: N4000 - Unexpected HPMC

I looked at the logs in /etc/opt/resmon/log/
and the last entry in reslog.html was in May 2008. The other 4 logs had timestamps within the last two days, but I didn't see much (except possibly some resmon configuration issues) in them. However, I have attached the last few entries of each, in case I am missing something.
Dan Bolton
Frequent Advisor

Re: N4000 - Unexpected HPMC

I did find the attached entries in /etc/opt/resmon/log/api.log.old
Michael Steele_2
Honored Contributor

Re: N4000 - Unexpected HPMC

Hi:
Question: What do you do with c3t7d0 and c2t7d0?

From your 6/9/2009 Resmon / api.logs
0/0/2/1.7.0
ctl 2 0/0/2/1.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c2t7d0
0/5/0/0.7.0
ctl 3 0/5/0/0.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c3t7d0

Check your /etc/passwd file for any rscsi entries like:

rscsi:x:1999:1000:Tape:/export/home/rscsi:/opt/schily/sbin/rscsi

This is a tape entry from another server, used for remote SCSI over IP to the tape drive.
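
A rough way to run that check, assuming the standard seven-field passwd layout; the function name `find_rscsi_accounts` is just an illustration:

```shell
#!/bin/sh
# Hypothetical helper: report passwd entries whose login name or
# login shell mentions "rscsi". Defaults to /etc/passwd.
find_rscsi_accounts() {
    passwd_file="${1:-/etc/passwd}"
    # Field 1 is the login name, field 7 is the login shell
    awk -F: '$1 ~ /rscsi/ || $7 ~ /rscsi/ { print $1 ": shell=" $7 }' "$passwd_file"
}
```

No output means no such entry was found in that file.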

Re: GSP logs. I don't see any high GSP alert messages in your GSP logs, only one alert 7 and 19 alert 2's, which are all informational. If you look at the source of the alerts, though, they're all system bus and local bus.

Looks like there's nothing wrong with your 0/10 bus and Hitachi disk array, where I suppose most of your data is kept.

Wild guess: you've got a bus issue. Start with the remote SCSI devices, which I know nothing about, and remove them if you don't need them.

What do you do with c3t7d0 and c2t7d0?

Otherwise, I see nothing informative being recorded.

Please double check syslog.log for these times as well as ALL THE RESMON LOGS.
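
One way to pull a single day out of syslog, sketched under a few assumptions: the default path is the usual HP-UX location (/var/adm/syslog/syslog.log), lines start with the standard "Mon DD" timestamp, and the name `syslog_for_day` is made up for this example. As far as I know HP-UX also keeps the previous boot's log as OLDsyslog.log in the same directory, which may matter after a crash and reboot.

```shell
#!/bin/sh
# Hypothetical helper: print syslog lines for one day.
# $1 is the day as syslog writes it, e.g. 'Jun  9' (single-digit days
# are padded with an extra space). $2 is the log file to search.
syslog_for_day() {
    day="$1"
    log="${2:-/var/adm/syslog/syslog.log}"
    grep "^$day " "$log"
}

# Example: syslog_for_day 'Jun  9' /var/adm/syslog/OLDsyslog.log
```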

STM is fine.
Dan Bolton
Frequent Advisor

Re: N4000 - Unexpected HPMC

Just an update on this...

In early September one of our Brocade Fibre Channel switches began causing connectivity problems on a different server and was replaced. The N4000 server with the random crash/reboot problem was also connected to SAN disk through this same FC switch.

Since the switch has been replaced the server has not crashed once! It has gone over 80 days without a crash.

As a comparison, during the Jan to Aug period it had crashed over 35 times, with about 15 of the reboots lasting less than 1 day. Average uptime was 9.2 days, and the longest was about 20 days.

I am calling this one fixed... Thanks to all for your input, and to you Michael for your time reviewing all the log files!

-Dan