HPE 9000 and HPE e3000 Servers
Daniel Robert
Frequent Advisor

Weird messages in the GSP chassis log

On a server that we hardly use (an A500), I noticed that the attention light was flashing, so I checked the chassis codes and found two new entries whose meaning I cannot figure out. syslog.log doesn't contain anything that gives me a clue as to what they refer to. Here are the two entries:
Log Entry # 0 :
SYSTEM NAME: robin-con
DATE: 12/22/2010 TIME: 08:18:31
ALERT LEVEL: 2 = Non-Urgent operator attention required

SOURCE: 0 = unknown, no source stated
SOURCE DETAIL: 0 = unknown, no source stated SOURCE ID: FF
PROBLEM DETAIL: 0 = no problem detail

CALLER ACTIVITY: 6 = machine check STATUS: 2
CALLER SUBACTIVITY: 74 = implementation dependent
REPORTING ENTITY TYPE: 0 = system firmware REPORTING ENTITY ID: 01

0x0000102000FF6742 00000000 00000000 type 0 = Data Field Unused
0x5800182000FF6742 00006E0B 1608121F type 11 = Timestamp 12/22/2010 08:18:31

Log Entry # 0 :
SYSTEM NAME: robin-con
DATE: 12/22/2010 TIME: 08:18:31
ALERT LEVEL: 2 = Non-Urgent operator attention required

SOURCE: 0 = unknown, no source stated
SOURCE DETAIL: 0 = unknown, no source stated SOURCE ID: FF
PROBLEM DETAIL: 0 = no problem detail

CALLER ACTIVITY: 6 = machine check STATUS: 2
CALLER SUBACTIVITY: 74 = implementation dependent
REPORTING ENTITY TYPE: 0 = system firmware REPORTING ENTITY ID: 01

0x0000102000FF6742 00000000 00000000 type 0 = Data Field Unused
0x5800182000FF6742 00006E0B 1608121F type 11 = Timestamp 12/22/2010 08:18:31
Type CR for next entry, Q CR to quit.


I searched ITRC for the chassis code "0x0000102000FF6742" and found nothing. Does anyone have any idea what it refers to? We haven't had any more entries since those two incidents before Christmas.
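For reference, a chassis code can be pulled apart field by field. The layout below is not taken from HP documentation; it is an assumption inferred by lining up the hex codes posted in this thread against the decoded text the GSP prints next to them (it agrees with every code shown here). A minimal Python sketch:

```python
# Chassis-code field layout INFERRED from the codes posted in this thread,
# not from HP documentation: field name -> (bit shift, width in bits).
FIELDS = {
    "data_type": (59, 5),
    "reporting_entity_type": (52, 4),
    "reporting_entity_id": (44, 8),
    # bits 43..40 vary between the lines of one entry; not interpreted here
    "alert_level": (36, 4),
    "problem_detail": (32, 4),
    "source": (28, 4),
    "source_detail": (24, 4),
    "source_id": (16, 8),
    "caller_activity": (12, 4),
    "caller_subactivity": (4, 8),
    "status": (0, 4),
}


def decode_chassis_code(code: int) -> dict[str, int]:
    """Split a 64-bit chassis code into the fields listed in FIELDS."""
    return {name: (code >> shift) & ((1 << width) - 1)
            for name, (shift, width) in FIELDS.items()}


if __name__ == "__main__":
    for name, value in decode_chassis_code(0x0000102000FF6742).items():
        # The GSP prints the data type in decimal and the other fields in hex.
        shown = str(value) if name == "data_type" else f"{value:X}"
        print(f"{name:22} {shown}")
```

Run against 0x0000102000FF6742, it reproduces the fields shown in the entry above: alert level 2, source 0 with source ID FF, caller activity 6 (machine check), subactivity 74, status 2, and reporting entity type 0 (system firmware) with ID 01.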
cnb
Honored Contributor

Re: Weird messages in the GSP chassis log

Hi Daniel,

Did it crash?

Anything in /var/tombstones around the same time frame?

Go into the GSP Error log and look at all of the messages prior to those 2. Then look at the Activity or Incoming logs to see if anything else is reported around the same time as those messages. Post anything you see.

Check /var/opt/resmon/log/event.log

Use STM to view the event log:

# cstm
cstm> ru logtool

Check for memory or other errors.

Rgds,
Daniel Robert
Frequent Advisor

Re: Weird messages in the GSP chassis log

Hi cnb,

It did not crash; uptime shows it has been up for 266 days.

There is nothing in the tombstone directories newer than March 2010.

The last entry in event.log was a failed BBU in our RAID back in November 2010.

I am not sure what to do after entering the "ru logtool" command in cstm (I have never done this before).

I exercised the memory and CPU with STM, and both passed the tests.

I noticed that the chassis code on the last GSP log line is different (0x5800182000FF6742), so I did some more research and found http://h30499.www3.hp.com/t5/HP-9000/rp5470-GSP-infomation/m-p/4656059#M32857. It points to possibly faulty memory with intermittent, self-corrected errors. That may be the case here, but STM found the memory good during the test. I guess I can run longer tests to see.
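The two lines of each entry appear to describe the same event. The sketch below compares them bit by bit, assuming a layout guessed from this thread's samples (not from HP documentation): the data type sits in the top five bits, and a type-11 data word packs years since 1900, a zero-based month, then day, hour, minute and second into its low six bytes. Under that assumption, 0x5800182000FF6742 differs from 0x0000102000FF6742 only in the data-type bits (0 = Data Field Unused vs 11 = Timestamp) and one other bit, and its data word 00006E0B 1608121F is simply the entry's date and time, so it most likely belongs to the same event rather than reporting a new one:

```python
from datetime import datetime


def data_type(code: int) -> int:
    """Top five bits of a 64-bit chassis code (assumed to be the data type)."""
    return code >> 59


def decode_timestamp(word: int) -> datetime:
    """Decode the data word of a type-11 (Timestamp) line.

    Assumed byte layout, inferred from this thread's samples:
    byte 2 = years since 1900, byte 3 = month (0-based),
    bytes 4..7 = day, hour, minute, second.
    """
    b = word.to_bytes(8, "big")
    return datetime(1900 + b[2], b[3] + 1, b[4], b[5], b[6], b[7])


if __name__ == "__main__":
    first, last = 0x0000102000FF6742, 0x5800182000FF6742
    print(data_type(first), data_type(last))      # 0 11
    print(hex(first ^ last))                      # 0x5800080000000000
    print(decode_timestamp(0x00006E0B1608121F))   # 2010-12-22 08:18:31
```

The same decoding turns 00006F00 12122E04 from the Incoming log into 01/18/2011 18:46:04, matching what the GSP prints for that entry.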

Is that what you are suspecting?

cnb
Honored Contributor

Re: Weird messages in the GSP chassis log

Yes, go into the cstm logtool and type "vd" to see what memory errors, if any, are being logged. Also look for low-level errors.

Here's a tutorial on logtool:
http://docs.hp.com/en/diag/logtool/lgt_summ.htm

Logtool Guide:
http://docs.hp.com/en/1098/log_spec.pdf


Rgds,
Daniel Robert
Frequent Advisor

Re: Weird messages in the GSP chassis log

I checked the log in cstm with the vd command. It only shows memory errors (8 of them) dating back to March 10, 2010, all from DIMM slot 1. This may be a clue to the problem (I remember having that problem back then; all I did was reseat the DIMMs and it went away). It's strange that there is no indication of a memory problem this time.

BTW, I forgot to mention that the third entry in the GSP log is dated March 2010.
cnb
Honored Contributor

Re: Weird messages in the GSP chassis log

What about the GSP Activity and Incoming logs? What do they show?


Rgds,

Daniel Robert
Frequent Advisor

Re: Weird messages in the GSP chassis log

Hi cnb,

I'm sorry, but I don't know what you are referring to.
Daniel Robert
Frequent Advisor

Re: Weird messages in the GSP chassis log

Oops, sorry, I read your reply too quickly (I missed the GSP part). I will check and let you know.
cnb
Honored Contributor

Re: Weird messages in the GSP chassis log

Hi,

GenericSysName [HP Release B.11.11] (see /etc/issue)
Console Login:

Leaving Console Mode - you may lose write access.
When Console Mode returns, type ^Ecf to get console write access.

GSP Host Name: A500-GSP
GSP> sl


SL

Select Chassis Code Buffer to be displayed:
Incoming, Activity, Error, Current boot or Last boot? (I/A/E/C/L)

Select I and A and look for activity around the same time as the Error events you saw. Compare these and post any results.
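Comparing buffers by eye gets tedious when one of them is full of routine entries. If you capture the sl output to text files (for example by logging the console session in your terminal emulator), a small script can pull out the entries logged near an error. This is a hypothetical helper, not an HP tool; it assumes the DATE/TIME and ALERT LEVEL lines look exactly like the dumps posted in this thread:

```python
import re
from datetime import datetime, timedelta

# Patterns for the header lines of a chassis log entry, as they appear in
# captured GSP "sl" output (format assumed from the dumps in this thread).
ENTRY_RE = re.compile(r"Log Entry # \d+ :")
DATE_RE = re.compile(r"DATE: (\d\d/\d\d/\d{4}) TIME: (\d\d:\d\d:\d\d)")
ALERT_RE = re.compile(r"ALERT LEVEL: (\w+)")


def parse_entries(text: str) -> list[dict]:
    """Split captured log text into entries with a timestamp and alert level."""
    entries = []
    for block in ENTRY_RE.split(text)[1:]:  # [1:] drops text before entry 0
        date = DATE_RE.search(block)
        alert = ALERT_RE.search(block)
        if date and alert:
            when = datetime.strptime(" ".join(date.groups()), "%m/%d/%Y %H:%M:%S")
            entries.append({"when": when,
                            "alert": int(alert.group(1), 16),  # GSP shows hex
                            "text": block.strip()})
    return entries


def near(entries: list[dict], target: datetime, window: timedelta) -> list[dict]:
    """Return the entries logged within +/- window of target."""
    return [e for e in entries if abs(e["when"] - target) <= window]
```

For example, near(parse_entries(text), datetime(2010, 12, 22, 8, 18, 31), timedelta(minutes=30)) returns the entries from one buffer that were logged within half an hour of the 12/22/2010 08:18:31 error.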

Rgds,

Daniel Robert
Frequent Advisor

Re: Weird messages in the GSP chassis log

Hi cnb,

I reviewed the Incoming log and found the following message repeating every 10 minutes; AFAIK it's just a heartbeat message. Because it repeats every 10 minutes, the log does not reach back as far as Dec. 22.
--------------------------------------------
Log Entry # 0 :
SYSTEM NAME: robin-con
DATE: 01/18/2011 TIME: 18:46:04
ALERT LEVEL: 0 = No failure detected, forward progress

SOURCE: 1 = processor
SOURCE DETAIL: 1 = processor general SOURCE ID: 0
PROBLEM DETAIL: 0 = no problem detail

CALLER ACTIVITY: F = display_activity() update STATUS: 0
CALLER SUBACTIVITY: 00 = implementation dependent
REPORTING ENTITY TYPE: E = HP-UX REPORTING ENTITY ID: 00

0x58E008001100F000 00006F00 12122E04 type 11 = Timestamp 01/18/2011 18:46:04
--------------------------------------------

Here are the entries in the Activity log:
--------------------------------------------
Log Entry # 0 :
SYSTEM NAME: robin-con
DATE: 01/01/1970 TIME: 00:00:00
ALERT LEVEL: 1 = Information only, no action required

SOURCE: 6 = platform
SOURCE DETAIL: 6 = service processor SOURCE ID: 0
PROBLEM DETAIL: 1 = selftest result

CALLER ACTIVITY: 2 = operation STATUS: 0
CALLER SUBACTIVITY: 03 = console
REPORTING ENTITY TYPE: 1 = service processor REPORTING ENTITY ID: 00

0xF010011166002030 00000000 00000006 type 30 = reset type and cause
0x5810091166002030 00004600 01000000 type 11 = Timestamp 01/01/1970 00:00:00


Log Entry # 1 :
SYSTEM NAME: robin-con
DATE: 12/22/2010 TIME: 08:18:31
ALERT LEVEL: 2 = Non-Urgent operator attention required

SOURCE: 0 = unknown, no source stated
SOURCE DETAIL: 0 = unknown, no source stated SOURCE ID: FF
PROBLEM DETAIL: 0 = no problem detail

CALLER ACTIVITY: 6 = machine check STATUS: 2
CALLER SUBACTIVITY: 74 = implementation dependent
REPORTING ENTITY TYPE: 0 = system firmware REPORTING ENTITY ID: 01

0x0000102000FF6742 00000000 00000000 type 0 = Data Field Unused
0x5800182000FF6742 00006E0B 1608121F type 11 = Timestamp 12/22/2010 08:18:31


Log Entry # 2 :
SYSTEM NAME: robin-con
DATE: 12/22/2010 TIME: 08:18:30
ALERT LEVEL: 2 = Non-Urgent operator attention required

SOURCE: 0 = unknown, no source stated
SOURCE DETAIL: 0 = unknown, no source stated SOURCE ID: FF
PROBLEM DETAIL: 0 = no problem detail

CALLER ACTIVITY: 6 = machine check STATUS: 2
CALLER SUBACTIVITY: 72 = implementation dependent
REPORTING ENTITY TYPE: 0 = system firmware REPORTING ENTITY ID: 01

0x0000102000FF6722 00000000 00000000 type 0 = Data Field Unused
0x5800182000FF6722 00006E0B 1608121E type 11 = Timestamp 12/22/2010 08:18:30
--------------------------------------------

As you can see, the Activity log seems to report the same events as the Error log, except for the most recent entry, which has an invalid date (and time?). I seem to recall that under certain circumstances (while rebooting, if I remember correctly?) the GSP does not know the current date and time.