
Re: ERR_ERROR in TS99

 
Michael Elleby III_1
Trusted Contributor

ERR_ERROR in TS99

Hello-

I am running HP-UX 11 on an L2000 that has the FAULT LED lit. A review of the ts99 file shows Memory Error Log information, with ERR_ERROR in the ESTAT column.

Any ideas, or do you need more information?

Thanx.
Knowledge Is Power
5 REPLIES
S.K. Chan
Honored Contributor

Re: ERR_ERROR in TS99

The memory error codes in tombstone files are difficult to interpret (usually they are not descriptive enough). Can you check a few things and post the output here?
# view /var/adm/syslog/syslog.log

==> Check for any unusual error messages

# cstm
cstm> ru logtool
Logtool Utilities> vda

==> Run cstm, enter "ru logtool" at the cstm prompt, and once at the Logtool Utilities prompt enter "vda". That will show just the memory error log.

# /etc/dmesg

==> Anything unusual?
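
==> If the log is long, a rough filter can narrow down the "unusual" lines. This is only a sketch: the helper name scan_log and the keyword list are assumptions for illustration, not part of any HP tool, so adjust them to what your system actually logs.

```shell
# Hypothetical helper: print log lines that often accompany hardware trouble.
# The keyword list below is an assumption, not an official HP-UX list.
scan_log() {
    # Default to the syslog path mentioned above if no file is given.
    logfile="${1:-/var/adm/syslog/syslog.log}"
    # -i: ignore case, -E: extended regex with alternation.
    grep -i -E 'scsi|reset|power|memory|error|fail' "$logfile"
}
```

For example, "scan_log /var/adm/syslog/syslog.log | tail" shows the most recent matches. The filter is only a first pass; it will miss anything worded differently, so still read the raw log around the time the FAULT LED came on.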
S.K. Chan
Honored Contributor

Re: ERR_ERROR in TS99

Also run:

# /usr/sbin/diag/contrib/pdcinfo

and post the output.
Michael Elleby III_1
Trusted Contributor

Re: ERR_ERROR in TS99

S.K., here's the deal:

The DBA called me up to tell me that the FAULT LED was lit (this box is in another state). The first thing I did was log into the GSP to look at the error logs; the only thing I noted was an error with IO. Once I reviewed this log, of course, the lit LED went away, but I still wanted to look at several things to note any correlations.

I looked at the TS99 because the DBA had originally thought that the system had panicked, which I found it had not (which is why I made this post to get some insight, since TS99 is not always clear).

What I found was that errors were being generated and alerted via EMS for two items:

1. There was a reset on the SCSI bus that affected one of the hard drives in an SC10 unit attached to the server.
2. There was a power failure on one of the power supplies, but I found out that the data center this server sits in has had power issues.

I am currently monitoring the syslog, and do not note any adverse behavior besides the messages from EMS.

Thanx for the ideas, and I will keep them in mind should the error come up again.

Mike-
Knowledge Is Power
S.K. Chan
Honored Contributor

Re: ERR_ERROR in TS99

Sounds like you know what you're doing :)
I had a single disk failure on my SC10 last month; by the time it showed up in dmesg it was too late, and it hung the whole bus, including the 12H that I had daisy-chained to it. Before it appeared in dmesg I was getting at least 4-5 EMS alerts over email, and the worst thing is that each email alert showed a different error condition on 3 different drives on the SC10. I did some checking then (even ran an STM exerciser) but no error was found. Later that afternoon it showed up in syslog and dmesg (SCSI reset errors) and I was able to pinpoint which disk was bad (the green LED lit up on the slot with the bad disk). I did not take the STM email alerts seriously until it was too late...
Patrick Wessel
Honored Contributor

Re: ERR_ERROR in TS99

Michael,

All bus errors on internal system buses are generalized in the group ERR_ERROR. ("System bus" here means the CPU and memory bus, not SCSI or Fibre Channel.) ERR_ERROR occurs when a function detects the assertion of the PATH_ERROR signal within the function's bus error detection window.

You would need the whole ts99 file to understand the error.
There is no good troubleshooting with bad data