System Administration

Diagnostic System Warning message

 
Andrew Kaplan
Super Advisor

Diagnostic System Warning message

Hi there --

We are running HP-UX 11.11 on two rp3440 servers in a ServiceGuard cluster. We recently did a graceful shutdown of the servers due to a battery replacement in the UPS that powers the cluster. A subsequent power up of the cluster occurred without incident.

Part of our daily checks is an inspection of the dmesg file on each server. This past Monday, the warning shown in the attached file appeared in the dmesg log of one of the servers. The error message has not appeared in the dmesg file of the other server.

The problem has occurred twice since the reboot of the cluster. I am not sure which components are causing this error, or whether it points to a failing device within the cluster.

Can someone provide some insight into this? Thanks.
A Journey In The Quest Of Knowledge
9 REPLIES
Mel Burslan
Honored Contributor

Re: Diagnostic System Warning message

Did you check the contents of the /var/opt/resmon/log/event.log file? Look at the lines starting with the words "Notification Time", find the ones with a time just after the reboot, and see what the first message says.

These messages are usually logged when a disk reports a lot of SCSI errors, which is indicative of a slowly dying disk, but the source might be any hardware. So, your best bet is to check the log given above.

If you can not make sense of what is in the log, post the log contents and someone will be able to give you an answer.
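
A minimal sketch of that check, assuming the log is at the path given above. The function name and the EVENT_LOG variable are made up for illustration; the only string searched for is the "Notification Time" prefix mentioned in this reply.

```shell
#!/bin/sh
# Sketch only: list the "Notification Time" lines of the EMS event log
# so they can be compared against the time of the reboot.
# EVENT_LOG and list_notification_times are hypothetical names.
EVENT_LOG=${EVENT_LOG:-/var/opt/resmon/log/event.log}

list_notification_times() {
    # $1: log file to scan (defaults to $EVENT_LOG)
    # -n prefixes each match with its line number in the log
    grep -n "Notification Time" "${1:-$EVENT_LOG}"
}
```

Running `list_notification_times | tail -20` shows the most recent notifications; the line numbers make it easy to jump to the full event text in the log.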

________________________________
UNIX because I majored in cryptology...
Andrew Kaplan
Super Advisor

Re: Diagnostic System Warning message

Hi there --

Thanks for your reply. I checked the event.log file that you suggested, and aside from some test events that occurred in recent days, the only critical event was back on March 18 of this year.
The event back then had to do with a Fibre Channel dead link notification.

I doubt that event has anything to do with the I/O errors mentioned in my initial posting, but as a precaution, I am posting the event log. Thanks.
A Journey In The Quest Of Knowledge
Mel Burslan
Honored Contributor

Re: Diagnostic System Warning message

If the attached log file is all there is to it, I don't believe you have anything to worry about. But just to be on the safe side, scan the syslog for unusual activity. If there are SCSI errors of some sort, they usually show up as vmunix messages in the syslog. Since the syslog starts from the time you reboot the server, if the dmesg lines came right after the reboot, the error messages, if there are any, must be close to the top.
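
A rough sketch of that scan, assuming the standard HP-UX syslog location /var/adm/syslog/syslog.log. The function name is made up, and the pattern (a vmunix line that also mentions SCSI) is only a guess at what the messages look like, so widen it if nothing matches.

```shell
#!/bin/sh
# Sketch only: show the first kernel (vmunix) lines in syslog that mention SCSI.
# SYSLOG and scan_syslog_scsi are hypothetical names; the pattern is an assumption.
SYSLOG=${SYSLOG:-/var/adm/syslog/syslog.log}

scan_syslog_scsi() {
    # $1: syslog file (defaults to $SYSLOG)
    # $2: how many matching lines to show from the top (default 20)
    grep -i "vmunix.*scsi" "${1:-$SYSLOG}" | head -n "${2:-20}"
}
```

Because the errors are expected near the top of the file, only the first matches are printed.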

Failing to find anything, I'd suggest you keep an eye on the event.log file. For example, use a cron job to check its size and, if it grows, email a copy of it to yourself for reference purposes, if nothing else.
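
Something along these lines could serve as the cron job. It is a sketch, not a tested production script: the state file location, the recipient, and the function names are all assumptions, and only the event.log path comes from this thread.

```shell
#!/bin/sh
# Sketch only: mail a copy of event.log when it has grown since the last run.
# STATE_FILE, MAIL_TO and both function names are hypothetical.
EVENT_LOG=${EVENT_LOG:-/var/opt/resmon/log/event.log}
STATE_FILE=${STATE_FILE:-/var/tmp/event.log.size}
MAIL_TO=${MAIL_TO:-root}

# Record the current size of the log and print "grew" if it is larger
# than the size recorded on the previous run, otherwise print "same".
check_growth() {
    log=$1
    state=$2
    new=$(wc -c < "$log" | tr -d ' ')
    old=0
    [ -f "$state" ] && old=$(cat "$state")
    echo "$new" > "$state"
    if [ "$new" -gt "$old" ]; then echo grew; else echo same; fi
}

# Entry point for cron: send the whole log by mail when it has grown.
mail_if_grown() {
    if [ "$(check_growth "$EVENT_LOG" "$STATE_FILE")" = grew ]; then
        mailx -s "event.log grew on $(hostname)" "$MAIL_TO" < "$EVENT_LOG"
    fi
}

# mail_if_grown   # uncomment when installing this file as the cron script
```

Note that the very first run has no saved size, so a non-empty log counts as "grew" and gets mailed once. An hourly crontab entry pointing at wherever the script is saved would be enough for a daily check routine.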
________________________________
UNIX because I majored in cryptology...
chris huys_4
Honored Contributor

Re: Diagnostic System Warning message

Hi Andrew,

The diagnostic system warning points to disk-related errors.

Check my 2 entries in this thread, http://h30499.www3.hp.com/t5/System-Administration/I-O-error-entries/m-p/4695491#M383605 , to find out how to check the disk error logs.

Greetz,
Chris

Andrew Kaplan
Super Advisor

Re: Diagnostic System Warning message

Hi there --

I went through the recommended procedure to create an ASCII file from the disk log files, and I have included it with this posting. I believe I know what the cause of the errors is, but I wanted a 'fresh' set of eyes to look at the log file.

Thanks for the help in advance.
A Journey In The Quest Of Knowledge
Michael Steele_2
Honored Contributor

Re: Diagnostic System Warning message

cstm<<-EOF
runutil logtool
rs
EOF

Attach the report, and note the start and end dates and any device with an error count in the hundreds.
Support Fatherhood - Stop Family Law
Michael Steele_2
Honored Contributor

Re: Diagnostic System Warning message

AAAGH, you've already attached the logtool report. Unfortunately for you, it reports nothing.

If there had indeed been something, the first and last entries would be hours or a day apart, not months apart. This:

Date/time of first entry: Mon Feb 7
Date/time of last entry: Wed Apr 20

Would be more like this:

Date/time of first entry: Tue Apr 19
Date/time of last entry: Wed Apr 20

#############################

As for the number of errors recorded per device, the largest number I can see (13) is nothing:

0/3/1/0/4/0.11.1.0.0.1.1 (13)
0/3/1/0/4/0.11.1.0.0.3.6 (2)

#############################

This would be something:

0/3/1/0/4/0.11.1.0.0.1.1 (866)
0/3/1/0/4/0.11.1.0.0.3.6 (1023)
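
To pick such devices out of a long summary, a filter like the one below could help. It is only a sketch: it assumes the per-device lines look like the ones quoted above (hardware path followed by the count in parentheses), the function name is made up, and the default threshold of 100 is just "numbers in the hundreds" turned into a number.

```shell
#!/bin/sh
# Sketch only: print devices whose error count reaches a threshold.
# Assumes summary lines of the form "hw_path (count)";
# flag_noisy_devices is a hypothetical name.
flag_noisy_devices() {
    # $1: threshold (default 100); summary text is read from stdin
    awk -v limit="${1:-100}" '
        # Only consider lines that end in a parenthesised number
        match($0, /\([0-9]+\)[ \t]*$/) {
            # Take the text after "(" and let awk convert its numeric prefix
            count = substr($0, RSTART + 1, RLENGTH) + 0
            if (count >= limit) print $1, count
        }'
}
```

Feeding the saved logtool summary to it, for example `flag_noisy_devices < summary.txt` with whatever file name the report was saved under, prints only the hardware paths worth a closer look.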




Support Fatherhood - Stop Family Law
Andrew Kaplan
Super Advisor

Re: Diagnostic System Warning message

Hi there --

Sorry about jumping the gun...So are you saying this is much ado about nothing? If so, then why would the entries be present in the first place?

A Journey In The Quest Of Knowledge
chris huys_4
Honored Contributor

Re: Diagnostic System Warning message

Hi Andrew,

Like the person in the thread I pointed to, you have attached the output of the formatted summary, and not the output of the formatted log, which is what my last entry in that thread points to.

Anyway, if I had to guess, the formatted summary points to a problem with the EVA3000 connected to this system. ;)

Greetz,
Chris