Operating System - HP-UX

how to detect hardware errors

 
Jeroen_D
Regular Advisor

Hi,

We recently had a disk failure in a D370.
After a reboot it appeared that a mirror set was corrupt. The conclusion was that both disks in the mirror set were defective.
We are wondering when the errors occurred. Did they start because of the reboot, or were they dormant?
This brings me to my question:

Is there a way to detect hardware errors in a D-class server? I think there is no Guardian Service Processor (GSP) in it.

I assume we can only check the output of ioscan and vgdisplay, and run a dd of each disk to /dev/null.
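
As a rough illustration of the dd-to-/dev/null idea, the following Python sketch reads a device file from start to end, throws the data away, and reports how far it got before the first read error. The function name `read_check` and the device path in the usage note are hypothetical, and Python is used only for illustration; on the server itself a plain dd does the same job.

```python
def read_check(path, chunk_size=1024 * 1024):
    """Read `path` sequentially, discarding the data (like dd of=/dev/null).

    Returns (bytes_read, error). error is None when the whole file or
    device could be read, otherwise the OSError that stopped the read.
    """
    bytes_read = 0
    try:
        # buffering=0 gives unbuffered reads, so bytes_read reflects the
        # position reached before a failing read.
        with open(path, "rb", buffering=0) as dev:
            while True:
                chunk = dev.read(chunk_size)
                if not chunk:          # end of file or device reached
                    break
                bytes_read += len(chunk)
    except OSError as exc:
        return bytes_read, exc
    return bytes_read, None
```

A call such as `read_check("/dev/rdsk/c0t6d0")` (hypothetical device name) would have to be repeated for every disk in the mirror set. A clean pass only shows that every block was readable at that moment; it says nothing about when an earlier error started.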

But maybe you have other tips?

Thanks in advance,
Jeroen.
5 REPLIES
Cheryl Griffin
Honored Contributor

Re: how to detect hardware errors

Do you have diagnostics installed (STM/EMS/Predictive)?

As a note, Predictive has been replaced now by ISEE.

Were there any errors in dmesg or the syslog.log?
"Downtime is a Crime."
Franky_1
Respected Contributor

Re: how to detect hardware errors

Hi,

You can use cstm (if installed) or look at /var/tombstones/ts99 (the most recent tombstone file).
This should give you hints about any hardware failure.

Regards

Franky
Don't worry be happy
doug mielke
Respected Contributor

Re: how to detect hardware errors

One of the first signs of a disk going bad may show up in sar -d.

If the disk is having trouble finding data (re-reads?), you can see it in access times creeping up. You'd need a baseline (save a sar output before problems develop) to compare against.
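
The baseline comparison could be sketched roughly as follows in Python. This assumes the saved output has the usual `sar -d` column layout (device, %busy, avque, r+w/s, blks/s, avwait, avserv) with the average service time in the last column. The function names and the factor of 2 are illustrative assumptions, not part of any HP tool.

```python
def parse_avserv(sar_text):
    """Return {device: mean avserv in ms} from saved `sar -d` output.

    Assumption: a data line ends with device %busy avque r+w/s blks/s
    avwait avserv, optionally preceded by a timestamp column.
    """
    samples = {}
    for line in sar_text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue
        device, numbers = fields[-7], fields[-6:]
        try:
            values = [float(n) for n in numbers]
        except ValueError:
            continue                     # header or other non-data line
        samples.setdefault(device, []).append(values[-1])
    return {dev: sum(v) / len(v) for dev, v in samples.items()}


def slowed_devices(baseline_text, current_text, factor=2.0):
    """List devices whose mean avserv grew by at least `factor`."""
    baseline = parse_avserv(baseline_text)
    current = parse_avserv(current_text)
    return sorted(
        dev for dev, avserv in current.items()
        if dev in baseline and baseline[dev] > 0
        and avserv >= factor * baseline[dev]
    )
```

Feeding it a sar output saved while the system was healthy and one taken today gives a short list of disks worth a closer look. A higher service time can also just mean a heavier workload, so this is a hint rather than proof of a failing disk.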
Mohanasundaram_1
Honored Contributor

Re: how to detect hardware errors

Hi Jeroen,

This type of failure would be logged in syslog.log. If you want to be notified of such errors, you may have to write a script that catches keywords such as SCSI, lbolt, error, warning, etc.

EMS notifications will also appear in syslog.log. Your script can also monitor /var/opt/resmon/log/event.log.

If you do not want to use scripts, it is good practice to check the syslog.log file every day for such problems.
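
A minimal sketch of such a keyword script, written in Python purely for illustration (a shell script with grep would do the same job). The keyword list comes from the text above; the function names are hypothetical, and the log path in the usage note is the usual HP-UX location, which should be checked on the system in question.

```python
import re

# Keywords taken from the post above; extend as needed.
KEYWORDS = ("SCSI", "lbolt", "error", "warning")
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def suspicious_lines(lines):
    """Yield (line_number, line) for every line containing a keyword."""
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            yield number, line.rstrip("\n")


def scan_log(path):
    """Scan one log file; undecodable bytes are replaced, not fatal."""
    with open(path, "r", errors="replace") as log:
        return list(suspicious_lines(log))
```

Calling `scan_log("/var/adm/syslog/syslog.log")` from a cron job and mailing any non-empty result would give a basic notification. The match is case-insensitive and deliberately broad, so expect some false positives until the keyword list is tuned.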

In your case, if one of the mirror disks was bad and you tried to boot from the single remaining disk, you have to give the "-lq" option at the ISL prompt:

ISL>hpux -lq

This makes the system activate vg00 without checking the quorum. You need to do this if the command below shows only hpux as the autoboot string:

ISL>lsautofl
hpux

But if your autoboot string was

ISL>lsautofl
hpux -lq

then it should have booted on its own, provided the primary and alternate boot paths were set correctly.
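
As background on why -lq matters here: LVM normally activates a volume group only when more than half of its disks are available, so a two-disk mirrored vg00 with one dead disk fails the check. A minimal Python sketch of that rule (the function name is hypothetical; this illustrates the rule and is not LVM's actual code):

```python
def has_quorum(total_pvs, available_pvs):
    """True when more than half of the volume group's disks are present."""
    return 2 * available_pvs > total_pvs
```

With two disks and one available, the check fails because one is not more than half of two, which is the situation the -lq override is meant for.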

Hope this helps.

Cheers,
Mohan.
Attitude, Not aptitude, determines your altitude
Jeroen_D
Regular Advisor

Re: how to detect hardware errors

Thanks for the answers.

Our customer knows enough now.

The quorum check etc. had already been tried and didn't work. The hardware call was already solved; we just wanted to know some ways of detecting hardware errors.

I guess the syslog.log and checking for certain keywords is the best way to go.
Actually a new server with a GSP is the best way...

Thanks all!