
Dagmar Boelen
Frequent Advisor

disk errors

Hi,

I would like to know how I can detect disk errors. Which logfiles should I check? What kind of messages should I look for?

Of course, when a disk is unavailable (ioscan) it has failed. But how do I detect a disk which still seems to work but already produces errors?

Are the errors of an internal disk logged to a different location than the errors of a disk in an AutoRAID?

15 REPLIES
Martin Johnson
Honored Contributor

Re: disk errors

Most disk errors are reported to the syslog (/var/adm/syslog/syslog.log). If you have Predictive installed, you can get more information. The Support Tools Manager (STM, see man stm) will not only get you messages but will also allow you to test disks.

HTH
Marty
John Poff
Honored Contributor

Re: disk errors

Hi,

I'd start with the logs in /var/adm/syslog. Almost all of your disk errors will get logged there. You can also look at dmesg, but that is a ring buffer and not a file, so older messages get overwritten. Your EMS notifications will do a pretty good job too.

JP
Pete Randall
Outstanding Contributor

Re: disk errors

Disk errors should show up in dmesg output, possibly in /var/adm/syslog/syslog.log as well.


Pete


Sridhar Bhaskarla
Honored Contributor

Re: disk errors

Hi Dagmar,

Most of the errors should get logged into your dmesg and /var/adm/syslog/syslog.log. However, you will get more information if you use EMS that comes with Online Diagnostics. Look at the following document on configuring EMS.

http://docs.hp.com/hpux/onlinedocs/B7609-90022/B7609-90022.html

You can even get it configured to email the errors on various subsystems including disks.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Elena Leontieva
Esteemed Contributor

Re: disk errors

Dagmar,

1. You can check the disk with xstm, using the verify utility.

2. Run the following command; any errors indicate a disk problem:

dd if=/dev/rdsk/device of=/dev/null bs=32K

Elena.
Eugeny Brychkov
Honored Contributor

Re: disk errors

There should be logs in the OS (like syslog.log) and logs on the storage device itself. The latter depend on the device: for plain SCSI disks they are very poor (only a defect list and some stats), while disk arrays keep huge logs and many stats.
When looking at syslog.log, the usual 'bad' events are:
- power fails
- SCSI resets
- SCSI hangs
- SCSI timeouts
For disk arrays, use the array's own management software to pull the logs out of the array.
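
A rough sketch of how one might pull those kinds of events out of syslog.log with grep. The function name scan_disk_events and the patterns are assumptions for illustration only; the exact wording of the messages differs between drivers and releases, so tune the patterns against your own log.

```shell
#!/bin/sh
# scan_disk_events (hypothetical helper): print log lines that look like
# the "bad" events listed above (power fails, SCSI resets/hangs/timeouts).
# The regular expression is a guess at message wording, not a catalogue
# of real HP-UX messages.
scan_disk_events() {
    logfile=${1:-/var/adm/syslog/syslog.log}
    # -i ignores case, -E enables alternation with ( | )
    grep -iE 'scsi.*(reset|timeout|hang)|power.?fail' "$logfile"
}

# Usage:
#   scan_disk_events                      # default syslog location
#   scan_disk_events /path/to/other.log   # any other log file
```

No output means nothing matched the patterns, which is not proof that the disk is healthy.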
Eugeny
A. Clay Stephenson
Acclaimed Contributor

Re: disk errors

Look in /var/adm/syslog/syslog.log. You will see the infamous "LBOLT" errors. You should be able to do a search on "LBOLT" and "device" and learn how to decode the device numbers. Occasional messages are expected.
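
Since occasional messages are expected, counting them is more useful than just spotting one. A minimal sketch, assuming the log lives at the /var/adm/syslog/syslog.log path mentioned elsewhere in this thread; lbolt_check and the default threshold of 10 are made up for illustration, not a recommended limit.

```shell
#!/bin/sh
# lbolt_check (hypothetical helper): count lines mentioning "lbolt" and
# succeed only if the count is at or below a threshold.
# The threshold is an arbitrary assumption; pick one that fits your system.
lbolt_check() {
    logfile=${1:-/var/adm/syslog/syslog.log}
    threshold=${2:-10}
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    # grep -c prints the number of matching lines (0 if none)
    count=$(grep -ic 'lbolt' "$logfile")
    echo "$count lbolt message(s) in $logfile"
    [ "$count" -le "$threshold" ]
}

# Usage:
#   lbolt_check || echo "more lbolt messages than expected, look closer"
```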
If it ain't broke, I can fix that.
Pete Randall
Outstanding Contributor

Re: disk errors

To avoid losing dmesg data because it's a circular buffer, you can set up a cron job to periodically dump it into a file, as described in man dmesg:

"If the - argument is
specified, dmesg computes (incrementally) the new messages since the
last time it was run and places these on the standard output. This is
typically used with cron (see cron(1)) to produce the error log
/var/adm/messages by running the command:

/usr/sbin/dmesg - >> /var/adm/messages

every 10 minutes."
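
For what it's worth, here is a sketch of what that cron job could look like. The command is the one from the man page quote; the minute list is spelled out because older cron versions may not accept a step syntax. dmesg_cron_entry and /tmp/root.cron are hypothetical names used only for this example.

```shell
#!/bin/sh
# dmesg_cron_entry (hypothetical helper): print a crontab line that runs
# the man page's dmesg command every 10 minutes.
dmesg_cron_entry() {
    printf '%s\n' '0,10,20,30,40,50 * * * * /usr/sbin/dmesg - >> /var/adm/messages'
}

# One way to install it as root (review the scratch file before loading):
#   crontab -l > /tmp/root.cron
#   dmesg_cron_entry >> /tmp/root.cron
#   crontab /tmp/root.cron
```

Going through a scratch file keeps the existing crontab entries, since loading a file with crontab replaces the whole table.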


Pete


Bryan D. Quinn
Respected Contributor

Re: disk errors

Hello,

First and foremost, keep an eye on /var/adm/syslog/syslog.log
Outside of that I would suggest using stm.
If you have a particular disk that you think is wigging out, I would suggest doing an lvdisplay -v on the lvols that live on it and looking for stale extents. If you have a small environment, this might be a good practice to do occasionally on all of your lvols.
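
A minimal sketch of that stale-extent check as a filter you can pipe lvdisplay -v into. count_stale_lines is a hypothetical name, and /dev/vg00/lvol1 in the usage comment is only an example path. It simply counts lines containing the word "stale", so a stale LV status line in the header is counted along with the extent lines.

```shell
#!/bin/sh
# count_stale_lines (hypothetical helper): read lvdisplay -v output on
# stdin and print how many lines mention "stale" (case-insensitive).
# This is a plain text match, not a parser of the lvdisplay format.
count_stale_lines() {
    grep -ci 'stale'
}

# Usage (example lvol path, substitute your own):
#   lvdisplay -v /dev/vg00/lvol1 | count_stale_lines
```

Anything other than 0 is a reason to look at the full lvdisplay -v output for that lvol.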
As for the AutoRAID units, I don't have a lot of experience with them. They should show up in syslog.log also, just like any other disk. I do believe, however, that some AutoRAID units come with special tools for monitoring the array for problems. Like I said, though, I don't have a lot of experience with that.

Hope this helps!
-Bryan
Dagmar Boelen
Frequent Advisor

Re: disk errors

Hi,

Lots of responses. You will get points for them of course!! Are disk errors in an AutoRAID also logged in the syslog? Someone mentioned using a tool for checking my AutoRAID. What kind of tool?
twang
Honored Contributor

Re: disk errors

To find out what disk errors occur, you can check the output from "dmesg" or /var/adm/syslog/syslog.log.

Once you find a problem, you can use "dd" to read-test the suspect disk, as follows:

# dd if=/dev/rdsk/c?t?d0 of=/dev/null bs=1024K
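
A small sketch of wrapping that read test so the result is easy to spot, assuming a POSIX shell. check_disk_read is a hypothetical helper name and the device path in the usage comment is only an example. It uses a lowercase k in the block size, which is the POSIX spelling of the 1024-byte multiplier.

```shell
#!/bin/sh
# check_disk_read (hypothetical helper): read the whole device to
# /dev/null and report whether dd exited cleanly.
# dd's own messages (record counts, any I/O error) still go to stderr.
check_disk_read() {
    dev=$1
    if dd if="$dev" of=/dev/null bs=1024k; then
        echo "OK: $dev read without dd errors"
    else
        echo "FAILED: dd reported an error reading $dev" >&2
        return 1
    fi
}

# Usage (example device path, substitute your own):
#   check_disk_read /dev/rdsk/c0t5d0
```

Reading a full disk can take a long time, and a clean pass only shows that every block was readable at that moment.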

Bryan D. Quinn
Respected Contributor

Re: disk errors

Hello,

I mentioned some special tools, but I am not sure about that. I know we had a problem some time ago with an archive server that had an AutoRAID unit. I remember something about some AutoRAID software that was installed on the box, but I could not tell you what it did. I just had the disk swapped out and continued on my way. Since then I have moved that server over to our EMC frame. I vaguely remember the HP Response Center engineer having me perform some tasks with what I remember to be some sort of AutoRAID software. I will check back on that server and see if I can elaborate on what I am referring to. I know the AutoRAID unit was pretty old. I will see what I can dig up.

-Bryan
Sridhar Bhaskarla
Honored Contributor

Re: disk errors

Hi Dagmar,

As I mentioned before, EMS should log quite a variety of errors. There are specific configuration files for arrays like the FC60. One good example of something the system cannot catch but EMS does is "battery failures" on the FC60.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Bryan D. Quinn
Respected Contributor
Solution

Re: disk errors

Hello Dagmar,

I powered up that server and checked it out. Apparently what I was thinking of was the Disk Array Manager and the Disk Array Monitor daemon, which are processes that start up at boot time. I looked under /opt/hparray/bin, and I believe these were some of the commands that the HP engineer had me run, I think to re-configure the array. I don't think we used these commands to diagnose the situation, just to re-configure the array after the disk was replaced. I feel certain that EMS popped a message in syslog and then we used the /opt/hparray/bin commands to get things straight in the array after the disk was replaced.

-Bryan
Bryan D. Quinn
Respected Contributor

Re: disk errors

Hello Dagmar,

Please no points for this or my last message. I was just following up.

-Bryan