Dan DeHaan
Advisor

Disc problems

I'm running a D380 with HPUX11.0.

About a month ago, we lost a drive for our database. The drive light was on solid and the OS couldn't find it. Big problem, but oh well, drives fail. A test of the drive confirmed it was bad. BTW, we were going to mirror all the database drives in the next week - DOH!!!

Last Friday, the mirrored drive for the database went down. Wow, big coincidence! The OS had lost it. We unplugged it and plugged it back in and the OS found it and recovered the file-system and all. Testing found no problems on the drive.

Just today, the OS drive went down. Solid light on the drive. The message on the console was:
SYSTEM DIAGNOSTIC WARNING:
The diagnostic logging facility has started receiving excessive errors from the I/O subsystem. I/O error entries will be lost until the cause of the excessive I/O logging is corrected. ...
LVM: vg[0]: pvnum=0 (dev_t=0x1f005000) is POWERFAILED

Since I thought the drive may be toast anyhow, I unplugged it and plugged it back in. In about 3 seconds, the system was back and working.

My question is: could I have 3 bad drives (we replaced the first one - it DID die) or is the problem in the SCSI bus? How can I nail this down?
Patrick Wallek
Honored Contributor

Re: Disc problems

I had a similar problem on a D380 of mine last week. A non-mirrored drive died (drive light solid orange). I pulled it out, waited a bit, and plugged it back in. The system then saw the drive again BUT there were stale extents on the drive.

I attempted to mirror my data to another set of disks, and all LVOLs but one were successful. I replaced the drive and restored the data I couldn't mirror.

If these drives are original drives then they are a few years old. I would guess that you are just having a string of bad luck and your drives are starting to go.

I would go ahead and replace the other 2 drives that have shown problems. That is probably easier than trying to start testing the SCSI cards internal to the system.
A. Clay Stephenson
Acclaimed Contributor

Re: Disc problems

The LED "ON" solid, remove drive from slot, plug it back in, and it's OK syndrome is rather common on HP boxes, and it's indicative of a failing drive. I've seen drives that would exhibit this behavior a number of times before failing completely. It could be a problem with other components on the bus, but I am much more inclined to believe that you simply have a number of failing disks. If the box has been operated for any length of time at even moderately elevated temperatures, the drive failure rates begin to increase rather rapidly.

If the problem at any one time seems to "stay" on one disk, then I would truly suspect the failing disk over any other possible component. If the problems seem to "move around", then I suspect things like a bad terminator (yes, they can fail) or a bad controller, but yours is really classic failing disk(s).
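
One rough way to see whether the errors "stay" on one disk or "move around" is to tally the LVM POWERFAILED messages per device number. The sketch below assumes the messages look like the console line quoted in the original post and that they are also written to the system log; the function name count_powerfailed is made up for illustration.

```shell
# Hypothetical helper: count LVM "POWERFAILED" messages per dev_t value.
# Assumes log lines of the form quoted in the original post:
#   LVM: vg[0]: pvnum=0 (dev_t=0x1f005000) is POWERFAILED
count_powerfailed() {
    # $1: log file to scan
    awk '/is POWERFAILED/ {
             if (match($0, /dev_t=0x[0-9a-fA-F]+/)) {
                 # skip the 6 characters of "dev_t=" and keep the hex number
                 dev = substr($0, RSTART + 6, RLENGTH - 6)
                 count[dev]++
             }
         }
         END { for (dev in count) print dev, count[dev] }' "$1" | sort
}

# Example call (the log path is an assumption; adjust for your system):
#   count_powerfailed /var/adm/syslog/syslog.log
```

If one dev_t accounts for nearly all of the hits, that points at the disk; hits spread across several devices on the same bus would make the terminator or controller more suspect.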


If it ain't broke, I can fix that.
Dan DeHaan
Advisor

Re: Disc problems

Thanks guys. How do I check for stale extents on the drive?
Sridhar Bhaskarla
Honored Contributor
Solution

Re: Disc problems

Dan,

I would also check the "environment". Not too long ago, we found "aluminum powder" left over from some simple maintenance in the data center, and it was causing one of the Superdomes to keep crashing.

To check for stale extents, use the lvdisplay command if your disks are under LVM control.

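# Walk every logical volume that vgdisplay reports
for LV in $(/usr/sbin/vgdisplay -v | awk '/LV Name/ {print $3}')
do
    # lvdisplay -v shows the status of each extent; look for any marked stale
    if /usr/sbin/lvdisplay -v "$LV" | grep -q stale
    then
        echo "$LV is stale"
    else
        echo "$LV is ok"
    fi
done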


-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Mohanasundaram_1
Honored Contributor

Re: Disc problems

Hi,

Before replacing the drives, consider updating the disk firmware. I am sure the firmware on the disks is very old. Some disk firmware revisions fix hanging problems like this.

You can also provide the model string of the disk so that I can tell you the latest firmware. diskinfo -v will indicate the firmware rev. and model string.
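
To collect that information for every disk at once, something like the following could work. It is only a sketch: it assumes diskinfo -v labels the fields "product id:" and "rev level:" and that the raw device files live under /dev/rdsk; the function name summarize_diskinfo is made up for illustration.

```shell
# Hypothetical helper: reduce `diskinfo -v` output (read from stdin) to one line.
# Assumes the fields are labelled "product id:" and "rev level:".
summarize_diskinfo() {
    awk -F': *' '
        /product id:/ { model = $2 }
        /rev level:/  { rev = $2 }
        END { printf "model=%s firmware=%s\n", model, rev }'
}

# Only run the loop where diskinfo is available (i.e. on HP-UX).
if command -v diskinfo >/dev/null 2>&1
then
    for dev in /dev/rdsk/*
    do
        printf '%s: ' "$dev"
        diskinfo -v "$dev" 2>/dev/null | summarize_diskinfo
    done
fi
```

Entries that come back with empty values are most likely devices that diskinfo could not query (for example, an empty CD-ROM drive) and can be ignored.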

Of course, you would require an HP CE to perform this disk firmware upgrade.

Cheers,
Mohan
Attitude, Not aptitude, determines your altitude