Failed Hard Disk?

 
Matthew Sti
Occasional Visitor

I am new to UNIX administration since our existing UNIX admin is out on medical leave. I am not exactly sure what the model is on the computer but I have been gathering some information. I have replaced hard drives on some SUN workstations but not on HP UX.

 

# uname -a

HP-UX hploaner B.11.00 E 9000/785 2006805075 8-user license

 

 

When I run a dmesg command I am given errors. However I am not sure if the hard drives are in a RAID configuration or not. I am not sure which drive is bad. Tomorrow I will be able to gather more information. My sessions keep freezing so I keep getting stuck running certain commands. Below are some commands I ran to gather information. Can someone please suggest what actions to take? Thanks.

 

# dmesg

SCSI: unrecoverred deferred error (dev = 0x1f035000, lba = 0x1726b7)

SCSI: Async write error -- dev: b 31 0x035000, errno: 5, resid: 8192,
        blkno: 5704, sectno: 11408, offset: 5840896, bcount: 8192.
vxfs: mesg 055: vx_metaioerr - /dev/vg01/lvol1 file system meta data write error
LVM: VG 64 0x010000: Lost quorum.
This may block configuration changes and I/Os. In order to reestablish quorum at least 1 of the following PVs (represented by current link) must become available:
<31 0x035000>
LVM: VG 64 0x010000: PVLink 31 0x035000 Failed! The PV is not accessible.
vxfs: mesg 055: vx_metaioerr - /dev/vg01/lvol2 file system meta data read error
vxfs: mesg 008: vx_direrr - /disk1 file system inode 2 block 40605 error 5
vxfs: mesg 008: vx_direrr - /disk1 file system inode 2 block 40605 error 5
vxfs: mesg 016: vx_ilisterr - /disk2 file system error reading inode 116493
vxfs: mesg 008: vx_direrr - /disk1 file system inode 2 block 40605 error 5
vxfs: mesg 004: vx_mapbad - /disk2 file system free inode bitmap in au 0 marked bad
vxfs: mesg 008: vx_direrr - /disk1 file system inode 2 block 40605 error 5
vxfs: mesg 008: vx_direrr - /disk1 file system inode 2 block 40605 error 5
vxfs: mesg 008: vx_direrr - /disk1 file system inode 2 block 40605 error 5

 

# more /etc/fstab

/dev/vg00/lvol3 / vxfs delaylog 0 1
/dev/vg00/lvol1 /stand hfs defaults 0 1
/dev/vg00/lvol4 /opt vxfs delaylog 0 2
/dev/vg00/lvol5 /tmp vxfs delaylog 0 2
/dev/vg00/lvol6 /usr vxfs delaylog 0 2
/dev/vg00/lvol7 /var vxfs delaylog 0 2
/dev/vg00/lvol9 ... swap pri=0 0 0
#/dev/vg01/lvol3 /cs vxfs rw,suid,nolargefiles,delaylog,datainlog 0 2
/dev/vg01/lvol1 /disk2 vxfs rw,suid,nolargefiles,delaylog,datainlog 0 2
/dev/vg01/lvol2 /disk1 vxfs rw,suid,nolargefiles,delaylog,datainlog 0 2

 

# vgdisplay -v vg01
--- Volume groups ---
VG Name                     /dev/vg01
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      3
Open LV                     3
Max PV                      16
Cur PV                      1
Act PV                      1
Max PE per PV               17502
VGDA                        2
PE Size (Mbytes)            4
Total PE                    17499
Alloc PE                    17498
Free PE                     1
Total PVG                   1
Total Spare PVs             0
Total Spare PVs in use      0

   --- Logical volumes ---
   LV Name                     /dev/vg01/lvol1
   LV Status                   available/syncd
   LV Size (Mbytes)            17000
   Current LE                  4250
   Allocated PE                4250
   Used PV                     1

   LV Name                     /dev/vg01/lvol2
   LV Status                   available/syncd
   LV Size (Mbytes)            17000
   Current LE                  4250
   Allocated PE                4250
   Used PV                     1

   LV Name                     /dev/vg01/lvol3
   LV Status                   available/syncd
   LV Size (Mbytes)            35992
   Current LE                  8998
   Allocated PE                8998
   Used PV                     1


   --- Physical volumes ---
   PV Name                     /dev/dsk/c3t5d0
   PV Status                   unavailable
   Total PE                    17499
   Free PE                     1
   Autoswitch                  On


   --- Physical volume groups ---
   PVG Name                    vg01
   PV Name                     /dev/dsk/c3t5d0

4 REPLIES
Patrick Wallek
Honored Contributor

Re: Failed Hard Disk?

It appears that you do have a bad disk at /dev/dsk/c3t5d0.  One of the keys is this part of the output:

 

   PV Name                     /dev/dsk/c3t5d0
   PV Status                   unavailable

 

Unavailable is NOT a good status.
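
The dmesg errors point at the same disk (dev = 0x1f035000, also shown as "31 0x035000"). Here is a rough sketch of how to turn that number into a cXtYdZ name. It assumes the usual legacy HP-UX layout where the top byte is the major number and the minor number reads 0xIITLxx (II = controller instance, T = SCSI target, L = LUN). decode_dev is just a helper name made up for this example.

```shell
#!/bin/sh
# Decode a legacy HP-UX disk dev number, as printed by dmesg, into its
# major number and cXtYdZ name.
# ASSUMPTION: layout is 0xMMIITLxx (major, instance, target, LUN, options).
decode_dev() {
    dev=$(( $1 ))                       # accepts hex such as 0x1f035000
    major=$((  (dev >> 24) & 0xff ))    # top 8 bits: driver major number
    card=$((   (dev >> 16) & 0xff ))    # controller instance, the "c" number
    target=$(( (dev >> 12) & 0xf ))     # SCSI target, the "t" number
    lun=$((    (dev >> 8)  & 0xf ))     # LUN, the "d" number
    echo "major=$major c${card}t${target}d${lun}"
}

decode_dev 0x1f035000    # prints: major=31 c3t5d0
```

That lines up with the PV name in the vgdisplay output, so both sources agree on which disk is in trouble.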

 

Also, your logical volumes are not mirrored as they are using only 1 physical volume (note the "USED PV" lines in your vgdisplay output), and you have only 1 physical volume in your VG01 volume group.
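
If you want to pull those two facts out of a long vgdisplay listing without reading it line by line, something like the following might help. It is only a sketch: vg_summary is a hypothetical helper name, and it assumes the "LV Name", "Used PV", "PV Name" and "PV Status" labels appear exactly as in the output you posted.

```shell
#!/bin/sh
# Read "vgdisplay -v" output on stdin and report:
#   - logical volumes that sit on a single physical volume
#   - physical volumes whose status is anything other than "available"
# On the HP-UX box this would be fed as:  vgdisplay -v vg01 | vg_summary
vg_summary() {
    awk '
        $1 == "LV"   && $2 == "Name"   { lv = $3 }
        $1 == "Used" && $2 == "PV"     { if ($3 == 1) print "single-PV LV: " lv }
        $1 == "PV"   && $2 == "Name"   { pv = $3 }
        $1 == "PV"   && $2 == "Status" { if ($3 != "available") print "bad PV: " pv " (" $3 ")" }
    '
}
```

An LV on a single PV cannot be mirrored, so any "single-PV LV" line means the data on it has no second copy inside LVM.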

 

The disk appears to be a 70 GB disk.

 

The steps you will likely have to take:

 

1) Shut down the computer (which appears to be a workstation-class machine, judging from the "9000/785" in the uname output).

 

2) Figure out how to open up the case and replace the disk.

 

3) Power the computer back on. (You will likely see errors during startup since VG01 will not be available for mounting since there's a new disk)

 

4) When you can get logged in again, run the following commands:

 

# ioscan -fnC disk

Look for the /dev/dsk/c3t5d0 disk and make sure it shows as CLAIMED.

 

# vgcfgrestore -n vg01 /dev/rdsk/c3t5d0

 

If that is successful, then

 

# vgchange -a y vg01

 

# mount -a

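This will mount anything in /etc/fstab that isn't already mounted, which should be your VG01 logical volumes. Since the replacement disk is blank, the mounts will likely fail until you recreate the file systems on the logical volumes (for example, newfs -F vxfs /dev/vg01/rlvol1).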

 

# bdf

Verify that everything is mounted. You should see /disk1 and /disk2 in addition to the usual OS logical volumes.

 

5) Now you have to restore your data.
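
The commands from step 4 can be collected into a small dry-run script so you can review the order before touching anything. This is only a sketch: DRY_RUN, VG, DISK, run and restore_vg are names invented for the example, and it passes the raw device path (/dev/rdsk/...) to vgcfgrestore, which is what its man page asks for. By default it only prints what it would do.

```shell
#!/bin/sh
# Dry-run sketch of step 4. Nothing is executed unless DRY_RUN=0.
# DRY_RUN, VG and DISK are illustrative names, not HP-UX settings.
DRY_RUN=${DRY_RUN:-1}
VG=vg01
DISK=/dev/rdsk/c3t5d0    # raw device file of the replaced disk

# Print the command in dry-run mode; otherwise run it and stop on failure.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

restore_vg() {
    run ioscan -fnC disk               # check by eye that the disk is CLAIMED
    run vgcfgrestore -n "$VG" "$DISK"  # write the saved LVM config to the disk
    run vgchange -a y "$VG"            # activate the volume group
    run mount -a                       # needs file systems to exist on the LVs
    run bdf                            # confirm /disk1 and /disk2 are mounted
}

restore_vg
```

The ioscan check still needs a human to read it, so treat the script as a checklist rather than something to run unattended.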

 

Good luck!

 

 

Matthew Sti
Occasional Visitor

Re: Failed Hard Disk?

Thanks for the reply. I filed the post away in my UNIX admin notes. At first I was confused about how to tell which drive had what on it. Once I looked at the information I had gathered it started making sense. On the plus side I may have gotten lucky. The box is used for development work and I think our customer may have dropped support for it, so hopefully I won't have to restore it.

 

Once I got to look at the system I noticed the root drive was making a very loud whining noise, so I hope that is not going to be an issue. If that drive is going bad what would be the best procedure for replacing it? Also how could you tell that the drive was 70GB? Did you multiply 17499 x 4?

 

On the Solaris box I restored, I hooked a drive up to another host, created the partitions/slices, and restored the data there. Afterwards I mounted it back into the original system and made it bootable, and it worked.

Patrick Wallek
Honored Contributor

Re: Failed Hard Disk?

>>...I think our customer may have dropped support for it so I hopefully won't have to restore it.

 

That would make life a whole lot easier.

 

>>Once I got to look at the system I noticed the root drive was making a very loud whining noise...

 

Hmmm... That's not necessarily a good thing.  It sounds like the root drive may also be on its last legs.

 

>>If that drive is going bad what would be the best procedure for replacing it?

 

Do you have a tape drive attached to this server?  If so, see if you have the Ignite/UX software installed.  If you do, you may be able to create a make_recovery or make_tape_recovery tape which would let you rebuild the root disk if/when it fails.

 

To check for Ignite, see if you have the make_recovery or make_tape_recovery program.

 

# ls -l /opt/ignite/bin/make*
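
If you would rather have a yes/no answer than read the ls listing, a small check along these lines could work. It is a sketch only: find_ignite_tools is a made-up helper name, and it looks just for the two program names mentioned here under the directory you give it (defaulting to /opt/ignite/bin).

```shell
#!/bin/sh
# Print the Ignite-UX recovery tools found under a directory.
# Defaults to /opt/ignite/bin; prints nothing if none are present.
find_ignite_tools() {
    dir=${1:-/opt/ignite/bin}
    for tool in make_recovery make_tape_recovery; do
        # Only report files that exist and are executable
        [ -x "$dir/$tool" ] && echo "$dir/$tool"
    done
    return 0
}
```

Run it as "find_ignite_tools" on the HP-UX box. No output means neither program is installed in that directory.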

 

>>Also how could you tell that the drive was 70GB? Did you multiply 17499 x 4?

 

Yes, exactly.  That's just a rough guess, but should get you close.  
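
Spelled out, the estimate is just Total PE times PE Size (Mbytes) from the vgdisplay output. A minimal sketch, with pv_size_mb as an invented helper name:

```shell
#!/bin/sh
# Estimate the usable size of a physical volume in MB:
#   Total PE  x  PE Size (Mbytes), both taken from vgdisplay.
pv_size_mb() {
    echo $(( $1 * $2 ))
}

pv_size_mb 17499 4    # prints: 69996
```

69996 MB is a little under 70 GB. LVM only counts whole extents, so the physical disk will be somewhat larger than this figure.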

 

The output of 'ioscan -kfnC disk' or 'diskinfo /dev/rdsk/c3t5d0' can tell you more accurately what type of disk and its size.

 

>>On the Solaris box I restored...

 

Unfortunately not quite so easy on HP-UX.  The use of LVM could cause issues if you tried doing something like that here.

Matthew Sti
Occasional Visitor

Re: Failed Hard Disk?

If the customer does not support the OS then I will be decommissioning that system.

 

We use EMC Networker for backing up UNIX systems. Since we have so many old systems we have to keep around for customers, we are still on version 7.3, which is about 4 major releases behind the most current. I was just wondering if there was something I could do beforehand, like how Solaris has ufsdump/ufsrestore, or if I could boot into single-user mode and copy the files over with cpio or something similar.

 

Thanks again.