
Vikas_2
Advisor

Hard disk failure

One of the hard disks of my 10.20 system has failed. ioscan shows the disk as not claimed. The disk is in an external storage enclosure and is hot-swappable. It is not mirrored, and the VG contains three disks.

Can you please advise me of the steps to carry out before and after the disk change?

Thanks in advance
-Vikas
Stefan Farrelly
Honored Contributor

Re: Hard disk failure

You first need to identify which lvols are on the failed drive, because you've lost the data on them. Try lvdisplay -v on each lvol and see which ones report errors. Once you replace the failed drive, do a vgcfgrestore onto it; this will restore the VG/LVM info. You will then need to newfs the lvols that were on it, as they are either corrupted or completely lost, and then recover the data on them from backup.
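That check can be scripted. A minimal sketch, assuming the affected volume group is /dev/vg01 with the default lvolN naming (substitute your own VG and lvol names):

```shell
# Run lvdisplay -v against every lvol in the VG and flag the ones
# whose output mentions errors or stale/unavailable extents.
for lv in /dev/vg01/lvol*; do
    echo "== $lv =="
    lvdisplay -v "$lv" 2>&1 | grep -iE 'error|stale|\?\?\?'
done
```

Any lvol whose loop output is non-empty is a candidate for newfs and restore after the disk is replaced.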
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Robert-Jan Goossens
Honored Contributor

Re: Hard disk failure

Hi Vikas,

http://www.unixadm.net/howto/bad_disk.html

Start from point 1.2 (you have hot swappable disk).

Regards,

Robert-Jan
Vikas_2
Advisor

Re: Hard disk failure

Robert, I'm not able to access the link.

Patrick, it says that I need to reboot the server. Is that not necessary? The failed disk does not belong to the root VG in my case, and it had only one filesystem/LV on it.

Do I need to do pvcreate, or will vgcfgrestore take care of it?

My plan after the disk replacement is:
pvcreate on the disk
vgcfgrestore on the vg
lvcreate
newfs
mount the fs and restore

Does it look logical, or do I need to do something else?

Thanks a lot for your help

-Vikas
Robert-Jan Goossens
Honored Contributor

Re: Hard disk failure

Have the engineer replace the faulty disk; no need to reboot, since you have hot-swappable disks.

[Step 1.2]

Restore the LVM configuration/headers onto the new disk from your backup
of the LVM configuration:

# vgcfgrestore -n [volume group name] /dev/rdsk/cXtYdZ

Where X is the 'card instance number' of the SCSI bus attached to
that card. Y is the 'SCSI ID' of the disk (or array controller, in
the case of an array), and Z is the 'LUN number' (typically 0 for a
non-array type disk). Note that if the HP Customer Engineer replaces
the disk at the same address, the device file name will not change.
In that case the name will be what it was prior to the replacement.
For our example:

# vgcfgrestore -n /dev/vg00 /dev/rdsk/c0t4d0

[Step 1.3]

Reactivate the volume group (VG) so that the new disk can be attached,
since it wasn't configured in at boot time:

# vgchange -a y [volume group name]

For our example, the volume group vg00 will already be activated, but
it will not know of the replaced disk; therefore, this step is still
required so that LVM will now know that the disk is again available:

# vgchange -a y /dev/vg00

The vgchange command will activate each specified volume group and all associated physical and logical volumes for read-write access. In the case of vg00, it would initially have been activated with c0t4d0 in an unknown state. vgchange tells vg00 to look again at c0t4d0, which is now in a known state. It is important to remember that even though lvol5 and lvol6 are now active, they are void of data.

[Step 1.4]

Determine which logical volumes spanned onto that disk. You only need
to recreate and restore data for the volumes that actually touched that
disk. Other LVs in the volume group are still OK.

# pvdisplay -v /dev/dsk/c0tXd0

will show a listing of all the extents on that disk and the logical
volume each belongs to. This listing is fairly long, so you might want to
pipe it to more or send it to a file. For our example:

# pvdisplay -v /dev/dsk/c0t4d0 | more
.....
.....
.....
--- Distribution of physical volume ---
LV Name             LE of LV   PE for LV
/dev/vg00/lvol5     50         50
/dev/vg00/lvol6     245        245
.....

From this we can see that logical volumes /dev/vg00/lvol5 and
/dev/vg00/lvol6 have physical extents on this disk, but /dev/vg00/lvol1
through /dev/vg00/lvol4 don't, so we will need to recreate and restore
lvol5 and lvol6 only.

Note: Even though lvol5 was also in part on another disk drive, it
will need to be treated as if the entire lvol was lost, not just
the part on c0t4d0.

[Step 1.5]

Restore the data from your backup onto the replacement disk for
the logical volumes identified in step 1.4. For raw volumes, you can
simply restore the full raw volume using the utility that was used to
create your backup. For file systems, you will need to recreate the
file systems first. For our example:

For HFS:

# newfs -F hfs /dev/vg00/rlvol5
# newfs -F hfs /dev/vg00/rlvol6

For JFS:

# newfs -F vxfs /dev/vg00/rlvol5
# newfs -F vxfs /dev/vg00/rlvol6

Note that we use the raw logical volume device file for the newfs
command. For file systems that had non-default configurations, please
consult the man page of newfs for the correct options.

After a file system has been created on the logical volume, mount the
file system under the mount point that it previously occupied. Take
whatever steps are necessary to prevent your applications or users from
accessing the filesystem until the data has been recovered. Now that the
filesystem has been created, simply restore the data for that file system
from backups.

Note: You will need to have recorded how your file systems were
originally created in order to perform this step. The only
critical feature of this step is that the file system be at
least as large as before the disk failure. You can change
other file system parameters, such as those used to tune the
file system's performance.

For the file system case, there is no need to worry about data on the
disk (c0t4d0) that was newer than the data on the tape: the newfs
wiped out all data on lvol5. For raw volume access, you may
have to specify your restore utility's overwrite option to guarantee
bringing the volume back to a known state.
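The whole procedure above can be condensed into one sequence. A sketch using the example names from the steps (vg00, c0t4d0, lvol5/lvol6, VxFS); the mount points are assumptions, so substitute your own:

```shell
# Steps 1.2-1.5 end to end, after the hot-swap replacement.
vgcfgrestore -n /dev/vg00 /dev/rdsk/c0t4d0   # 1.2: restore LVM headers onto the new disk
vgchange -a y /dev/vg00                      # 1.3: re-attach the replaced disk to the VG
pvdisplay -v /dev/dsk/c0t4d0 | more          # 1.4: find which lvols had extents on it
newfs -F vxfs /dev/vg00/rlvol5               # 1.5: recreate each affected filesystem
newfs -F vxfs /dev/vg00/rlvol6               #      (raw device files for newfs)
mount /dev/vg00/lvol5 /mnt/lvol5             # remount at the original mount points
mount /dev/vg00/lvol6 /mnt/lvol6             # (assumed paths; use yours)
# ...then restore the data for lvol5 and lvol6 from backup.
```

Note there is no pvcreate in this sequence: vgcfgrestore writes the LVM structures back onto the replacement disk itself.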

KCS_1
Respected Contributor

Re: Hard disk failure

Hi, again

>>This disk is in a external storage and is hot swappable. The disk is not mirrored and the VG has three disks in it.<<

Yes, of course you don't need to reboot your system if the failed disk is hot-swappable.

I will summarize the procedure roughly.

1. Pull out the failed disk and put the new disk into the system.
2. pvcreate
3. vgcfgrestore
4. Restore the stale LVM data.

I think that your idea is good!
Look at the file for more specific information.


Easy going at all.
Eugeny Brychkov
Honored Contributor

Re: Hard disk failure

The first action I would take is to identify the disk and reseat it: pull it out, wait 20 seconds, and put it back securely. Then, after 30 seconds, do an 'ioscan -fn' to see if the disk has recovered. If not, replace the disk; if it does recover, back up ASAP and then investigate what happened.
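A quick sketch of that check, using the c0t4d0 address from the example earlier in the thread (substitute your disk's device file):

```shell
# After reseating, verify whether the disk is claimed again.
ioscan -fnC disk            # the S/W State column should read CLAIMED
diskinfo /dev/rdsk/c0t4d0   # prints vendor/size info only if the disk answers
```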
Eugeny
Vikas_2
Advisor

Re: Hard disk failure

Thanks everyone.... I'll wait for the disk to be replaced and then proceed...

My complete system backup is with fbackup. After the disk replacement, I'll have to recover only one file system. Do I need to take any special care for this restore?
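For an fbackup archive, a single filesystem's tree can be extracted with frecover's include option. A sketch; the tape device /dev/rmt/0m and the mount point /data are assumptions, so use your own:

```shell
# Extract only one filesystem's tree from a full fbackup archive.
cd /                                   # fbackup stores paths from the backup root
frecover -x -i /data -f /dev/rmt/0m -v # -x extract, -i include this subtree, -v verbose
```

Mount the recreated filesystem at its original mount point before running the restore, and keep users and applications off it until the restore completes.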

-Vikas
Robert-Jan Goossens
Honored Contributor

Re: Hard disk failure

Hi Vikas,

No, nothing special; just chase your users off the system during the restore and keep your application offline.

Good luck.

Robert-Jan