Operating System - HP-UX
Stuart McKay
Advisor

Replacing a failing ext scsi drive

I have a failing external SCSI disk that I need to replace. My idea is to add a new disk and move the file system across, basically.

/dev/dsk/c0t8d0 -- failing disk
lvol8 -- /var
lvol9 -- /disk1
lvol10 -- /disk2
lvol11 -- /disk3 are all on this disk under vg00.

If I add the new disk as /dev/dsk/c0t9d0 and create vg01, can I use SAM or the command line to move these across to the new disk?
18 REPLIES
Adisuria Wangsadinata_1
Honored Contributor

Re: Replacing a failing ext scsi drive

Hi Stuart,

First, make sure that the data on the disk is still accessible, and back it up while you still can.

For lvol9, lvol10, and lvol11, you can add the new disk and create vg01. The basic LVM commands plus frecover (see the frecover man page) will handle these tasks.

For lvol8, which is /var, it is still better to keep the file system in vg00.

Check the document at this URL; I hope it helps you replace the faulty disk on your system:

http://www.docs.hp.com/en/5991-1236/When_Good_Disks_Go_Bad.pdf

Hope this information can help you.

Cheers,
AW
now working, next not working ... that's unix
Rajeev  Shukla
Honored Contributor

Re: Replacing a failing ext scsi drive

The easiest way would be if you have the mirroring software (MirrorDisk/UX) installed.
In that case you can add the new disk to the same VG, mirror all the lvols on the failing disk to the new disk, and then remove the mirror copy from the failing disk.

Or use pvmove to move each lvol from the failing disk to the replacement disk.

Stuart McKay
Advisor

Re: Replacing a failing ext scsi drive

The disk is still accessible and I am backing it up right now.

I'll have a look at the link you supplied and get back to you with any outcome.

Thanks.
Devender Khatana
Honored Contributor
Solution

Re: Replacing a failing ext scsi drive

Hi,

The solution will be very simple if your data is still accessible.

1. Finish backing up the failing disk.

2. Add the new disk to vg00.
# pvcreate /dev/rdsk/c0t9d0
# vgextend /dev/vg00 /dev/dsk/c0t9d0

3. Move all the extents from the failing disk to the new disk. The new disk should be at least as large as the current disk.
# pvmove /dev/dsk/c0t8d0 /dev/dsk/c0t9d0

4. Verify that the failing disk has no more PEs allocated.
# pvdisplay -v /dev/dsk/c0t8d0

5. Remove the failing disk from the VG.
# vgreduce /dev/vg00 /dev/dsk/c0t8d0

Note: vgreduce in step 5 will not remove a disk that still has extents allocated, so do steps 3 and 4 first as described.
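
The steps above can be sketched as a small dry-run script. It only prints the commands, so they can be reviewed before anything touches the volume group. The function name plan_disk_swap and the /dev/rdsk path derivation are assumptions of this sketch. It also assumes that pvcreate wants the raw device, that vgextend needs the new disk named explicitly, and that the verification step looks at the old disk; check each against the man pages on your system.

```shell
#!/bin/sh
# Dry run: print the LVM commands for moving all extents off a failing
# disk onto a new one in the same volume group. Nothing is executed.
# plan_disk_swap is a hypothetical helper name used only in this sketch.
plan_disk_swap() {
    vg=$1     # volume group, e.g. /dev/vg00
    old=$2    # failing disk, block device path
    new=$3    # replacement disk, block device path

    # Assumption: the raw (character) device lives under /dev/rdsk
    # with the same cXtYdZ name as the block device under /dev/dsk.
    raw_new=$(printf '%s\n' "$new" | sed 's|/dev/dsk/|/dev/rdsk/|')

    echo "pvcreate $raw_new"      # initialise the new disk for LVM
    echo "vgextend $vg $new"      # add it to the volume group
    echo "pvmove $old $new"       # move every allocated extent across
    echo "pvdisplay -v $old"      # old disk should now show no allocated PEs
    echo "vgreduce $vg $old"      # drop the old disk from the volume group
}

plan_disk_swap /dev/vg00 /dev/dsk/c0t8d0 /dev/dsk/c0t9d0
```

Printing first and running by hand keeps a typo in a device name from reaching pvcreate.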

HTH,
Devender
Impossible itself mentions "I m possible"
Stuart McKay
Advisor

Re: Replacing a failing ext scsi drive

OK, after performing my backup and adding the new disk, I got part of the way through pvmove and this happened:

SCSI LUN..........: 0x00

I/O Log Event Data:

Driver Status Code..................: 0x0000007C
Length of Logged Hardware Status....: 22 bytes.
Offset to Logged Manager Information: 24 bytes.
Length of Logged Manager Information: 34 bytes.

Hardware Status:

Raw H/W Status:
0x0000: 00 00 00 02 F0 00 03 00 83 E6 E2 0A 00 00 00 00
0x0010: 11 00 E4 80 00 8C

SCSI Status...: CHECK CONDITION (0x02)
Indicates that a contingent allegiance condition has occurred. Any
error, exception, or abnormal condition that causes sense data to be
set will produce the CHECK CONDITION status.

SCSI Sense Data:

Undecoded Sense Data:
0x0000: F0 00 03 00 83 E6 E2 0A 00 00 00 00 11 00 E4 80
0x0010: 00 8C

SCSI Sense Data Fields:
Error Code : 0x70
Segment Number : 0x00
Bit Fields:
Filemark : 0
End-of-Medium : 0
Incorrect Length Indicator : 0
Sense Key : 0x03
Information Field Valid : TRUE
Information Field : 0x0083E6E2
Additional Sense Length : 10
Command Specific : 0x00000000
Additional Sense Code : 0x11
Additional Sense Qualifier : 0x00
Field Replaceable Unit : 0xE4
Sense Key Specific Data Valid : TRUE
Sense Key Specific Data : 0x80 0x00 0x8C

Sense Key 0x03, MEDIUM ERROR, indicates that the command terminated
with a nonrecovered error condition that was probably caused by a
flaw in the medium or an error in the recorded data. This sense key
may also be returned if the device is unable to distinguish between a
flaw in the medium and a specific hardware failure (sense key 0x04).
For the RECOVERED ERROR, HARDWARE ERROR, or MEDIUM ERROR Sense Key,
the Sense Key Specific data indicates that 140 retries were
attempted.

The combination of Additional Sense Code and Sense Qualifier (0x1100)
indicates: Unrecovered read error.

SCSI Command Data Block:

Command Data Block Contents:
0x0000: 28 00 00 83 E6 00 00 02 00 00

Command Data Block Fields (10-byte fmt):
Command Operation Code...(0x28)..: READ
Logical Unit Number..............: 0
DPO Bit..........................: 0
FUA Bit..........................: 0
Relative Address Bit.............: 0
Logical Block Address............: 8644096 (0x0083E600)
Transfer Length..................: 512 (0x0200)

Manager-Specific Data Fields:
Request ID.............: 0x00011BCC
Data Residue...........: 0x00023C00
CDB status.............: 0x00000002
Sense Status...........: 0x00000000
Bus ID.................: 0x00
Target ID..............: 0x08
LUN ID.................: 0x00
Sense Data Length......: 0x12
Q Tag..................: 0x74
Retry Count............: 5


>---------- End Event Monitoring Service Event Notification ----------<
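
The decoded fields in this log can be reproduced by hand from the 18 undecoded sense bytes. The sketch below assumes fixed-format SCSI sense data: byte 0 carries the valid bit and error code, byte 2 the sense key, bytes 3 to 6 the Information field, bytes 12 and 13 the additional sense code and qualifier, and bytes 16 and 17 the retry count when the sense key is a medium error. decode_sense is a hypothetical helper name, not an HP-UX tool.

```shell
#!/bin/sh
# Decode the main fields of fixed-format SCSI sense data.
# Argument: the sense bytes as one string of space-separated hex pairs.
decode_sense() {
    # Split the string into positional parameters; byte 0 becomes $1.
    set -- $1
    b0=$((0x$1))
    valid=$(( (b0 >> 7) & 1 ))       # bit 7 of byte 0: Information field valid
    code=$(( b0 & 0x7F ))            # low 7 bits of byte 0: error code
    key=$(( 0x$3 & 0x0F ))           # low nibble of byte 2: sense key
    # Bytes 3-6, big-endian: address of the block that failed.
    info=$(( (0x$4 << 24) | (0x$5 << 16) | (0x$6 << 8) | 0x$7 ))
    asc=$((0x${13}))                 # byte 12: additional sense code
    ascq=$((0x${14}))                # byte 13: additional sense qualifier
    retries=$(( (0x${17} << 8) | 0x${18} ))   # bytes 16-17: retry count
    printf 'valid=%d code=0x%02X key=0x%02X info=%d asc=0x%02X ascq=0x%02X retries=%d\n' \
        "$valid" "$code" "$key" "$info" "$asc" "$ascq" "$retries"
}

decode_sense "F0 00 03 00 83 E6 E2 0A 00 00 00 00 11 00 E4 80 00 8C"
# prints: valid=1 code=0x70 key=0x03 info=8644322 asc=0x11 ascq=0x00 retries=140
```

The Information field, 8644322, is the block that could not be read. It lies 226 blocks into the 512-block READ that started at logical block address 8644096.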
Devender Khatana
Honored Contributor

Re: Replacing a failing ext scsi drive

Hi Stuart,

If your backup completed successfully then you are lucky. This alert comes from your failing disk. It did not show up during the backup because a backup reads only the areas where data is written, whereas pvmove reads and copies every allocated PE regardless of its contents.
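
To get a rough idea of which physical extent holds the unreadable block, the failing block address from the log can be divided by the extent size. This is only an estimate: it assumes 512-byte disk blocks, uses the 4 MB PE size that vgdisplay reports later in this thread, and ignores the LVM metadata area at the start of the disk, so the true extent number may be slightly lower. lba_to_pe is a hypothetical helper name.

```shell
#!/bin/sh
# Estimate which physical extent (PE) contains a given disk block.
lba_to_pe() {
    lba=$1            # block address from the SCSI Information field
    pe_mb=$2          # PE size in megabytes
    block_bytes=512   # assumption: 512-byte disk blocks
    blocks_per_pe=$(( pe_mb * 1024 * 1024 / block_bytes ))
    echo $(( lba / blocks_per_pe ))
}

lba_to_pe 8644322 4
# prints: 1055
```

The extent listing from pvdisplay -v on the failing disk shows which logical volume owns each extent, so a number near this value can be matched to a file system.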

The other file systems can be restored from backup, but /var at least needs to be copied across to keep the system bootable.
How many disks are there in vg00?

Can you post the output of "strings /etc/lvmtab"?

HTH,
Devender
Impossible itself mentions "I m possible"
Stuart McKay
Advisor

Re: Replacing a failing ext scsi drive

Backup completed.

# more /etc/lvmtab
^CM-h^A/dev/vg00
x^EM-MP?M-^^M-xM-R^D/dev/dsk/c0t6d0
/dev/dsk/c0t5d0
/dev/dsk/c0t8d0
/dev/dsk/c0t9d0
Devender Khatana
Honored Contributor

Re: Replacing a failing ext scsi drive

Hi Stuart,

Can you also post the full output of vgdisplay -v /dev/vg00?

Do you have MirrorDisk/UX installed? Check with:
# swlist -l product | grep -i mirror

Also retry the pvmove command posted above, and when it completes, post the full output of "vgdisplay -v /dev/vg00" again.

HTH,
Devender
Impossible itself mentions "I m possible"
Stuart McKay
Advisor

Re: Replacing a failing ext scsi drive

Hi Devender

# vgdisplay -v /dev/vg00
--- Volume groups ---
VG Name /dev/vg00
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 11
Open LV 11
Max PV 16
Cur PV 4
Act PV 4
Max PE per PV 2500
VGDA 8
PE Size (Mbytes) 4
Total PE 9007
Alloc PE 6387
Free PE 2620
Total PVG 1
Total Spare PVs 0
Total Spare PVs in use 0

--- Logical volumes ---
LV Name /dev/vg00/lvol1
LV Status available/syncd
LV Size (Mbytes) 112
Current LE 28
Allocated PE 28
Used PV 1

LV Name /dev/vg00/lvol2
LV Status available/syncd
LV Size (Mbytes) 2048
Current LE 512
Allocated PE 512
Used PV 1

LV Name /dev/vg00/lvol3
LV Status available/syncd
LV Size (Mbytes) 140
Current LE 35
Allocated PE 35
Used PV 1

LV Name /dev/vg00/lvol4
LV Status available/syncd
LV Size (Mbytes) 500
Current LE 125
Allocated PE 125
Used PV 1

LV Name /dev/vg00/lvol5
LV Status available/syncd
LV Size (Mbytes) 20
Current LE 5
Allocated PE 5
Used PV 1

LV Name /dev/vg00/lvol6
LV Status available/syncd
LV Size (Mbytes) 1100
Current LE 275
Allocated PE 275
Used PV 1

LV Name /dev/vg00/lvol7
LV Status available/syncd
LV Size (Mbytes) 1624
Current LE 406
Allocated PE 406
Used PV 1

LV Name /dev/vg00/lvol8
LV Status available/syncd
LV Size (Mbytes) 1500
Current LE 375
Allocated PE 375
Used PV 2

LV Name /dev/vg00/lvol9
LV Status available/stale
LV Size (Mbytes) 12000
Current LE 3000
Allocated PE 3001
Used PV 4

LV Name /dev/vg00/lvol10
LV Status available/syncd
LV Size (Mbytes) 6000
Current LE 1500
Allocated PE 1500
Used PV 2

LV Name /dev/vg00/lvol11
LV Status available/syncd
LV Size (Mbytes) 500
Current LE 125
Allocated PE 125
Used PV 1


--- Physical volumes ---
PV Name /dev/dsk/c0t6d0
PV Status available
Total PE 2168
Free PE 0
Autoswitch On

PV Name /dev/dsk/c0t5d0
PV Status available
Total PE 2169
Free PE 0
Autoswitch On

PV Name /dev/dsk/c0t8d0
PV Status available
Total PE 2170
Free PE 876
Autoswitch On

PV Name /dev/dsk/c0t9d0
PV Status available
Total PE 2500
Free PE 1744
Autoswitch On


--- Physical volume groups ---
PVG Name vg01
PV Name /dev/dsk/c0t9d0

This last one, vg01, was added by me by mistake.

# swlist -l product |grep -i mirror
#

Thanks
Stuart
Devender Khatana
Honored Contributor

Re: Replacing a failing ext scsi drive

Stuart,

756 PEs had already moved to the new disk before the error. Can you post the output of "pvdisplay -v /dev/dsk/c0t9d0" and "pvdisplay -v /dev/dsk/c0t8d0" so we can see which PEs have moved?
Some part of lvol8 (/var) has definitely moved. Let us see how much is still left.
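
The figure of 756 follows from the vgdisplay output above: the extents allocated on a disk are its Total PE minus its Free PE. A quick arithmetic sketch, using only numbers from that output, also shows how many extents are still on the failing disk and whether the new disk has room for them. The helper name allocated is made up for this sketch.

```shell
#!/bin/sh
# Allocated extents on a physical volume = Total PE - Free PE.
allocated() { echo $(( $1 - $2 )); }

moved=$(allocated 2500 1744)   # c0t9d0, the new disk
left=$(allocated 2170 876)     # c0t8d0, the failing disk
free_new=1744                  # Free PE on c0t9d0

echo "moved to new disk: $moved"
echo "still on old disk: $left"
if [ "$left" -le "$free_new" ]; then
    echo "remaining extents fit on the new disk"
else
    echo "new disk is too small for the remaining extents"
fi
```

With these numbers, 1294 extents remain on c0t8d0 and 1744 are free on c0t9d0, so space is not what stopped the move.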

HTH,
Devender
Impossible itself mentions "I m possible"
Stuart McKay
Advisor

Re: Replacing a failing ext scsi drive

Devender, please see attached.
Stuart McKay
Advisor

Re: Replacing a failing ext scsi drive

Devender, here is the other one.

Stuart.
Devender Khatana
Honored Contributor

Re: Replacing a failing ext scsi drive

Congrats Stuart,

At least the critical part, /var, has fully moved off the faulty disk. It still spans two disks, but the other one is either /dev/dsk/c0t6d0 or c0t5d0. The problem lies in some of the PEs allocated to lvol9 (/disk1). Can you test the backup of /disk1, then remove lvol9 (/disk1) and retry the pvmove and vgreduce commands? I think they should then complete without problems, and you should only need to restore /disk1. Keep all backups intact, though. Also attach the output of bdf so we can see how much data /disk1 contains.
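
One way to sanity-check this plan before removing anything is to confirm that a 12000 MB lvol9 can be recreated once the failing disk has left the volume group. The sketch below uses only numbers from the vgdisplay output earlier in the thread. The function name free_after_swap is hypothetical, and the sketch assumes that the extra extent allocated to lvol9 (3001 allocated against 3000 logical) is released along with the volume.

```shell
#!/bin/sh
# Free extents left in the VG after one LV is removed and the failing
# disk is taken out with vgreduce.
free_after_swap() {
    total_pe=$1      # Total PE in the VG
    alloc_pe=$2      # Alloc PE in the VG
    old_disk_pe=$3   # Total PE on the failing disk
    lv_alloc_pe=$4   # Allocated PE of the LV being removed
    echo $(( (total_pe - old_disk_pe) - (alloc_pe - lv_alloc_pe) ))
}

free=$(free_after_swap 9007 6387 2170 3001)
need=3000   # Current LE of lvol9: 12000 MB at 4 MB per extent
echo "free after swap: $free PE, needed for lvol9: $need PE"
# prints: free after swap: 3451 PE, needed for lvol9: 3000 PE
```

On these figures the three remaining disks leave enough free extents to recreate /disk1 at its current size before restoring it from backup.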

That said, this is not a healthy configuration, since you are keeping everything in the single root VG. Do you have Ignite-UX installed? If yes, can you also take an Ignite backup?

HTH,
Devender

Impossible itself mentions "I m possible"
Devender Khatana
Honored Contributor

Re: Replacing a failing ext scsi drive

Another option is to make a new file system of the same size as /disk1, mount it on some other path such as /disk11, and copy the data across using cpio.

# pwd
/disk1
# find . -print | cpio -pdmv /disk11

It copies only the data, so it could complete successfully, as your backup did.
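
A minimal sketch of the same copy, wrapped in a function so that the source and target are explicit. It assumes the new file system is already mounted on the target directory; copy_tree is a hypothetical helper name. It adds -depth so that directory timestamps survive the copy and -xdev to stay within the one file system; both are assumed to be available in your find.

```shell
#!/bin/sh
# Copy a directory tree with find and cpio in pass-through mode.
# dst must be an absolute path, because the copy runs from inside src.
copy_tree() {
    src=$1
    dst=$2
    # Subshell: the caller's working directory is left unchanged.
    # -p pass-through, -d create directories as needed, -m keep mtimes.
    ( cd "$src" && find . -xdev -depth -print | cpio -pdm "$dst" )
}

# Example for this thread (not run here):
# copy_tree /disk1 /disk11
```

Because only files that exist in the file system are read, unused extents on the failing disk are never touched, which is the same reason the backup succeeded.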

HTH,
Devender
Impossible itself mentions "I m possible"
Stuart McKay
Advisor

Re: Replacing a failing ext scsi drive

Thanks ever so much for your time today Devender. I will look at both of these options now and see which one is best for me.

I have to leave in 1hr and will return in the morning, I will update you with what happens.

Much appreciated.

Stuart.
Devender Khatana
Honored Contributor

Re: Replacing a failing ext scsi drive

Stuart,

Thanks for the compliments, and you are always welcome.

Try the second option first; once lvol9 has been removed, it is no longer available.

HTH,
Devender
Impossible itself mentions "I m possible"
Stuart McKay
Advisor

Re: Replacing a failing ext scsi drive

Devender, all is well now in the land of failing disks. I used the 1st option, as my backups worked fine.

The disk has been removed and stored away.

Once again, thanks for your time and advice.

Stuart.
TwoProc
Honored Contributor

Re: Replacing a failing ext scsi drive

You can also install another hard drive, just like the one that's failing, somewhere in the system and "dd" the old disk onto it.
dd if=/dev/rdsk/c0t8d0 of=/dev/rdsk/ bs=256K

If you're lucky and every byte on the old drive can be read, you can simply shut down, put the new drive where the old drive was (resetting its SCSI ID to that of the old drive if necessary; on most systems newer than the last 7 or 8 years it is not), and reboot.

This method is really great if you don't have MirrorDisk/UX and all of the bytes on the source drive can be read.
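
If some blocks turn out to be unreadable, as they did earlier in this thread, a plain dd stops at the first read error. A common workaround, not part of the suggestion above, is conv=noerror,sync, which keeps going and pads each failed read with zeros so that later data stays at the right offset. The sketch below wraps that in a hypothetical clone_raw function. The small default block size is a choice of this sketch, since with a large bs one bad sector zeroes the whole block around it.

```shell
#!/bin/sh
# Copy one device (or file) to another, continuing past read errors.
# clone_raw is a hypothetical helper name used only in this sketch.
clone_raw() {
    src=$1
    dst=$2
    bs=${3:-4k}   # block size; smaller loses less data per bad read
    # noerror: do not stop on a read error
    # sync:    pad short or failed reads to a full block
    dd if="$src" of="$dst" bs="$bs" conv=noerror,sync
}

# Example (destination device left for you to fill in, not run here):
# clone_raw /dev/rdsk/c0t8d0 /dev/rdsk/<new-disk>
```

Any zero-filled blocks still mean damaged files, so the affected file systems would need checking or restoring from backup afterwards.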

We are the people our parents warned us about --Jimmy Buffett