
MSA1000 failed disk replacement

Allen Jasewicz_2
Occasional Advisor


There is no spare drive in the array, and it is in a RAID 5 configuration. We pulled the failed drive and inserted the new drive; the activity light lit, then went off. I can use the locate command to flash the LED on the drive, but the array does not recognize the drive. Interestingly, Disk104, the drive that failed, shows "none" under Units, and Disk000 shows as failed, yet I do not have a Box 0. It looks as if the replacement drive was assigned to a phantom drive, Disk000. Is this similar to the AutoRAID device, where I need to re-initialize the disk before the unit will see it?
CLI> show unit 0

Unit 0:
In PDLA mode, Unit 0 is Lun 1; In VSA mode, Unit 0 is Lun 0.
Unit Identifier :
Device Identifier : 600805F3-000D71E0-AE9D8B39-41990001
Cache Status : Enabled
Max Boot Partition: Disabled
Volume Status : VOLUME USING REGENERATE
Parity Init Status: Complete
8 Data Disk(s) used by lun 0:
Disk101: Box 1, Bay 01, (SCSI bus 0, SCSI id 0)
Disk102: Box 1, Bay 02, (SCSI bus 0, SCSI id 1)
Disk103: Box 1, Bay 03, (SCSI bus 0, SCSI id 2)
Disk000: Box 0, Bay 00, (SCSI bus 0, SCSI id 7) DRIVE FAILED!
Disk105: Box 1, Bay 05, (SCSI bus 0, SCSI id 4)
Disk106: Box 1, Bay 06, (SCSI bus 0, SCSI id 5)
Disk107: Box 1, Bay 07, (SCSI bus 0, SCSI id 8)
Disk108: Box 1, Bay 08, (SCSI bus 1, SCSI id 0)
Spare Disk(s) used by lun 0:
No spare drive is designated.
Logical Volume Raid Level: DISTRIBUTED PARITY FAULT TOLERANCE (Raid 5)
stripe_size=16kB
Logical Volume Capacity : 980,097MB

CLI> show disks
                box,bay   bus,ID   Size       Speed      Units
   Disk101      1,01      0,00     146.8 GB   160 MB/s   0
   Disk102      1,02      0,01     146.8 GB   160 MB/s   0
   Disk103      1,03      0,02     146.8 GB   160 MB/s   0
   Disk104      1,04      0,03     146.8 GB   160 MB/s   none
gregersenj
Honored Contributor

Re: MSA1000 failed disk replacement

I have never seen that on an MSA1000.

Normally, all you need to do when replacing a disk is:
Remove the failed disk.
Insert the new disk.
The array will then rebuild automatically.

However, ghost IDs have been seen in past years.
There have been some posts here about rebuild problems on the Smart Array 5xxx and 6xxx.
In those cases the failed disk had a non-existent ID, so after the replacement the controller kept waiting for a new disk on that non-existent ID.
I have never seen it on an MSA1000, though.

Your case is different.
It appears the replacement disk got a ghost ID.
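The ghost entry is easy to spot by eye in the output above, but a small script can flag it when checking several units. A minimal Python sketch, assuming the firmware 4.48 `show unit` member-line format quoted in the first post; `parse_members` and `ghost_members` are hypothetical helper names, it only parses pasted text (it does not talk to the array), and treating a Box 0 / Bay 00 member as a ghost is a heuristic based on boxes and bays being numbered from 1 in this thread.

```python
import re
from typing import List, NamedTuple

# Member line as printed by "show unit" on firmware 4.48 (format taken from the
# post above), e.g. "Disk000: Box 0, Bay 00, (SCSI bus 0, SCSI id 7) DRIVE FAILED!"
MEMBER_RE = re.compile(
    r"^\s*(Disk\d+):\s*Box\s+(\d+),\s*Bay\s+(\d+),"
    r"\s*\(SCSI bus\s+(\d+),\s*SCSI id\s+(\d+)\)\s*(.*)$"
)


class Member(NamedTuple):
    name: str
    box: int
    bay: int
    bus: int
    scsi_id: int
    status: str  # trailing text such as "DRIVE FAILED!", empty if none


def parse_members(text: str) -> List[Member]:
    """Extract the data-disk lines from pasted "show unit" output."""
    members = []
    for line in text.splitlines():
        match = MEMBER_RE.match(line)
        if match:
            name, box, bay, bus, scsi_id, status = match.groups()
            members.append(
                Member(name, int(box), int(bay), int(bus), int(scsi_id), status.strip())
            )
    return members


def ghost_members(members: List[Member]) -> List[Member]:
    """Heuristic (assumption): boxes and bays are numbered from 1, so a member
    reported in Box 0 or Bay 00 has no physical slot behind it."""
    return [m for m in members if m.box == 0 or m.bay == 0]


if __name__ == "__main__":
    sample = """\
   Disk103: Box 1, Bay 03, (SCSI bus 0, SCSI id 2)
   Disk000: Box 0, Bay 00, (SCSI bus 0, SCSI id 7) DRIVE FAILED!
   Disk105: Box 1, Bay 05, (SCSI bus 0, SCSI id 4)"""
    for ghost in ghost_members(parse_members(sample)):
        print(ghost)
```

Run against the full member list from the first post, only the Disk000 entry (SCSI id 7, status "DRIVE FAILED!") is flagged.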

Try the following steps.
Reseat the disk: pull it, wait 30 seconds, and reinsert it.

If that doesn't help:
Remove the disk.
Upgrade the firmware.
Reinsert the disk.

If that doesn't help, try another disk.

If that still doesn't help, open a case with HP.

BR
/jag

Allen Jasewicz_2
Occasional Advisor

Re: MSA1000 failed disk replacement

We pulled the disk and left it out for about an hour, with no change. It's a remote site, and I lost my on-site help for that long. We will attempt the firmware upgrade.

CLI> show version
Firmware version: 4.48 build 342
Hardware Revision: 7 [AutoRev: 0x010000]
Internal EMU Rev: 1.86 (ZF0JKF3129)
External Box EMU 2 Rev: CP20 (8A08DTN1J031)


Any suggested reading links to increase my comfort level with upgrading the firmware?
gregersenj
Honored Contributor

Re: MSA1000 failed disk replacement

Link to release notes:

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=dk&prodTypeId=12169&prodSeriesId=377751&swItem=MTX-a8be59eac6d84c1eb3cfcef835&prodNameId=377753&swEnvOID=1005&swLang=8&taskId=135&mode=4&idx=0

If possible, you could put a drive in another bay and set it up as a spare drive.
That way your RAID 5 becomes fault tolerant again by rebuilding onto the spare.

That will also clarify whether the problem is with the new drive.

BR
/jag

Allen Jasewicz_2
Occasional Advisor

Re: MSA1000 failed disk replacement

I talked to HP Support yesterday afternoon. I was surprised the support person was allowed to help me, since I am between contracts. I was told I need to get to firmware 5.3 to be current, and that to get to 5.3 I first have to upgrade to 5.2. Should I be going to 7.2 instead? If so, what interim upgrade is necessary from my current 4.48? I believe we are running in active/passive mode; does that make a difference? If so, I will have to verify it. I will attach the tech_support output.
gregersenj
Honored Contributor

Re: MSA1000 failed disk replacement

I would do as suggested by HP Support.

BR
/jag
Allen Jasewicz_2
Occasional Advisor

Re: MSA1000 failed disk replacement

We have upgraded the firmware in the array. It is now at "5.30b1200 (ZF0JKF3129)". Here is the output of "show unit 0":
Unit 0:
In PDLA mode, Unit 0 is Lun 1; In VSA mode, Unit 0 is Lun 0.
Unit Identifier :
Device Identifier : 600805F3-000D71E0-AE9D8B39-41990001
Preferred Path : Controller 1 (this controller)
Cache Status : Enabled
Volume Status : VOLUME USING REGENERATE
Parity Init Status: Complete
8 Data Disk(s) used by lun 0:
Disk101: Box 1, Bay 01, (B:T:L 0:00:00)
Disk102: Box 1, Bay 02, (B:T:L 0:01:00)
Disk103: Box 1, Bay 03, (B:T:L 0:02:00)
Disk107: Box 1, Bay 07, (B:T:L 0:03:07) DRIVE FAILED! (0x0C)
Disk105: Box 1, Bay 05, (B:T:L 0:04:00)
Disk106: Box 1, Bay 06, (B:T:L 0:05:00)
Disk107: Box 1, Bay 07, (B:T:L 0:08:00)
Disk108: Box 1, Bay 08, (B:T:L 1:00:00)
Spare Disk(s) used by lun 0:
No spare drive is designated.
Logical Volume Raid Level: DISTRIBUTED PARITY FAULT TOLERANCE (RAID 5)
stripe_size=16kB
Logical Volume Capacity : 980,097MB

The replaced disk should be Disk104. If you look, there are two disks labeled "Disk107". How do I get the Disk107 at (B:T:L 0:03:07) named Disk104?

As you can see from "show version -all", the unit does recognize the drive. When the drive was pulled, Box 1 Bay 4 went away; when the drive was reinserted, Box 1 Bay 4 came back, and the serial numbers match up.
CLI-1> show version -all
MSA1000 Firmware Revision: 5.30b1200 (ZF0JKF3129)
Build Time: 2008-06-04 08:23:40
MSA1000 Hardware Revision: 7 [AutoRev: 0x010000]
Fibre Module AutoRev: 0x000000
Box 1a PROLIANT 4L7I DB Rev: 1.86 (ZF0JKF3129)
Box 1b PROLIANT 4L7I DT Rev: 1.86 (ZF0JKF3129)
Box 2 PROLIANT 4LEE Rev: CP20 (8A08DTN1J031)

Disk Drives:
B T L BOX BAY GB Model Rev. Serial No.
0 0 0 1 1 146.8 COMPAQ BD14686225 HPB6 UP01P4606AU00423
0 1 0 1 2 146.8 COMPAQ BD14686225 HPB6 UP01P4606P4J0427
0 2 0 1 3 146.8 COMPAQ BD14686225 HPB6 UP01P4606BUR0424
0 3 0 1 4 146.8 COMPAQ BD146863B3 HPB6 B8NG5FBM
0 4 0 1 5 146.8 COMPAQ BD1468856B HPB2 AEA1P5901HGS0539
0 5 0 1 6 146.8 COMPAQ BD146863B3 HPB5 B8P74SSM
0 8 0 1 7 146.8 COMPAQ BD14686225 HPB6 UP01P4606BMJ0424
1 0 0 1 8 146.8 COMPAQ BD14686225 HPB6 UP01P450607D0419
1 1 0 1 9 146.8 COMPAQ BF1468A4CC HPB5 3KN2FJBG00009711MGHT
1 2 0 1 10 146.8 COMPAQ BF1468A4CC HPB5 3KN2FJEL00009710EY8K
1 3 0 1 11 146.8 COMPAQ BF1468A4CC HPB5 3KN2FGWK000097112KEB
1 4 0 1 12 146.8 COMPAQ BF1468A4CC HPB5 3KN2FJ6C00009711MGLF
1 5 0 1 13 146.8 COMPAQ BF1468A4CC HPB5 3KN2FJEE00009711N60Y
1 8 0 1 14 146.8 COMPAQ BF1468A4CC HPB5 3KN2FGQG00009710X5AG
2 0 0 2 1 300.0 COMPAQ BD30089BBA HPB1 DA01P710A5JA0703
2 1 0 2 2 300.0 COMPAQ BD30089BBA HPB1 DA01P710A26K0703
2 2 0 2 3 300.0 COMPAQ BD30089BBA HPB1 DA01P710A47C0703
2 3 0 2 4 300.0 COMPAQ BD30089BBA HPB1 DA01P720BB8D0709
2 4 0 2 5 72.8 COMPAQ BD072863B2 HPB3 B4AN08YM
2 5 0 2 6 72.8 COMPAQ BD07285A25 HPB3 3HZ11K640000733877UC
2 8 0 2 7 72.8 COMPAQ BD072863B2 HPB3 B49K1P0M
2 9 0 2 8 72.8 COMPAQ BD072863B2 HPB3 B49P08ZM
2 10 0 2 9 72.8 COMPAQ BD07285A25 HPB3 3HZ0YH7700007337WM21
2 11 0 2 10 72.8 COMPAQ BD072863B2 HPB3 B49B01TM
2 12 0 2 11 72.8 COMPAQ BD072863B2 HPB3 B4A42FSM
2 13 0 2 12 72.8 COMPAQ BD07285A25 HPB3 3HZ11KVD000073386FVW
2 14 0 2 13 72.8 COMPAQ BF07289BC4 HPB1 DNA1P7C0E9EJ0752
2 15 0 2 14 72.8 COMPAQ BD07289BB8 HPB1 DEL1P78013B80733

Any ideas on how to make this array whole?
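The mismatch described in this post (a failed member labeled Disk107 whose B:T:L points at the drive in Box 1 Bay 4, plus a duplicate Disk107 name) can be found mechanically by cross-referencing the two outputs. A minimal Python sketch, assuming the firmware 5.30 `show unit` member format and the `show version -all` drive table quoted in this post; `parse_inventory` and `find_anomalies` are hypothetical helpers, matching is done on bus and target only (the LUN field is ignored), and it only reads pasted text.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# "show unit" member line on firmware 5.30, e.g.
#   "Disk107: Box 1, Bay 07, (B:T:L 0:03:07) DRIVE FAILED! (0x0C)"
UNIT_RE = re.compile(
    r"^\s*(Disk\d+):\s*Box\s+(\d+),\s*Bay\s+(\d+),"
    r"\s*\(B:T:L\s+(\d+):(\d+):(\d+)\)\s*(.*)$"
)
# "show version -all" drive row: B T L BOX BAY GB ..., e.g.
#   "0 3 0 1 4 146.8 COMPAQ BD146863B3 HPB6 B8NG5FBM"
INVENTORY_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+\d+(?:\.\d+)?\s")


def parse_inventory(text: str) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Map (bus, target) -> (box, bay) for every drive the controller lists."""
    inventory = {}
    for line in text.splitlines():
        match = INVENTORY_RE.match(line)
        if match:
            bus, target, _lun, box, bay = map(int, match.groups())
            inventory[(bus, target)] = (box, bay)
    return inventory


def find_anomalies(unit_text: str, inventory_text: str) -> List[str]:
    """Report duplicate member names and members whose label disagrees with
    the physical box/bay at their bus:target (the LUN field is ignored)."""
    members = []
    for line in unit_text.splitlines():
        match = UNIT_RE.match(line)
        if match:
            name, box, bay, bus, target, _lun, _status = match.groups()
            members.append((name, int(box), int(bay), int(bus), int(target)))

    inventory = parse_inventory(inventory_text)
    problems = []
    for name, count in Counter(m[0] for m in members).items():
        if count > 1:
            problems.append(f"{name} listed {count} times")
    for name, box, bay, bus, target in members:
        physical = inventory.get((bus, target))
        if physical is None:
            problems.append(f"{name} at B:T {bus}:{target} has no physical drive")
        elif physical != (box, bay):
            problems.append(
                f"{name} claims Box {box} Bay {bay} but B:T {bus}:{target} "
                f"is Box {physical[0]} Bay {physical[1]}"
            )
    return problems
```

Fed the two outputs from this post, it reports exactly two problems: Disk107 is listed twice, and the Disk107 at B:T 0:3 actually sits in Box 1 Bay 4, which is where the replaced Disk104 should be.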
gregersenj
Honored Contributor

Re: MSA1000 failed disk replacement

Oh.

Yes, it seems to be one of these ghost ID issues.

The only way I know is the hard way :/

Back up all data.
Delete the RAID.
Upgrade the firmware to the latest version.
Recreate the RAID.
Restore the data.

Normally, when you have a failed or degraded disk, you simply pull it out and insert the replacement disk.
The Smart Array will rebuild automatically.

BR
/jag
KAKIM
Occasional Visitor

MSA1000 failed disk replacement

We have the following issue with our MSA1000. Any assistance is welcome: the firmware is up to date, but the rebuild did not bring the volume back online.

Unit 0:
In PDLA mode, Unit 0 is Lun 1; In VSA mode, Unit 0 is Lun 0.
Unit Identifier   :
Device Identifier : 600805F3-000B4190-A8FFDEAE-530B0024
Preferred Path    : Controller 1 (this controller)
Cache Status      : Enabled
Max Boot Partition: Enabled
Volume Status     : VOLUME FAILED
Parity Init Status: Complete
10 Data Disk(s) used by lun 0:
   Disk101: Box 1, Bay 01, (B:T:L 0:00:00)
   Disk102: Box 1, Bay 02, (B:T:L 0:01:00)
   Disk103: Box 1, Bay 03, (B:T:L 0:02:00)
   Disk104: Box 1, Bay 04, (B:T:L 0:03:00)
   Disk105: Box 1, Bay 05, (B:T:L 0:04:00)
   Disk106: Box 1, Bay 06, (B:T:L 0:05:00)   REPLACEMENT
   Disk113: Box 1, Bay 13, (B:T:L 1:05:00)   REPLACEMENT
   Disk108: Box 1, Bay 08, (B:T:L 1:00:00)
   Disk109: Box 1, Bay 09, (B:T:L 1:01:00)
   Disk000: Box 0, Bay 00, (B:T:L 0:255:00)   DRIVE FAILED! (0x0C)
Spare Disk(s) used by lun 0:
   No spare drive is designated.
Logical Volume Raid Level: DISTRIBUTED PARITY FAULT TOLERANCE (RAID 5)
                           stripe_size=16kB
Logical Volume Capacity : 250,513MB

Torsten.
Acclaimed Contributor

Re: MSA1000 failed disk replacement

Why not open your own thread?

However, you replaced 2 disks within a RAID 5 and have another failed disk, right? The volume is already "failed"; I think you need to restore from your backup, sorry.


Hope this helps!
Regards
Torsten.
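Torsten's reasoning can be stated as a simple rule: RAID 5 stores one parity block per stripe, so it can reconstruct at most one missing member. A minimal Python sketch of that rule, assuming that a replacement whose rebuild has not finished holds no valid data yet and therefore counts as missing just like a failed drive; `raid5_survives` and the state labels are hypothetical, not MSA1000 CLI values.

```python
from typing import List

# Hypothetical member states; "show unit" itself prints markers such as
# "REPLACEMENT" and "DRIVE FAILED!" next to a member.
HEALTHY = "OK"
FAILED = "FAILED"
REBUILDING = "REPLACEMENT"


def raid5_survives(member_states: List[str]) -> bool:
    """Return True if a RAID 5 set can still serve all of its data.

    One parity block per stripe lets the array recompute any single missing
    member from the others, but not two or more. Assumption: an unfinished
    replacement counts as missing.
    """
    missing = sum(1 for state in member_states if state in (FAILED, REBUILDING))
    return missing <= 1


# The unit 0 member list from the post above: seven healthy members,
# two still flagged REPLACEMENT, and one failed (Box 0) entry.
members = [HEALTHY] * 7 + [REBUILDING] * 2 + [FAILED]
print(raid5_survives(members))  # False
```

Under that assumption the function agrees with the "VOLUME FAILED" status: three members are unavailable, and RAID 5 can only cover one.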
