MSA Storage

MSA P2000 vdisk crash after replacing failed drive

Ben Dalton
Occasional Advisor

Two days ago I began receiving emails from my P2000 that the drive in slot 12 was experiencing an unrecoverable read error (event 58).  This continued for a few minutes until the drive failed completely and the vdisk went into a critical state.

The next morning, I replaced the drive with a new one and configured it as a hot spare.  The vdisk began reconstruction, but after about 10 minutes, I received a vdisk quarantined message followed by a reconstruct failed message.

Following that, the LED on the replacement drive in slot 12 stayed green, but the drive in slot 10 went amber and was labelled "leftovr" in the management utility. The vdisk was again in a critical state and was inaccessible, crashing several of my VMs.

I shut down all of my VMs and then the SAN, removed the new drive from slot 12 and restarted everything.  When everything was back up, the vdisk was again accessible, though still in a critical state because it was one drive short.

I repeated the drive replacement this morning: I cleared the metadata from the drive and added it as a hot spare. Again, after about 10 minutes of reconstruction, the vdisk failed and the drive in slot 10 was marked "leftovr".  I removed the drive from slot 12 and the vdisk recovered in about 30 seconds.

Can anyone tell me what's going on?

Occasional Visitor

Re: MSA P2000 vdisk crash after replacing failed drive

Hello Ben Dalton

I had problems similar to yours. Because the event code 58 errors occurred on different HDs and at different times, I opened an incident with HP before swapping any HD. In the end, I thought it would be better to empty the whole P2000 and then update the firmware of all the HDs, followed by the controllers. I hadn't updated since 2010. After all this, we found that only one disk was actually in trouble; the rest were probably bugs.

If your firmware is not up to date, I think updating it may help you. But in my case, I migrated all the data off first.

Re: MSA P2000 vdisk crash after replacing failed drive

As I understand the issue description, the drive in slot 12 failed first; you replaced it with a new drive and configured that drive as a spare. While reconstruction was running, the drive in slot 10 went amber and the vdisk reconstruction failed. This is expected behavior when a second drive drops out during a rebuild.

For example, say you have a vdisk built from 5 drives in RAID 5, and assume that slot 12 and slot 10 are both members. The slot 12 drive failed first, and reconstruction started after the replacement. Before that reconstruction completed, the slot 10 drive failed or went to the leftover state. At that point two drives are missing from the RAID 5 set, which is why the vdisk went to the QTOF state. RAID 5 cannot survive more than one drive failure.
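To make the single-failure limit concrete, here is a rough Python sketch of how one RAID 5 stripe is rebuilt from XOR parity. It is only an illustration of the general technique, not the P2000 firmware's actual logic; the function names, the 5-member layout, and the sample bytes are all made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_stripe(stripe):
    """Return a repaired copy of one stripe; None marks a missing member.

    With a single parity block, exactly one missing member can be
    recomputed as the XOR of all surviving members.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if not missing:
        return list(stripe)
    if len(missing) > 1:
        raise ValueError("RAID 5 cannot rebuild with more than one member missing")
    survivors = [block for block in stripe if block is not None]
    repaired = list(stripe)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired

# Hypothetical stripe: 4 data blocks plus 1 parity block (5 drives).
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]
stripe = data + [xor_blocks(data)]

# One member lost (like slot 12): the stripe can be rebuilt.
one_lost = list(stripe)
one_lost[1] = None
recovered = rebuild_stripe(one_lost)

# A second member lost mid-rebuild (like slot 10): nothing to rebuild from.
two_lost = list(one_lost)
two_lost[3] = None
```

Calling `rebuild_stripe(two_lost)` raises an error, which is the software equivalent of the vdisk going offline: with two members gone there is not enough information left in the stripe to recompute either one.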

So always monitor your system and replace drives proactively when you notice hardware errors. Never wait for a drive to fail completely; by then it may be too late to recover the data.

I work for HPE