HPE EVA Storage

MSA20 Logical drives waiting for rebuild for few days.

 
Sgul
Frequent Advisor

Hi,
Last week two drives in our MSA20 enclosure went amber. After I re-seated them they were OK, and ACU reported that it was rebuilding the logical drives.

But 4 days later I can still see the logical drives "awaiting rebuild", with ACU reporting message #771:
"The current array controller had a valid data stored in its battery backed write cache the last time it was reset or was powered up. This indicates that the system may not have been shutdown gracefully. The array controller has automatically written, or has attempted to write, this data to the drives. this message will continue to be displayed until the next reset or power-cycle of the array controller"

What is this message trying to say?
Does the MSA20 enclosure need a reset?
Will the logical drives not be rebuilt until I do this reset?
Or is it referring to resetting the MSA1500cs controller?

I'd appreciate any assistance with this message.
Thanks
7 REPLIES
Sgul
Frequent Advisor

Re: MSA20 Logical drives waiting for rebuild for few days.

I've attached the Array Diagnostic report.
Logical drives 4 and 5 are waiting for rebuild.
Sgul
Frequent Advisor

Re: MSA20 Logical drives waiting for rebuild for few days.

What is confusing is that none of the drives in the MSA20 are showing amber lights. So visually I can't see any drives as being faulty.

But the rebuild is obviously being held up by some drive error.

How would I go about finding out which drive is the culprit?

PVD
Advisor

Re: MSA20 Logical drives waiting for rebuild for few days.

Hi

The rebuild failed due to read errors on Physical Drives 3:5 and 3:10 in the MSA20.

Taking a good backup of the data would be the first thing to do, as you may end up recreating the logical volumes due to multiple faults in the RAID set.
Ref: http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=377751&prodTypeId=12169&objectID=c01452806

Once you have a valid backup, perform a graceful shutdown of the MSA storage, reseat the suspect drives and boot up the MSA.
Shut down in this order: Servers -> MSA1500 controller shelf -> disk enclosures.
Power-up should be in the reverse order. Refer to the following advisory:
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=415598&prodTypeId=12169&prodSeriesId=415598&objectID=c01204574
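
If it helps to put the sequence into a runbook, here is a minimal Python sketch of the ordering above. The device labels and the `power_sequence` helper are made up for illustration; nothing here talks to real hardware, it only prints the order.

```python
# Shutdown order from the post above. The labels are illustrative names,
# not commands or hostnames.
SHUTDOWN_ORDER = ["servers", "MSA1500 controller shelf", "disk enclosures"]


def power_sequence(action):
    """Return the device order for 'shutdown' or 'powerup' (hypothetical helper)."""
    if action == "shutdown":
        return list(SHUTDOWN_ORDER)
    if action == "powerup":
        # Power-up is simply the shutdown order reversed.
        return list(reversed(SHUTDOWN_ORDER))
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    for action in ("shutdown", "powerup"):
        print(action)
        for step, device in enumerate(power_sequence(action), start=1):
            print(f"  {step}. {device}")
```

The point is just that the disk enclosures go down last and come up first, so the controller shelf never starts without its drives visible.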

If the rebuild is still stuck or fails, the logical volume will need to be recreated and the data restored from tape.
Sgul
Frequent Advisor

Re: MSA20 Logical drives waiting for rebuild for few days.

Thanks for the response.

The RAID level on these volumes is ADG6. According to the customer advisory document, it takes triple-fault conditions to halt a logical drive rebuild.
That's probably what's happened to our MSA20.
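
To make the arithmetic concrete: RAID ADG is a dual-parity level, so it can ride out two faulty drives in an array but not a third. Below is a rough Python sketch of that reasoning. The `rebuild_outlook` helper is hypothetical, and it counts whole drives as faulty, which is a simplification. As far as I understand, a real controller tracks errors per stripe, so read errors hit during a rebuild can count as faults too.

```python
# Number of faulty drives each RAID level can survive in one array.
# RAID ADG is dual parity (comparable to RAID 6).
FAULT_TOLERANCE = {"RAID 0": 0, "RAID 5": 1, "RAID ADG": 2}


def rebuild_outlook(raid_level, faults):
    """Classify a logical drive by its faulty-drive count (hypothetical helper)."""
    tolerated = FAULT_TOLERANCE[raid_level]
    if faults == 0:
        return "healthy"
    if faults <= tolerated:
        return "degraded, rebuild possible"
    return "too many faults, rebuild cannot complete"


if __name__ == "__main__":
    for faults in range(4):
        print(faults, rebuild_outlook("RAID ADG", faults))
```

With two faulty drives an ADG volume is still in the "rebuild possible" range; it is the third fault that tips it over.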

I'll schedule downtime for the SAN.
Sgul
Frequent Advisor

Re: MSA20 Logical drives waiting for rebuild for few days.

Hi PVD,
I'm not sure what will happen to the logical drives that are "waiting for rebuild" if I gracefully restart the SAN.
Will the logical drives disappear, or will they start rebuilding?

PVD
Advisor

Re: MSA20 Logical drives waiting for rebuild for few days.

Sorry for the delayed response. The rebuild should start after power-up.
Sgul
Frequent Advisor

Re: MSA20 Logical drives waiting for rebuild for few days.

Thanks PVD,

After I restarted the MSA1500 and the MSA20 enclosure, the two offending volumes were still reporting "awaiting rebuild".
After leaving them for a while I decided to re-seat the suspect disk. I knew this might totally stuff the volumes that were awaiting rebuild, but I had managed to get backups earlier.
After I pulled out the disk, ACU reported failure of the two volumes that were awaiting rebuild, saying "all data has been lost" etc.
I re-seated the disk and then, via ACU, re-initiated the two volumes, which were now coming up as new volumes.
Then I left the system alone for a few hours while it did what it had to do to get back online.
After a few hours ACU reported all volumes as good.
Then I started one server at a time to see what each one saw on the volumes presented by the MSA controller.
To my relief and surprise, all data was still intact. The Veritas Storage Foundation manager reported that I should run chkdsk on the volume that had just been initialised. I did, and chkdsk found some errors and fixed them.
Following that I started turning on services, and all seems to be good now.