MSA2052 not recognizing disk groups after power cycle

m8x
Occasional Contributor


Hello,

I have an MSA2052 with two additional enclosures, three enclosures in total.

It is configured with several RAID10 disk groups, each spread across all three enclosures.

When I power off enclosure 2 to simulate a failure, all disk groups become degraded, but everything keeps working as expected.

When I power enclosure 2 back on, the head unit discovers it and reports that every disk in enclosure 2 was previously a member of a disk group. See screenshot here: https://imgur.com/a/Ce9o3KV

I assumed that the MSA would detect that these disks belong to the existing disk groups and that the groups would become healthy again. But nothing happens...

Is this assumption wrong? How can I bring the disk groups back to a healthy state?

Thanks

Shawn_K
HPE Pro

Re: MSA2052 not recognizing disk groups after power cycle

Hello,

You have disk groups that are striped across enclosures. When you power down an enclosure, you remove its drives from the array, so this simulates the removal of drive(s).

The array controller stamps each drive with metadata that identifies the drive as belonging to a disk-group, vdisk, pool, etc., and adds a timestamp. That timestamp is updated frequently. When you remove an enclosure, the remaining drives in a disk-group/vdisk keep being updated, but the removed drives do not receive the timestamp updates. So when you re-add the enclosure, its drives carry a timestamp that no longer matches the other drives in the disk-group/vdisk. By design, if the timestamps are not coherent enough for the drives to be added back into the original disk-group/vdisk automatically, the array firmware marks them as belonging to that disk-group/vdisk but in a leftover state. At this point it is up to the user to take corrective action.
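The leftover decision described above can be modeled roughly as follows. This is a hypothetical Python sketch, not the actual MSA firmware logic: the `Drive` class, the `classify_returning` function, the integer timestamps, the `tolerance` threshold, and the slot and disk-group names are all illustrative assumptions, since the real coherence rule is not documented in this thread.

```python
from dataclasses import dataclass

# Hypothetical model of the metadata check; the real MSA rule is not documented here.


@dataclass
class Drive:
    slot: str       # enclosure.slot, e.g. "2.1" (illustrative naming)
    group: str      # disk group recorded in the drive's metadata
    timestamp: int  # last metadata update this drive received (arbitrary units)


def classify_returning(drive: Drive, group_timestamp: int, tolerance: int = 0) -> str:
    """Return 'member' if the drive's metadata is recent enough to rejoin, else 'leftover'."""
    if group_timestamp - drive.timestamp <= tolerance:
        return "member"
    return "leftover"


# Enclosure 2 was powered off at t=100; the surviving members kept updating until t=500.
group_ts = 500
for d in [Drive("2.1", "dgA01", 100), Drive("2.2", "dgA01", 100)]:
    print(d.slot, classify_returning(d, group_ts))
```

With a zero tolerance, both drives from the powered-off enclosure come back as leftover, which matches what the original poster saw after powering enclosure 2 back on.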

You can use the Trust command to accept disks back into a disk-group/vdisk. Linear vdisks behave differently from virtual disk groups. Use great caution with Trust, as it can lead to data integrity issues. Please read the MSA Troubleshooting Guide for further instructions and cautions before using Trust: https://support.hpe.com/hpsc/doc/public/display?docId=c05177410

Cheers,
Shawn

I work for Hewlett Packard Enterprise. The comments in this post are my own and do not represent an official reply from HPE. No warranty or guarantees of any kind are expressed in my reply.


m8x
Occasional Contributor

Re: MSA2052 not recognizing disk groups after power cycle

Thanks Shawn for the explanation.

Since all disks are in RAID10 and I "lost" only one third of the HDDs, I assumed that this could easily be rebuilt/resynced?
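Whether losing a whole enclosure leaves a RAID10 disk group degraded or offline depends on how its mirror pairs are spread across enclosures. The hypothetical sketch below assumes that consecutive members form mirror pairs (first and second, third and fourth, and so on); the real MSA pairing may differ, and the slot names are made up. It illustrates why the disk groups stayed online: as long as no mirror pair had both members in enclosure 2, every pair kept one copy of its data, and a rebuild only needs to copy from those surviving members.

```python
from typing import List, Set

# Hypothetical model: assumes consecutive members form mirror pairs
# (1st+2nd, 3rd+4th, ...). The actual MSA RAID10 pairing may differ.


def raid10_state(members: List[str], failed: Set[str]) -> str:
    """Classify a RAID10 set as 'fault tolerant', 'degraded' or 'offline'."""
    if len(members) % 2:
        raise ValueError("RAID10 needs an even number of members")
    pairs = [members[i:i + 2] for i in range(0, len(members), 2)]
    lost = [sum(m in failed for m in pair) for pair in pairs]
    if 2 in lost:
        return "offline"   # both copies of some mirror pair are gone
    if 1 in lost:
        return "degraded"  # every mirror pair still has one copy
    return "fault tolerant"


def enclosure_of(slot: str) -> str:
    """Enclosure number from an 'enclosure.slot' name (illustrative naming)."""
    return slot.split(".")[0]


# Each mirror pair spans two enclosures, so powering off enclosure 2
# removes at most one member per pair.
members = ["1.1", "2.1", "1.2", "2.2", "1.3", "3.1"]
failed = {s for s in members if enclosure_of(s) == "2"}
print(raid10_state(members, failed))
```

In this layout the set reports `degraded`; if both members of any pair had been placed in enclosure 2, the same power-off would take the group offline.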

I could clear the metadata on all the "lost" disks so they become available again, then set them as global spares so the disk groups are rebuilt from scratch.

But I know the rebuild would then mix up my predefined disk order: it would not keep the order I chose at creation, so the disks for Controller A and B would end up mixed. I know this is more about a clear visual layout than a performance issue.

This is how it was set up: https://imgur.com/a/cwRxvhs

The first 5 disks belong to Controller A, disk group 2.

The second 5 disks belong to Controller B, disk group 2.

The last 6 disks belong to Controller A, disk group 3.

This is how it will look once I clear all metadata and the system has completed an automatic rebuild: https://imgur.com/a/cvkowQU

The same happens to the two small SSD groups, which I didn't colour in the pictures.

How can I get it back to the original state so it looks like three blocks?

Thanks

Luca
Frequent Advisor

Re: MSA2052 not recognizing disk groups after power cycle

Hi Shawn,

   In the older models (the P2000 G3, for example, or even the MSA2312) there was the "set spares" command, where you could manually specify which disk to assign as a spare to which vdisk (set spares disks x.x vdisk <volname>). This way you could ensure that the disk in the specified slot would be reconstructed (RCON) next to its physical neighbors in the same vdisk. However, on the 2050 this command has been deprecated and replaced by "add spares", which adds the disk to a global set of spares, so you cannot specify a particular vdisk.

The following should be taken with a grain of salt, as we have not tested it on the 2050. I think this was done on one of the first models (P2000 G1 or MSA), though I don't remember exactly as it was a long time ago. In any case, here is what we did to realign the physical disks.

This unit had two disks out of sequence: they failed practically one after the other, and when they were reinserted their vdisks were swapped during auto-spare and vdisk reassignment. Since we could do this in a maintenance window, we gracefully ran "shutdown both" on the SCs and MCs and powered off. Then we swapped the disks and powered back on. I recall a warning in the log that the disks were out of position, but apart from that everything came back online.

Each disk has metadata that identifies its position and vdisk ownership, and this is also written to the flash memory of both controllers, similar to how many controllers work, such as Smart Array and the older HP NetRAID. This prevents disks from being destroyed if they are moved out of the slots they were originally in, and it is needed when you move them to different controllers, which I often did in the past.

Obviously, this only works when everything has been shut down correctly (no "dirty cache" LED lit). NEVER move disks in that state or in an unknown state. Always perform a graceful "shutdown both" and power off once all disk activity has ceased before touching anything.

    As I said, take this with a grain of salt: it is untested on the 2052, so I have no idea whether the behavior has changed since then. It would be advisable to try it with a test vdisk in a non-critical environment that holds no data that isn't backed up.

Shawn_K
HPE Pro

Re: MSA2052 not recognizing disk groups after power cycle

Hello,

The older MSA arrays (MSA2000, MSA2300, MSA P2000) could assign a dedicated spare to a linear vdisk. They could also assign global spares. With the global spare option, an available spare goes to the first vdisk that needs a drive in order to reconstruct. Other factors are involved, such as drive size and how urgently the vdisk or disk-group needs a drive. For example, if you have a RAID5 disk-group missing one drive, a RAID6 disk-group missing one drive, and one global spare assigned, the RAID5 disk-group will get the global spare, because the need to rebuild is more critical for the RAID5 disk-group than for the RAID6 disk-group, which can withstand one more drive loss before becoming critical.

With the MSA 2050, you can only assign global spares. In this case there is the additional complication that the order of the drive failures and the criticality of each RAID set play a role in how the drives are assigned back to the disk-groups that need to reconstruct.
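The spare-selection priority described above might be sketched as follows. This is an illustrative Python model, not HPE's algorithm: the `DiskGroup` fields, the `TOLERANCE` table (guaranteed drive losses per RAID level), the size check, and the tie-break on earliest failure are assumptions drawn from the factors listed in the reply (size, criticality, order of failure).

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed number of drive losses each RAID level is guaranteed to survive.
TOLERANCE = {"RAID5": 1, "RAID6": 2}


@dataclass
class DiskGroup:
    name: str
    level: str          # key into TOLERANCE
    missing: int        # drives currently missing from the group
    drive_size_gb: int  # size of the group's member drives
    failed_at: int      # time of the group's drive failure (lower = earlier)


def remaining_tolerance(group: DiskGroup) -> int:
    """How many more drives the group can lose before going offline."""
    return TOLERANCE[group.level] - group.missing


def pick_group(groups: List[DiskGroup], spare_size_gb: int) -> Optional[str]:
    """Hypothetical policy: give the spare to the most critical eligible group."""
    eligible = [g for g in groups
                if g.missing > 0 and spare_size_gb >= g.drive_size_gb]
    if not eligible:
        return None
    # Least remaining tolerance first; an earlier failure breaks ties.
    best = min(eligible, key=lambda g: (remaining_tolerance(g), g.failed_at))
    return best.name


groups = [
    DiskGroup("dgRAID6", "RAID6", missing=1, drive_size_gb=1200, failed_at=10),
    DiskGroup("dgRAID5", "RAID5", missing=1, drive_size_gb=1200, failed_at=20),
]
print(pick_group(groups, spare_size_gb=1200))
```

Here the RAID5 group wins even though the RAID6 group lost its drive first, because RAID5 has no tolerance left. RAID10 is left out of the table on purpose: its remaining tolerance depends on which mirror pairs lost a drive, not just on how many drives are missing.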

For this case, my suggestion is to assign all the available drives back as global spares and let the disk-groups reconstruct. Data protection and redundancy of the disk-groups should be the first priority. Once the disk-groups have rebuilt completely, I would recommend backing up the data. After the backup has completed, you can shut down the array and move the drives to the correct slots so the drives are in the locations you want. Just make sure you have shut down the array properly and allowed all the disk enclosures to spin down the drives before moving them. Upon power-up, a rescan will occur, and you will then see the disk-groups laid out in the enclosures as you wish.
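Before the shutdown, it can help to work out on paper which drives need to trade places. The sketch below is a hypothetical planning helper, not an MSA tool: `plan_swaps`, the slot names, and the disk-group labels are illustrative, and it assumes that any drive of a given disk group may sit in any slot reserved for that group, which is an assumption to verify on non-critical data as suggested earlier in the thread.

```python
from typing import Dict, List, Tuple

# Hypothetical helper. Assumes drives of the same disk group are interchangeable for
# layout purposes, and that swaps are done only with the array fully shut down.


def plan_swaps(current: Dict[str, str], desired: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (slot, slot) pairs to swap so each slot holds a drive of its desired group."""
    if current.keys() != desired.keys() or sorted(current.values()) != sorted(desired.values()):
        raise ValueError("current and desired layouts must cover the same slots and drives")
    layout = dict(current)
    swaps = []
    for slot in sorted(desired):
        want = desired[slot]
        if layout[slot] == want:
            continue
        # Misplaced slots that currently hold a drive of the group this slot wants.
        candidates = [s for s in sorted(desired) if layout[s] == want and layout[s] != desired[s]]
        # Prefer a partner that wants what this slot holds, fixing both slots in one swap.
        partner = next((s for s in candidates if desired[s] == layout[slot]), candidates[0])
        layout[slot], layout[partner] = layout[partner], layout[slot]
        swaps.append((slot, partner))
    return swaps


current = {"1.1": "dgA", "1.2": "dgB", "1.3": "dgA", "1.4": "dgB"}
desired = {"1.1": "dgA", "1.2": "dgA", "1.3": "dgB", "1.4": "dgB"}
print(plan_swaps(current, desired))
```

For the interleaved layout above, a single swap of slots 1.2 and 1.3 restores the two contiguous blocks. Each swap is performed only after a graceful shutdown with the drives spun down, as described in the reply.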

Cheers,
Shawn


Luca
Frequent Advisor

Re: MSA2052 not recognizing disk groups after power cycle

Thanks for confirming that this procedure can still be done on the newer 2050 family as well. It's a pity the "set spares" command with the vdisk option has been retired; it was useful for avoiding this hassle.