
HSG80 - Failed Disk Replacement

 
SAKET_5
Honored Contributor


Hi All,

What is the correct procedure to determine which storageset/unit originally had a disk failure? I will explain this in the following scenario:
M_RAID_1 contains three disks DISK11100, DISK21100 & DISK31100 (all equal sizes)
SPARESET of the storage array contains DISK22000, DISK32000 & DISK44000
DISK21100 in M_RAID_1 fails, and DISK32000 happens to be pulled out of the SPARESET to replace it in M_RAID_1.
DISK21100 has moved to the FAILEDSET, and the SPARESET is reduced to two members, DISK22000 & DISK44000.

When you get a new replacement disk, what would be the correct procedure to identify that it is DISK32000 that now needs to return to the SPARESET, and that the new disk (most likely DISK21100) needs to go back into M_RAID_1? You could simply add the replacement disk to the SPARESET, but this might leave the RAIDSET in question in a non-optimal configuration, as in the current example: two disks on the same channel, etc.

Do FMU or any of the HSG Element Manager logs contain information on disk failures, including the storageset/unit involved, e.g. "DISK51200 from S_RAID_2 failed"? In your response, could you please include the equivalent CLI command to view the logs? I have tried FMU without any luck.

Yes, I know that you could set up soft procedures (prone to human error) that always build your raidsets such that each member is from a different channel. This is what we do, but my point is that it would be better if we could extract the current layout/setup information from the controllers rather than relying on our assumptions!

Cheers,
Saket.
Uwe Zessin
Honored Contributor

Re: HSG80 - Failed Disk Replacement

Last time I checked, only controller failures were in the HSG log, but not 'simple' disk drive errors.

I think that it is an absolute requirement to have documentation of the desired configuration. Otherwise, a few disk failures later, you can end up with a storageset that has two members on the same controller port. That's one of the reasons I am not a fan of the AUTOSPARE feature. The controllers do not have any memory of the storage system's history. Once a disk has left a storageset, that information is gone!

Whether the HSG Element Manager can be of help, I am not sure. I haven't fired it up for months now. I do remember that it logs disk failures, though.

During initial setup, you get a warning if two or more members are configured on the same controller port (I bet you are aware of this), but I have never seen a feature that allows later checks.
SAKET_5
Honored Contributor

Re: HSG80 - Failed Disk Replacement

Thanks Uwe,

Yeah, that's what we currently do: humongous spreadsheets to document everything, but it's all very manual! I have considered writing a script that enumerates all the disk members of the raidsets/mirrorsets and, for each set, checks that the disk members come from different channels, etc. It would then send a regular email if it detects a non-optimal configuration...
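A rough sketch of what the checking part of such a script could look like, in Python. It assumes the usual HSG naming convention DISKpttll (first digit = port/channel, then two-digit target, then two-digit LUN) and that the storageset-to-member mapping has already been collected somehow; the function names `port_of` and `find_port_conflicts` are made up for illustration, and parsing the controller output and sending the email are left out.

```python
import re
from collections import defaultdict

# Assumption: disk names follow the DISKpttll pattern
# (1-digit port, 2-digit target, 2-digit LUN), e.g. DISK21100 = port 2.
DISK_NAME = re.compile(r"^DISK(\d)(\d{2})(\d{2})$")


def port_of(disk_name):
    """Return the port (channel) number encoded in a disk name."""
    match = DISK_NAME.match(disk_name.strip().upper())
    if match is None:
        raise ValueError(f"unexpected disk name: {disk_name!r}")
    return int(match.group(1))


def find_port_conflicts(storagesets):
    """Map each storageset to the ports that carry more than one member.

    `storagesets` is a dict of storageset name -> list of member disk names.
    Storagesets without a conflict are omitted from the result.
    """
    conflicts = {}
    for set_name, members in storagesets.items():
        by_port = defaultdict(list)
        for disk in members:
            by_port[port_of(disk)].append(disk)
        shared = {port: disks for port, disks in by_port.items() if len(disks) > 1}
        if shared:
            conflicts[set_name] = shared
    return conflicts


# The scenario from the first post, after DISK32000 replaced DISK21100:
layout = {"M_RAID_1": ["DISK11100", "DISK32000", "DISK31100"]}
print(find_port_conflicts(layout))
# prints {'M_RAID_1': {3: ['DISK32000', 'DISK31100']}}
```

An empty result means every storageset has its members on distinct ports, so an email would only be needed when the returned dict is non-empty.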

However, at the back of my mind, I know that I will be migrating all our storage currently resident on HSGs to some sort of EVA in the near future, so I find it hard to justify the ROI of spending any more time on HSGs.

Thanks for your inputs:)

Regards,
Saket.

Uwe Zessin
Honored Contributor

Re: HSG80 - Failed Disk Replacement

Just an idea...
why not capture the output of "SHOW DISKS FULL" from time to time and run a difference between the most recent versions?
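A minimal sketch of the comparison step, in Python, assuming each capture has already been saved as plain text (how the output gets captured from the controller is not covered here). The helper name `diff_snapshots` is hypothetical, and nothing below depends on the exact layout of the "SHOW DISKS FULL" output; it only compares lines.

```python
import difflib


def diff_snapshots(old_text, new_text):
    """Return only the lines that differ between two saved captures.

    Lines present only in the older capture are prefixed with '-',
    lines present only in the newer capture with '+'.
    """
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="previous",
            tofile="latest",
            lineterm="",
            n=0,  # no context lines, just the changes
        )
    )
    # Drop the two file-header lines and the '@@' hunk markers.
    return [line for line in diff[2:] if not line.startswith("@@")]
```

Reading the two most recent capture files and passing their contents to `diff_snapshots` gives an empty list when nothing has changed; any disk that has moved between a storageset, the SPARESET, and the FAILEDSET should show up as a removed/added pair of lines.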