Disk Arrays

raidset failed state

sameerK
Frequent Advisor

I have an HP StorageWorks array with HSZ70 controllers.

I had a raidset of 5 disks (9.1 GB each), and one disk failed. The raidset had a policy of best_performance, and there was one disk in the spareset. Ideally the raidset should have taken this spare automatically and reconstructed, but it didn't.
The show raidset command showed
all 5 HDDs as "no information available". The raidset was in an inoperative state and the failed disk showed as misconfigured.

I removed the spare HDD from the spareset and physically slotted it in place of the failed HDD, but the raidset did not reconstruct.

The show raidset command again showed
all 5 HDDs as "no information available". The raidset was in an inoperative state, but this time the failed disk did not show as misconfigured.

I installed SWCC for a GUI interface, and with that I realized that the replacement HDD was a 4.3 GB disk in a 9.1 GB case. I decided to remove this disk and plug in a 9.1 GB HDD, so I deleted the 4.3 GB disk and then slotted a 9.1 GB HDD in its place. Now I am seeing all the disks as good, but the raidset is still failed and its capacity is 0.00 GB.
Have I lost the RAID configuration of this raidset? Is there any way I can reconstruct and rebuild this array with the data intact?



2 REPLIES
Mark...
Honored Contributor

Re: raidset failed state

Hi sameerK
The first question I have for you is: have you put the original 9.1 GB [failed] drive back in, or did you put in a "new" drive to replace the failed one?

Also, before you removed or installed any drives, did you quiesce the relevant bus? This is what HP recommends when removing or installing disks. Just pulling them out or putting them in whenever you feel like it can cause problems if you are unlucky.

If you have put a new drive in the same location as the failed one, have you deleted the failed disk from the failedset?
To do this from the CLI:
show failedset
Look for any failed disks. If there are any, then use, for example:
"delete failedset disk10100"
Make sure that you always delete disks from the failedset, and that if you physically remove or move a disk on an HSZ/HSJ/HSD you also delete it from the controller's configuration.

OK, so all devices are deleted from the failedset and the new disk is installed. If the new disk is not detected by the controller, you can use "run config" to get the controller to rescan and add the disk into the controller's configuration for you. Make sure that it is picked up in the correct place. If not, manually delete it and add it the old-fashioned way with:
Add Disk DISK10000 1 0 0

Now put the disk in the spareset:
add spareset disk10000
and the disk should be put into your raidset automatically as long as you have "best_fit" or "best_performance" set in the policy of the raidset concerned.
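
The replacement steps above can be collected into one ordered command list. This is only a rough Python sketch that builds the command strings; it does not talk to the controller. The function names are hypothetical, and it assumes the device name encodes port, two-digit target and two-digit LUN, which matches "Add Disk DISK10000 1 0 0" but should be checked against your own "show" output.

```python
from typing import List


def disk_name(port: int, target: int, lun: int) -> str:
    """Build a device name such as DISK10000.

    Assumption (not confirmed in this thread): the name is
    DISK + port + two-digit target + two-digit LUN.
    """
    return f"DISK{port}{target:02d}{lun:02d}"


def replacement_commands(port: int, target: int, lun: int,
                         manual_add: bool = False) -> List[str]:
    """Return the CLI commands from the post, in the order given there."""
    name = disk_name(port, target, lun)
    cmds = ["show failedset", f"delete failedset {name}"]
    if manual_add:
        # Fallback when "run config" does not pick the disk up correctly.
        cmds.append(f"add disk {name} {port} {target} {lun}")
    else:
        cmds.append("run config")
    # With best_fit or best_performance set, the raidset should then
    # take the spare automatically.
    cmds.append(f"add spareset {name}")
    return cmds


if __name__ == "__main__":
    for cmd in replacement_commands(1, 0, 0):
        print(cmd)
```

Printing the list first and typing the commands by hand keeps a person in the loop, which seems safer on a live array than scripting the console directly.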

Now that the failedset is clear and a new disk is installed, check the RAID unit:
"show unit d10" and look for "lost_data" or "unwriteable_data"

If the unit has lost or unwritable data then it will not be presented by the HSZ70 until this condition has been cleared.

If it is unwritable data then try this:
RETRY_ERRORS UNWRITEABLE_DATA D10
If this works then your data should be OK. You may have to wait a little while for it to complete. If it doesn't work then use:
CLEAR_ERRORS D10 UNWRITEABLE_DATA
The problem with this is that you may lose data, but if the retry command does not work then this is what you will have to do to make the unit presentable again.
NOTE: if you have unwriteable data, make sure you use the RETRY command first.

If the unit says lost data then this is the command you will need to use:
CLEAR_ERRORS D10 LOST_DATA
Again, the problem with this is that you may lose data. In both cases you will not know until you have cleared the condition, remounted the unit from the operating system, and checked the file system.
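
The order of those commands matters, so here is a small Python sketch of the decision as described above. The function name is hypothetical and the command strings are copied as written in this post; it only returns the commands to try, in order, with a flag for the ones that may lose data.

```python
from typing import List, Tuple


def error_recovery_commands(unit: str, condition: str) -> List[Tuple[str, bool]]:
    """Return (command, may_lose_data) pairs to try in order.

    `condition` is the state read off "show unit", e.g. "lost_data"
    or "unwriteable_data". Any other value returns an empty list.
    """
    cond = condition.strip().lower()
    if cond in ("unwriteable_data", "unwritable_data"):
        return [
            # Always try the retry first; it should not lose data.
            (f"RETRY_ERRORS UNWRITEABLE_DATA {unit}", False),
            # Only if the retry fails: clears the condition, may lose data.
            (f"CLEAR_ERRORS {unit} UNWRITEABLE_DATA", True),
        ]
    if cond == "lost_data":
        return [(f"CLEAR_ERRORS {unit} LOST_DATA", True)]
    return []


if __name__ == "__main__":
    for cmd, risky in error_recovery_commands("D10", "unwriteable_data"):
        print(cmd, "(may lose data)" if risky else "")
```

Either way, the file system still has to be checked from the operating system afterwards to find out whether anything was actually lost.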

If it's not one of these, or even if it is, the next thing to do is to check both controllers for an "invalid cache" condition. To do this use the:
show this
show other
commands and look near the bottom of the screen for invalid cache. You must check both controllers. If this is the problem then you will also need to clear it before any units are presented. My guess is that this is not the problem as you have not said that any other units have a problem.

Using the commands that clear an invalid cache can definitely lose data, so if you have this problem then repost for the command and a full explanation.

Mark...
if you have nothing useful to say, say nothing...
sameerK
Frequent Advisor

Re: raidset failed state

Hello Mark
Thanks for your effort, much appreciated. I had very little time to decide on the way forward. I made the call to reconfigure the raidset, restore the data, create the file system, etc., as I was running short of time and the server was in a live environment. I really wish I had gotten these steps from you earlier. Of course, it was Sunday and I knew I might not receive any response.
However, the information you provided is very in-depth and will be useful to me in future.
Thanks for your help.
cheers!!