Replace disk in array 12H - Urgent.

Eugeny Brychkov
Honored Contributor

Re: Replace disk in array 12H - Urgent.

One small note: once the rebuild completes, you can try removing and reinserting the disk that is currently in the failed state. The failure condition will be cleared and balancing will start. If the disk is really faulty, balancing will be cancelled and the disk will be faulted again. In that case, definitely replace it.
Why? There have been cases where the AutoRAID failed a disk that was not really bad. Only the logs will show it (logprint -t All -v).
Robb Bailey
Occasional Advisor

Re: Replace disk in array 12H - Urgent.

The rebuild has completed and here is an updated arraydsp output.

I made a mistake earlier: there are eight 18.2 GB disks in the array.

My assumptions:

Used as Active Hot Spare = 0 because the spare disk took over for B4 (failed).

I have an additional question, though. My machine rebooted when the disk failed. Is that related to the earlier "Unallocated space is too small for active hot spare." message? I thought that with hot spare enabled, the array manager would hold back a drive as the hot spare. I don't understand why the machine rebooted; I would have expected the hot spare to kick in without missing a beat.

Again, thank you for your response and I will assign points once things slow a bit and the machine is recovered.

A. Clay Stephenson
Acclaimed Contributor

Re: Replace disk in array 12H - Urgent.

The 'Active Hot Spare' of an AutoRAID is more complicated than that. The spare is usually not a one-for-one replacement disk; its space is used to increase the amount of RAID 1/0 space. Once a disk had failed, there was no longer enough unallocated space to cover an additional failure of the largest remaining drive.
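
To make the capacity reasoning concrete, here is a minimal Python sketch of the check as described above: the active hot spare is only available while unallocated space is at least the size of the largest remaining disk. This is a simplified model, not the array firmware's actual algorithm. The function name `hot_spare_available` and the unallocated-space figures are hypothetical; only the eight 18.2 GB disks come from this thread.

```python
def hot_spare_available(disk_sizes_gb, unallocated_gb):
    """Return True if unallocated space could absorb the loss of the largest disk.

    Simplified model (assumption): the spare is not a dedicated disk, so the
    only question is whether free space covers the largest remaining drive.
    """
    if not disk_sizes_gb:
        return False
    return unallocated_gb >= max(disk_sizes_gb)


# Eight 18.2 GB disks, as reported in this thread.
disks = [18.2] * 8

# Hypothetical unallocated space before the failure: enough for one disk.
print(hot_spare_available(disks, 20.0))

# After one disk fails and the rebuild consumes free space, a hypothetical
# 1.8 GB of unallocated space can no longer cover another 18.2 GB disk.
remaining = disks[:-1]
print(hot_spare_available(remaining, 1.8))
```

In this model the second check fails, which is consistent with the "Unallocated space is too small for active hot spare" warning: the array is still redundant, but it can no longer promise to absorb a further disk failure.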

The machine should not have rebooted or hung. In fact, the failure should have been all but invisible to the OS, apart from the array monitoring daemon.

S.K. Chan
Honored Contributor

Re: Replace disk in array 12H - Urgent.

I would like to see the output of arraydsp -a after the disk in B4 has been replaced. The current output basically still shows the same information, except for the rebuild status, because the rebuild has now completed. Even without a hot spare on a 12H, there should be enough redundancy to take care of ONE faulty disk. The 12H will allocate the required amount of space in "used for redundancy" automatically. Hence the failure of one disk should not cause your system to reboot. Check the system logs, GSP logs, etc. for clues. BTW, hot spare allocation is not tied to one particular disk. In a 12H you can't tell which disk is used as the hot spare. The internal mechanism of this array may allocate space from all disks, adding up to the size of the largest disk in the group, to represent the hot spare.
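
As a rough illustration of a spare that is spread across disks rather than tied to one, the Python sketch below reserves space equal to the largest disk and splits it evenly over all disks. The even split and the name `spare_share_per_disk` are assumptions made purely for illustration; how the 12H really distributes this space internally is not described here.

```python
def spare_share_per_disk(disk_sizes_gb):
    """Split a reservation equal to the largest disk evenly across all disks.

    Assumption: an even split. The real array may distribute space differently.
    """
    if not disk_sizes_gb:
        raise ValueError("at least one disk is required")
    reserve_gb = max(disk_sizes_gb)            # spare must cover the largest disk
    share_gb = reserve_gb / len(disk_sizes_gb)
    return [share_gb] * len(disk_sizes_gb)


# Eight 18.2 GB disks, as in this thread.
shares = spare_share_per_disk([18.2] * 8)
print(shares)        # one small slice per disk
print(sum(shares))   # the slices together add up to the largest disk
```

Under this model no single disk is "the" hot spare: every disk gives up a small slice, and together the slices add up to the capacity of the largest disk.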

At this point all you need to do is plug in the new disk once you get it, then check the status of the array periodically to make sure everything is back to normal.