ProLiant Servers (ML,DL,SL)

Array with Online Spare Issue DL380 G4

SOLVED
Dan Straka
New Member


This server has 5 of its drive bays populated with disks. Drives 0, 2, 3 & 4 make up the logical drive, with 1 as a "spare". Drive 5 has failed (flashing red X light), and there is no activity on the spare, so apparently that did not work as expected.
Can I hot-swap the bad drive with a new one anyway, or will this cause the array to fail?
Michal Kapalka (mikap)
Honored Contributor

Re: Array with Online Spare Issue DL380 G4

Hi,

You are saying that after the disk failure the spare wasn't activated. Check it with the ACU (Array Configuration Utility), if you have it installed.

If you replace the failed drive with a new one, the array will be OK and it should start rebuilding back to full functionality. The rebuild may have some impact on performance because of the higher I/O load.

Hopefully you will not remove the spare disk.

I think you have a RAID 1/RAID 5 configuration.

We have a lot of these servers, 90% of them in production, but we use only 2 disks in a RAID 1 configuration (only for the system); all data is attached from the SAN.


mikap
Dan Straka
New Member

Re: Array with Online Spare Issue DL380 G4

The online spare did not activate, and I have not removed it.
This is a RAID 5 configuration only (simplex cabling); please see the attached screen capture of the configuration "physical view".
I'm still wary of removing the failed drive and replacing it. Are you sure that will work?
Michal Kapalka (mikap)
Honored Contributor

Re: Array with Online Spare Issue DL380 G4

Hi,

In my opinion, the drives on this hardware RAID should be hot-swappable, but if it's a production box you shouldn't trust anybody. If possible, plan some downtime and replace the failed drive while the server is powered off.


Check this web site:

http://h18000.www1.hp.com/products/servers/proliantstorage/arraycontrollers/smartarray6i/index.html

==> see the "Ideal environment" section

After reading this, I would recommend replacing the drive offline. Before that, though, try rebooting the OS and check whether the error on the failed drive is still there.

mikap

PS: I know this may not be the best advice, but I understand your situation and what it means to work with the risk of a system failure.
kris rombauts
Honored Contributor
Solution

Re: Array with Online Spare Issue DL380 G4

Hi Dan,

I am no longer familiar with the logging features of Novell NetWare, but when a disk is flashing red, it might just be because it crossed the threshold of errors it is allowed to have in its lifetime and sent a SMART trip at that point. This does not fail the drive. I have had such a drive running for 2 years; it is still part of the array, and the array is still redundant. When I boot, I see a message from the Smart Array controller saying it has a disk with a SMART error and advising me to replace that disk.

This is an indication that the drive is likely to fail in the (near) future, because it crossed its threshold as defined in the disk manufacturer's specifications. If you have a management station available (e.g. HP SIM) and everything is configured correctly to receive SNMP traps, then you should have received this SMART trip SNMP trap some time ago (when the disk reached its error threshold).

What surprises me is that disk 5 is no longer visible in your screenshot of the online Smart Array configuration utility on the console. If you unload and reload the NLM, does that make a difference?

Have you rebooted since this issue appeared?

When you access the disks, do you still see disk 5 being accessed (green LED blinking)?


So IF it is due to a SMART trip, it is normal that the hot spare has not kicked in yet. To prove this theory, you can run the ADU (Array Diagnostic Utility), if it exists for NetWare (I could not find it), and post its report in the forum.

Then you can remove drive 5 and replace it with a new disk. When you remove drive 5, the hot spare will kick in after a short while (30 seconds?) and the rebuild onto the hot spare should start automatically.



HTH

Kris
Dan Straka
New Member

Re: Array with Online Spare Issue DL380 G4

Hey Kris,

After reading your very informative post, I am confident the drive has raised a predictive failure alert, or as you put it, "crossed the threshold of errors", and has not yet completely failed. There is no disk in slot 5, so that's why it's not in the screenshot. I'll either just leave it running as you have done, or pull the drive, since I now have a spare I can replace it with at any time.

Thanks so much for your input it has proved to be invaluable to me.

Dan
kris rombauts
Honored Contributor

Re: Array with Online Spare Issue DL380 G4

Dan,

OK, now I see where my confusion came from: it's a 4-disk RAID 5 with one hot spare, so the screenshot shows all 5 disks. I was confused by the name "disk 5" you referred to, and I did not see that ID in the screenshot.

So now I am even more convinced that a SMART trip (pre-failure alert) is exactly what you have encountered here.


Kris