RAID problem - one disk shows predictive failure and the hot spare stays rebuilding forever

 
Corti
Occasional Visitor

Hi,

Recently I got an alert that one disk was marked as Predictive Failure. I thought the hot spare would take over, but it stays in the rebuilding state forever.

 

The server hangs twice a day and I am forced to restart it manually.

Server info:

Server Type       : "ProLiant DL380 G6"

Raid Controller : HP Smart Array P410i Controller

Controller Firmware Version     : 2.50

 

 

I see these events in Windows 2003:

 

Power-On-Self-Test (POST) errors occurred during the last system startup.
 
User Action
Check the Power-On-Self-Test (POST) errors, and take corrective action as needed.
 
WBEM Indication Properties
AlertingElementFormat: 0 0 (Unknown)
AlertType: 5 0x5 (Device Alert)
Description: "Power-On-Self-Test (POST) errors occurred during the last system startup."
EventCategory: 4 0x4 (System Hardware)
EventID: "1"
EventTime: "20131015175126.346000+000"
ImpactedDomain: 4 0x4 (System)
IndicationIdentifier: "{33ACC83B-F598-4BAF-B5A4-8C358799C2E0}"
IndicationTime: "20131015145114.859000-180"
NetworkAddresses[0]: "172.18.4.5"
OSType: 69 0x45 (Microsoft Windows Server 2003)
OSVersion: "5.2.3790"
PerceivedSeverity: 5 0x5 (Major)
ProbableCause: 8 0x8 (Configuration/Customization Error)
ProbableCauseDescription: "POST Errors Occurred"
ProviderName: "HP POST"
ProviderVersion: "2.3.0.0"
RecommendedActions[0]: "Check the Power-On-Self-Test (POST) errors, and take corrective action as needed."
Summary: "POST errors occurred"
SystemCreationClassName: "HP_WinComputerSystem"
SystemFirmwareVersion[0]: "2009.10.01"
SystemFirmwareVersion[1]: "2009.10.01"
SystemGUID: "33343934-3932-4247-3830-303448444542"
SystemModel: "ProLiant DL380 G6"
SystemName: "DELPRYASU-FS004.DELPRYASU.delegations.cec.eu.int"
SystemProductID: "494329-B21"
SystemSerialNumber: "GB8004HDEB"
TIME_CREATED: 130263330863468280 0x1cec9cf2b604ef8
VariableNames[0]: "POST Error Code"
VariableNames[1]: "POST Error String"
VariableTypes[0]: 3 0x3 (uint8)
VariableTypes[1]: 1 0x1 (string)
VariableValues[0]: "14"
VariableValues[1]: "POST Error: 301-Keyboard Error"

 

Power-On-Self-Test (POST) errors occurred during the last system startup.
 
User Action
Check the Power-On-Self-Test (POST) errors, and take corrective action as needed.
 
WBEM Indication Properties
AlertingElementFormat: 0 0 (Unknown)
AlertType: 5 0x5 (Device Alert)
Description: "Power-On-Self-Test (POST) errors occurred during the last system startup."
EventCategory: 4 0x4 (System Hardware)
EventID: "1"
EventTime: "20131015175126.649000+000"
ImpactedDomain: 4 0x4 (System)
IndicationIdentifier: "{FD4D333F-7981-4A25-95F7-BD8A1D077BBC}"
IndicationTime: "20131015145126.346000-180"
NetworkAddresses[0]: "172.18.4.5"
OSType: 69 0x45 (Microsoft Windows Server 2003)
OSVersion: "5.2.3790"
PerceivedSeverity: 5 0x5 (Major)
ProbableCause: 8 0x8 (Configuration/Customization Error)
ProbableCauseDescription: "POST Errors Occurred"
ProviderName: "HP POST"
ProviderVersion: "2.3.0.0"
RecommendedActions[0]: "Check the Power-On-Self-Test (POST) errors, and take corrective action as needed."
Summary: "POST errors occurred"
SystemCreationClassName: "HP_WinComputerSystem"
SystemFirmwareVersion[0]: "2009.10.01"
SystemFirmwareVersion[1]: "2009.10.01"
SystemGUID: "33343934-3932-4247-3830-303448444542"
SystemModel: "ProLiant DL380 G6"
SystemName: "DELPRYASU-FS004.DELPRYASU.delegations.cec.eu.int"
SystemProductID: "494329-B21"
SystemSerialNumber: "GB8004HDEB"
TIME_CREATED: 130263330866491380 0x1cec9cf2b8e6ff4
VariableNames[0]: "POST Error Code"
VariableNames[1]: "POST Error String"
VariableTypes[0]: 3 0x3 (uint8)
VariableTypes[1]: 1 0x1 (string)
VariableValues[0]: "92"
VariableValues[1]: "POST Error: 1792-Drive Array Reports Valid Data Found in Array Accelerator"
Power-On-Self-Test (POST) errors occurred during the last system startup.
 
User Action
Check the Power-On-Self-Test (POST) errors, and take corrective action as needed.
 
WBEM Indication Properties
AlertingElementFormat: 0 0 (Unknown)
AlertType: 5 0x5 (Device Alert)
Description: "Power-On-Self-Test (POST) errors occurred during the last system startup."
EventCategory: 4 0x4 (System Hardware)
EventID: "1"
EventTime: "20131015175126.936000+000"
ImpactedDomain: 4 0x4 (System)
IndicationIdentifier: "{47E606F8-AFC1-4465-90DD-60998ADC054A}"
IndicationTime: "20131015145126.649000-180"
NetworkAddresses[0]: "172.18.4.5"
OSType: 69 0x45 (Microsoft Windows Server 2003)
OSVersion: "5.2.3790"
PerceivedSeverity: 5 0x5 (Major)
ProbableCause: 8 0x8 (Configuration/Customization Error)
ProbableCauseDescription: "POST Errors Occurred"
ProviderName: "HP POST"
ProviderVersion: "2.3.0.0"
RecommendedActions[0]: "Check the Power-On-Self-Test (POST) errors, and take corrective action as needed."
Summary: "POST errors occurred"
SystemCreationClassName: "HP_WinComputerSystem"
SystemFirmwareVersion[0]: "2009.10.01"
SystemFirmwareVersion[1]: "2009.10.01"
SystemGUID: "33343934-3932-4247-3830-303448444542"
SystemModel: "ProLiant DL380 G6"
SystemName: "DELPRYASU-FS004.DELPRYASU.delegations.cec.eu.int"
SystemProductID: "494329-B21"
SystemSerialNumber: "GB8004HDEB"
TIME_CREATED: 130263330869363325 0x1cec9cf2bba427d
VariableNames[0]: "POST Error Code"
VariableNames[1]: "POST Error String"
VariableTypes[0]: 3 0x3 (uint8)
VariableTypes[1]: 1 0x1 (string)
VariableValues[0]: "90"
VariableValues[1]: "POST Error: 1778-Drive Array Resuming Automatic Data Recovery Process"
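
The three events above are identical apart from their timestamps, identifiers, and the POST error code and string at the end. A minimal sketch for pulling just those error strings out of such a dump, assuming the event text has been copied into a plain string; the function name `extract_post_errors` and the regular expression are illustrative, not part of any HP tool:

```python
import re

# Matches lines such as:
#   VariableValues[1]: "POST Error: 1792-Drive Array Reports ..."
# The pattern is an assumption based on the three events shown above.
POST_ERROR_RE = re.compile(
    r'VariableValues\[1\]:\s*"POST Error:\s*(\d+)-([^"]*)"'
)

def extract_post_errors(event_text):
    """Return (code, message) pairs for every POST error string found."""
    return [
        (int(code), message.strip())
        for code, message in POST_ERROR_RE.findall(event_text)
    ]

# Lines copied from the three events in this post.
sample = '''
VariableValues[0]: "14"
VariableValues[1]: "POST Error: 301-Keyboard Error"
VariableValues[0]: "92"
VariableValues[1]: "POST Error: 1792-Drive Array Reports Valid Data Found in Array Accelerator"
VariableValues[0]: "90"
VariableValues[1]: "POST Error: 1778-Drive Array Resuming Automatic Data Recovery Process"
'''

for code, message in extract_post_errors(sample):
    print(code, message)
```

Of the three, only 1792 and 1778 concern the drive array; 301 is a keyboard error.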

 

I don't really know what to do. I thought that maybe if I remove the hard disk that is showing the predictive failure, the RAID will be forced to finish the rebuilding process... :?

 

Any suggestions?

 

Thanks a lot in advance.

Jose

 

Update

-----

I just connected to the management page and checked the RAID... It is weird: physically all the disks are green, but I have a caution sign in the logical part. Once I click on disk 4, I clearly see the errors... Should I just remove the disk then?

2 REPLIES
Robert_Jewell
Honored Contributor

Re: RAID problem - one disk shows predictive failure and the hot spare stays rebuilding forever

Since you have reset the system a couple of times already, I don't see the harm in removing the predictive-failure drive at this point. However, I would have a replacement on hand. It seems like the spare is having trouble rebuilding, so by removing the bad drive the rebuild should attempt using the spare. Adding the new drive may allow you to add another spare that can be used if the existing spare is also bad.

 

-Bob

dintid
Occasional Collector

Re: RAID problem - one disk shows predictive failure and the hot spare stays rebuilding forever

Since we are dealing with RAID 5, I would definitely run Disk2vhd first.

 

I've often seen RAID 5 fail after X years, and since all the disks are old, odds are that one more will fail as well when a full parity rebuild is forced on the array.

Hot spare problems might actually mean that one more disk is having issues.

 

I'd always go with RAID 10 or RAID 6, but that's another discussion :)