
ProLiant DL380 Gen9 with SmartArray H240ar not flagging HDs as defective even with critical failure

 

Hi everyone,

My issue concerns a possible defect in the H240ar (running the latest firmware, of course): the controller does not flag a drive as failed even when the drive has surface defects and a critical drive failure error is logged in AHS.

Obviously the failure is real, as support immediately opens an RMA and replaces the drive. However, the question of "why didn't the H240ar flag it as bad" remains open, and support isn't willing to address it. The underlying cause is still unexplained, and I'm not being guided through an escalation process that would hopefully lead to either a coherent answer or a product design defect analysis internal to HPE.

This isn't one of those anecdotal issues; it is quite real, and I am stumped as to why or how the design of this controller differs from the earlier versions I've used for decades.

AHS> Active Health Event>  Critical,3094,21758,Smart Array,Critical System Event, ,0x00,12/21/2019 05:49:33,Event Code: 48, [2019-12-20 21:49:17] Fatal drive error, Port=1I Box=3 Bay=2

With the above failure logged in AHS, the controller still showed the drive as OK and there were no alarms. I've had drives go bad for decades on pre-Gen9 HPE servers, and every time the controller flagged the drive as bad, even on a predictive failure status, which is less stringent than a fatal failure.
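For anyone who wants to cross-check exported AHS events against what the controller reports, here is a minimal Python sketch that pulls the event code, message and drive location out of a line like the one quoted in this post. The regular expression is an assumption derived from that single sample line, not from any documented AHS export format, and `parse_drive_event` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Event line copied from the AHS entry quoted in this post.
LINE = (
    "Critical,3094,21758,Smart Array,Critical System Event, ,0x00,"
    "12/21/2019 05:49:33,Event Code: 48, [2019-12-20 21:49:17] "
    "Fatal drive error, Port=1I Box=3 Bay=2"
)

# Assumption: the layout below is inferred from the one sample line only.
PATTERN = re.compile(
    r"Event Code:\s*(?P<code>\d+),\s*"
    r"\[(?P<ts>[^\]]+)\]\s*"
    r"(?P<msg>[^,]+),\s*"
    r"Port=(?P<port>\S+)\s+Box=(?P<box>\d+)\s+Bay=(?P<bay>\d+)"
)


def parse_drive_event(line):
    """Return a dict describing the drive event, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        # First comma-separated field, ignoring any "AHS> ...>" style prefix.
        "severity": line.split(",", 1)[0].split(">")[-1].strip(),
        "event_code": int(m.group("code")),
        # Timestamp from inside the square brackets of the event text.
        "bracket_time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
        "message": m.group("msg").strip(),
        # port:box:bay, joined into a single location string.
        "location": f"{m.group('port')}:{m.group('box')}:{m.group('bay')}",
    }


if __name__ == "__main__":
    print(parse_drive_event(LINE))
```

The resulting location string (here `1I:3:2`) can then be compared by hand against the status the controller shows for the same bay.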

What happened to Gen9? I find this pretty odd. The issue for me is that, since the controller doesn't flag the drive and remove it from the array, the failure condition is passed on to the OS. Guess what happens there: unpredictable behavior due to disk errors, OS lockup, BSOD.

Wasn't that exactly what HPE promised to protect us from by having SmartArray and all the elements of a redundant disk system? Obscure all disk failures by adding the SmartArray layer which manages the underlying hardware biz?

If an HPE staff member stumbles upon this, I would very much appreciate your guidance and help in escalating my case (I will be happy to IM the case #).

Thank you

~B

 


Re: ProLiant DL380 Gen9 with SmartArray H240ar not flagging HDs as defective even with critical fail

Hi, 

1. From what you have described, I understand that the online SSA (in the OS) does not show the failures.

And you do not receive any SNMP trap emails when a drive has failed.

Please correct me if I am wrong.

2. What is the OS installed in this server?

Regards,

Shruthi

I am an HPE employee