HPE ProLiant DL380 Gen9 - Smart Array Controller P440 Crash

 
mrmcc71
Collector

One of our servers recently crashed due to what appears to be a problem with the drive controller. Our monitoring software notified us that the server was offline and that the iLO system health was down. When we looked through iLO, we saw the system sitting on a BSOD before it quickly restarted itself, so I was unable to record the BSOD error. The server came back up successfully after the reboot, and iLO then reported all components as OK. Reviewing the IML, we noticed a few POST messages, but in Windows Server the Event Viewer was nearly empty for this period.


Model: HPE ProLiant DL380 Gen9

iLO Firmware Version: 2.76 Oct 31 2020

System ROM: P89 v2.80 (10/16/2020)

Controller Model: Smart Array P440 Controller

Firmware Version: 7.00


The following were recorded in the IML during/after the crash. 

| ID | Severity | Class | Last Update | Initial Update | Count | Description |
|---|---|---|---|---|---|---|
| 117 | Critical | Network | 10/06/2024 14:30 | 10/06/2024 14:30 | 1 | Network Adapter Link Down (Slot 0, Port 4) |
| 116 | Caution | POST Message | 10/06/2024 20:29 | 10/06/2024 20:29 | 1 | Option ROM POST Error: 1719-Slot 1 Drive Array - A controller failure event occurred prior to this power-up. (Previous lock up code = 0x13) Action: Install the latest controller firmware. If the problem persists, replace the controller. |
| 115 | Informational | POST Message | 10/06/2024 20:29 | 10/06/2024 20:29 | 1 | POST Information: Processor 1, DIMM 1 could not be authenticated as genuine HPE Memory. Enhanced and extended HPE SmartMemory features will not be active. |
| 114 | Critical | Drive Array | 10/06/2024 14:04 | 10/06/2024 14:04 | 1 | Drive Array Controller Failure (Slot 1) |
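To make the order of these events easier to reason about, here is a minimal Python sketch that sorts the four IML entries by timestamp and prints how many minutes each one follows the first. The tuple layout, the shortened descriptions, and the names `IML_ENTRIES` and `build_timeline` are made up for illustration. It assumes the dates are MM/DD/YYYY and that every entry was stamped by the same clock, which may not be true if iLO and the system ROM disagree on time zone.

```python
from datetime import datetime

# (ID, severity, timestamp, shortened description) copied from the IML rows.
# Assumption: MM/DD/YYYY format and a single shared clock for all entries.
IML_ENTRIES = [
    (117, "Critical", "10/06/2024 14:30", "Network Adapter Link Down (Slot 0, Port 4)"),
    (116, "Caution", "10/06/2024 20:29", "POST Error 1719, Slot 1 (lock up code 0x13)"),
    (115, "Informational", "10/06/2024 20:29", "DIMM not authenticated (Processor 1, DIMM 1)"),
    (114, "Critical", "10/06/2024 14:04", "Drive Array Controller Failure (Slot 1)"),
]


def build_timeline(entries):
    """Return (timestamp, minutes since first event, id, severity, text), oldest first."""
    parsed = [
        (datetime.strptime(ts, "%m/%d/%Y %H:%M"), eid, sev, text)
        for eid, sev, ts, text in entries
    ]
    # Sort by time, then by ID so entries with the same timestamp stay stable.
    parsed.sort(key=lambda row: (row[0], row[1]))
    start = parsed[0][0]
    return [
        (ts, int((ts - start).total_seconds() // 60), eid, sev, text)
        for ts, eid, sev, text in parsed
    ]


timeline = build_timeline(IML_ENTRIES)
for ts, offset, eid, sev, text in timeline:
    print(f"{ts:%H:%M}  +{offset:>3} min  ID {eid}  {sev:<13}  {text}")
```

Under those assumptions the controller failure (ID 114) comes first, the link-down event follows 26 minutes later, and the two POST messages are stamped 385 minutes (6 h 25 min) after the failure. A gap that large may only mean the entries were not stamped by the same clock, so treat the offsets as a starting point rather than a conclusion.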

 

Since the crash, the server has stayed up and has been working fine, and iLO still shows all components as OK. The first recommendation may be to upgrade the controller firmware, but that doesn't seem like a solid solution given that the server had been up for roughly 330 days straight without any issue.


I just noticed that if I change the IML sorting from By ID to By Last update, more entries show up that appear to have been logged during this time as well.

| ID | Severity | Class | Last Update | Initial Update | Count | Description |
|---|---|---|---|---|---|---|
| 116 | Caution | POST Message | 10/06/2024 20:29 | 10/06/2024 20:29 | 1 | Option ROM POST Error: 1719-Slot 1 Drive Array - A controller failure event occurred prior to this power-up. (Previous lock up code = 0x13) Action: Install the latest controller firmware. If the problem persists, replace the controller. |
| 115 | Informational | POST Message | 10/06/2024 20:29 | 10/06/2024 20:29 | 1 | POST Information: Processor 1, DIMM 1 could not be authenticated as genuine HPE Memory. Enhanced and extended HPE SmartMemory features will not be active. |
| 22 | Informational | POST Message | 10/06/2024 20:29 | [NOT SET] | 31 | POST Information: Processor 2, DIMM 12 could not be authenticated as genuine HPE Memory. Enhanced and extended HPE SmartMemory features will not be active. |
| 21 | Informational | POST Message | 10/06/2024 20:29 | [NOT SET] | 31 | POST Information: Processor 2, DIMM 9 could not be authenticated as genuine HPE Memory. Enhanced and extended HPE SmartMemory features will not be active. |
| 20 | Informational | POST Message | 10/06/2024 20:29 | [NOT SET] | 31 | POST Information: Processor 2, DIMM 4 could not be authenticated as genuine HPE Memory. Enhanced and extended HPE SmartMemory features will not be active. |
| 19 | Informational | POST Message | 10/06/2024 20:29 | [NOT SET] | 31 | POST Information: Processor 2, DIMM 1 could not be authenticated as genuine HPE Memory. Enhanced and extended HPE SmartMemory features will not be active. |
| 18 | Informational | POST Message | 10/06/2024 20:29 | [NOT SET] | 31 | POST Information: Processor 1, DIMM 12 could not be authenticated as genuine HPE Memory. Enhanced and extended HPE SmartMemory features will not be active. |
| 17 | Informational | POST Message | 10/06/2024 20:29 | [NOT SET] | 31 | POST Information: Processor 1, DIMM 9 could not be authenticated as genuine HPE Memory. Enhanced and extended HPE SmartMemory features will not be active. |
| 16 | Informational | POST Message | 10/06/2024 20:29 | [NOT SET] | 31 | POST Information: Processor 1, DIMM 4 could not be authenticated as genuine HPE Memory. Enhanced and extended HPE SmartMemory features will not be active. |
| 117 | Critical | Network | 10/06/2024 14:30 | 10/06/2024 14:30 | 1 | Network Adapter Link Down (Slot 0, Port 4) |
| 114 | Critical | Drive Array | 10/06/2024 14:04 | 10/06/2024 14:04 | 1 | Drive Array Controller Failure (Slot 1) |
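The difference between the two sort orders can be reproduced offline. The sketch below is only an illustration: the row tuples, `touched_on`, and `recurring` are hypothetical names, the values are copied from the "By Last update" view, `[NOT SET]` is stored as `None`, and the dates are assumed to be MM/DD/YYYY. It picks out every entry whose Last Update falls on the crash day regardless of ID, and separately lists the entries whose count is above 1.

```python
from datetime import datetime

# (ID, last update, initial update, count) from the "By Last update" view.
# "[NOT SET]" is represented as None. Assumption: dates are MM/DD/YYYY.
ROWS = [
    (116, "10/06/2024 20:29", "10/06/2024 20:29", 1),
    (115, "10/06/2024 20:29", "10/06/2024 20:29", 1),
    (22, "10/06/2024 20:29", None, 31),
    (21, "10/06/2024 20:29", None, 31),
    (20, "10/06/2024 20:29", None, 31),
    (19, "10/06/2024 20:29", None, 31),
    (18, "10/06/2024 20:29", None, 31),
    (17, "10/06/2024 20:29", None, 31),
    (16, "10/06/2024 20:29", None, 31),
    (117, "10/06/2024 14:30", "10/06/2024 14:30", 1),
    (114, "10/06/2024 14:04", "10/06/2024 14:04", 1),
]

FMT = "%m/%d/%Y %H:%M"


def touched_on(rows, day):
    """IDs of entries whose Last Update falls on `day`, newest first."""
    hits = []
    for eid, last, _initial, _count in rows:
        stamp = datetime.strptime(last, FMT)
        if stamp.date() == day:
            hits.append((stamp, eid))
    hits.sort(reverse=True)  # newest timestamp first, then highest ID
    return [eid for _stamp, eid in hits]


def recurring(rows):
    """IDs of entries that were updated again rather than newly created."""
    return sorted(eid for eid, _last, _initial, count in rows if count > 1)


crash_day = datetime(2024, 10, 6).date()
print(touched_on(ROWS, crash_day))
print(recurring(ROWS))
```

With the rows above, all eleven entries fall on the crash day, and IDs 16 through 22 are the ones with a count of 31. That would be consistent with those older SmartMemory entries simply being re-stamped at each POST, which puts them near the top when sorting by Last Update while their low IDs keep them out of the top of the By ID view.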


I ran a diagnostic through the HPE SSA Windows application and it came back fine. Two drives did, however, show an entry under "Last Failure Reason".
 
[ Top ] → [ Smart Array P440 in slot 1 ] → [ Internal Drive Cage at Port 1I : Box 1 ] → [ Physical Drive (2 TB SAS HDD) 1I:1:4 ] → Physical Drive Status 
Last Failure Reason Init Start Unit Failed (0x16)
 
 
[ Top ] → [ Smart Array P440 in slot 1 ] → [ Internal Drive Cage at Port 1I : Box 1 ] → [ Physical Drive (2 TB SAS HDD) 1I:1:9 ] → Physical Drive Status
Last Failure Reason Aborted Command (0x0e)
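If the SSA diagnostic report is saved as text, the drives carrying a "Last Failure Reason" can be pulled out with a short script instead of clicking through each drive. This is a rough sketch that assumes the report lines look exactly like the two excerpts quoted here; the regular expressions, `REPORT`, and `last_failure_reasons` are hypothetical and would likely need adjusting for a real report file.

```python
import re

# Lines copied from the SSA diagnostic excerpts: a breadcrumb path naming the
# drive, followed by that drive's "Last Failure Reason" line.
REPORT = """\
[ Top ] → [ Smart Array P440 in slot 1 ] → [ Internal Drive Cage at Port 1I : Box 1 ] → [ Physical Drive (2 TB SAS HDD) 1I:1:4 ] → Physical Drive Status
Last Failure Reason Init Start Unit Failed (0x16)
[ Top ] → [ Smart Array P440 in slot 1 ] → [ Internal Drive Cage at Port 1I : Box 1 ] → [ Physical Drive (2 TB SAS HDD) 1I:1:9 ] → Physical Drive Status
Last Failure Reason Aborted Command (0x0e)
"""

# Drive location such as "1I:1:4" inside the breadcrumb path.
DRIVE_RE = re.compile(r"Physical Drive \([^)]*\) (\S+) \]")
# Reason text followed by its hex code in parentheses.
REASON_RE = re.compile(r"Last Failure Reason\s+(.+?)\s+\((0x[0-9a-fA-F]+)\)")


def last_failure_reasons(text):
    """Map drive location -> (reason text, numeric code) for each drive found."""
    results = {}
    current = None
    for line in text.splitlines():
        drive = DRIVE_RE.search(line)
        if drive:
            current = drive.group(1)
            continue
        reason = REASON_RE.search(line)
        if reason and current is not None:
            results[current] = (reason.group(1), int(reason.group(2), 16))
    return results


for location, (reason, code) in last_failure_reasons(REPORT).items():
    print(f"{location}: {reason} (0x{code:02x})")
```

The script only extracts what the report already states; it does not say when each failure reason was recorded, so it cannot by itself tie these two drives to the controller lockup.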

Are there any additional logs I'm not seeing or checking that would help troubleshoot what happened? As I said, I know the first recommendation is to upgrade the controller firmware, but the server had been running non-stop for almost a year with no issues. Are there additional controller logs I could pull somehow? During my research I couldn't find much about the 0x13 lockup code other than alerts regarding much older firmware versions.
 
Any help or information is appreciated, Thank you!
 
 
 
 
 
 
Suman_1978
HPE Pro

Re: HPE ProLiant DL380 Gen9 - Smart Array Controller P440 Crash

Hi,

I see that your server is running recent firmware levels. You can run the diagnostics from Intelligent Provisioning.

https://support.hpe.com/hpesc/public/docDisplay?docId=a00007325en_us
https://support.hpe.com/hpesc/public/docDisplay?docId=c05115986
https://support.hpe.com/hpesc/public/docDisplay?docId=a00060570en_us

Thank You!
I work with HPE but opinions expressed here are mine.



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
ngnear
HPE Pro

Re: HPE ProLiant DL380 Gen9 - Smart Array Controller P440 Crash

If the issue persists after the firmware update, feel free to log a support case with us, since we would need to analyze some logs.



I work at HPE