
ServerHome-Max
Frequent Visitor

HPE Apollo 4510 multiple errors

Hi Everyone,

I was hoping you can help me. We have 2x HPE Apollo 4510 Gen10 systems, each with an XL450 node and 60x 16TB drives. They are used as backup devices for a Veeam cluster.

One of the two systems shuts itself down after 24-48 hours. I first thought it was because of this advisory:

HPE ProLiant Gen 10 Servers- Single Power Supply Configurations of HPE Platinum 500W or 800W Supplies May Cause System Shutdown on HPE ProLiant Gen 10 Servers Under Heavy Work Load
https://support.hpe.com/hpesc/public/docDisplay?docId=a00050474en_us

But we have swapped in some other PSUs (from a G9) and tried different 800W Gen10 PSUs. So now I have ordered 1600W PSUs to check if that solves it.

Does anyone have a suggestion on where to look for other causes?

Max

PS: critical entries from the error log:
509,"critical","Environment","Apollo Chassis Controller unresponsive","06/25/2023, 6:45 PM","1","Firmware",
508,"critical","System Error","Server critical Fault (Service Information: Runtime Fault, System Board, AUX/Main EFUSE (10h))
500,"critical","Redfish Device Enablement","BatteryMissing (Slot=2) Redfish event from ‘/redfish/v1/Systems/1/Storage/DE081000/Controllers/0’","06/23/2023, 2:31 PM","2","Hardware",
496,"critical","Redfish Device Enablement","BatteryMissing (Slot=0) Redfish event from ‘/redfish/v1/Systems/1/Storage/DE07C000/Controllers/0’","06/23/2023, 2:31 PM","2","Hardware",
476,"critical","Drive Array","Slot 2 Smart Array - Cache module board backup power source status is failed","06/23/2023, 9:36 AM","1","Hardware",
473,"critical","Interlock","Improperly seated or missing device (Unknown interlock type, Riser 1)","06/23/2023, 9:21 AM","1","Hardware",
472,"critical","System Error","Server critical Fault (Service Information: Standby Fault, Flexible LOM, LOM 1 VRD (11h)) ","06/23/2023, 9:21 AM","1","Other",
465,"critical","UEFI","System Health Error. A critical system health error requires the system to be shutdown.","06/23/2023, 9:19 AM","2","Hardware",
464,"critical","Environment","Insufficient Fan Solution","06/23/2023, 9:19 AM","2","Cooling",
463,"critical","Environment","Fan Failure (Fan 10, Location System)","06/23/2023, 9:19 AM","2","Hardware",
462,"critical","Environment","Fan Failure (Fan 9, Location System)","06/23/2023, 9:19 AM","2","Hardware",
461,"critical","Environment","Fan Failure (Fan 8, Location System)","06/23/2023, 9:19 AM","2","Hardware",
460,"critical","Environment","Fan Failure (Fan 7, Location System)","06/23/2023, 9:19 AM","2","Hardware",
459,"critical","Environment","Fan Failure (Fan 6, Location System)","06/23/2023, 9:19 AM","2","Hardware",
458,"critical","Environment","Fan Failure (Fan 5, Location System)","06/23/2023, 9:19 AM","2","Hardware",
457,"critical","Environment","Fan Failure (Fan 4, Location System)","06/23/2023, 9:19 AM","2","Hardware",
456,"critical","Environment","Fan Failure (Fan 3, Location System)","06/23/2023, 9:19 AM","2","Hardware",
455,"critical","Environment","Fan Failure (Fan 2, Location System)","06/23/2023, 9:19 AM","2","Hardware",
454,"critical","Environment","Fan Failure (Fan 1, Location System)","06/23/2023, 9:19 AM","2","Hardware",
447,"critical","UEFI","System Health Error. A critical system health error requires the system to be shutdown.","06/22/2023, 12:02 PM","1","Hardware",
446,"critical","Environment","Insufficient Fan Solution","06/22/2023, 12:02 PM","1","Cooling",
445,"critical","Environment","Fan Failure (Fan 10, Location System)","06/22/2023, 12:02 PM","1","Hardware",
444,"critical","Environment","Fan Failure (Fan 9, Location System)","06/22/2023, 12:02 PM","1","Hardware",
443,"critical","Environment","Fan Failure (Fan 8, Location System)","06/22/2023, 12:02 PM","1","Hardware",
442,"critical","Environment","Fan Failure (Fan 7, Location System)","06/22/2023, 12:02 PM","1","Hardware",
441,"critical","Environment","Fan Failure (Fan 6, Location System)","06/22/2023, 12:02 PM","1","Hardware",
440,"critical","Environment","Fan Failure (Fan 5, Location System)","06/22/2023, 12:02 PM","1","Hardware",
439,"critical","Environment","Fan Failure (Fan 4, Location System)","06/22/2023, 12:02 PM","1","Hardware",
438,"critical","Environment","Fan Failure (Fan 3, Location System)","06/22/2023, 12:02 PM","1","Hardware",
437,"critical","Environment","Fan Failure (Fan 2, Location System)","06/22/2023, 12:02 PM","1","Hardware",
436,"critical","Environment","Fan Failure (Fan 1, Location System)","06/22/2023, 12:02 PM","1","Hardware",
429,"critical","UEFI","System Health Error. A critical system health error requires the system to be shutdown.","06/22/2023, 11:54 AM","2","Hardware",
428,"critical","Environment","Insufficient Fan Solution","06/22/2023, 11:54 AM","2","Cooling",
427,"critical","Environment","Fan Failure (Fan 10, Location System)","06/22/2023, 11:54 AM","2","Hardware",
426,"critical","Environment","Fan Failure (Fan 9, Location System)","06/22/2023, 11:54 AM","2","Hardware",
425,"critical","Environment","Fan Failure (Fan 8, Location System)","06/22/2023, 11:54 AM","2","Hardware",
424,"critical","Environment","Fan Failure (Fan 7, Location System)","06/22/2023, 11:54 AM","2","Hardware",
423,"critical","Environment","Fan Failure (Fan 6, Location System)","06/22/2023, 11:54 AM","2","Hardware",
422,"critical","Environment","Fan Failure (Fan 5, Location System)","06/22/2023, 11:54 AM","2","Hardware",
421,"critical","Environment","Fan Failure (Fan 4, Location System)","06/22/2023, 11:54 AM","2","Hardware",
420,"critical","Environment","Fan Failure (Fan 3, Location System)","06/22/2023, 11:54 AM","2","Hardware",
419,"critical","Environment","Fan Failure (Fan 2, Location System)","06/22/2023, 11:54 AM","2","Hardware",
418,"critical","Environment","Fan Failure (Fan 1, Location System)","06/22/2023, 11:54 AM","2","Hardware",
413,"critical","UEFI","System Health Error. A critical system health error requires the system to be shutdown.","06/21/2023, 11:35 AM","1","Hardware",
412,"critical","Environment","Insufficient Fan Solution","06/21/2023, 11:34 AM","1","Cooling",
407,"critical","OS","Automatic Operating System Shutdown Initiated Due to Fan Failure","06/21/2023, 11:15 AM","1","Other",
405,"critical","Environment","Fan Failure (Fan 10, Location System)","06/21/2023, 11:34 AM","2","Hardware",
404,"critical","Environment","Fan Failure (Fan 9, Location System)","06/21/2023, 11:34 AM","2","Hardware",
403,"critical","Environment","Fan Failure (Fan 8, Location System)","06/21/2023, 11:34 AM","2","Hardware",
402,"critical","Environment","Fan Failure (Fan 7, Location System)","06/21/2023, 11:34 AM","2","Hardware",
401,"critical","Environment","Fan Failure (Fan 6, Location System)","06/21/2023, 11:34 AM","2","Hardware",
400,"critical","Environment","Fan Failure (Fan 5, Location System)","06/21/2023, 11:34 AM","2","Hardware",
399,"critical","Environment","Fan Failure (Fan 4, Location System)","06/21/2023, 11:34 AM","2","Hardware",
398,"critical","Environment","Fan Failure (Fan 3, Location System)","06/21/2023, 11:34 AM","2","Hardware",
397,"critical","Environment","Fan Failure (Fan 2, Location System)","06/21/2023, 11:34 AM","2","Hardware",
396,"critical","Environment","Fan Failure (Fan 1, Location System)","06/21/2023, 11:34 AM","2","Hardware",
384,"critical","Drive Array","Storage system temperature status changed to failed for location Port 2I Box 1 connected to controller Slot 2.","06/17/2023, 6:32 AM","1","Hardware",
383,"critical","Drive Array","Storage system temperature status changed to failed for location Port 1I Box 1 connected to controller Slot 2.","06/17/2023, 6:32 AM","1","Hardware",
382,"critical","OS","Automatic Operating System Shutdown Initiated Due to Fan Failure","06/17/2023, 6:28 AM","1","Other",
380,"critical","Environment","Fan Failure (Fan 10, Location System)","06/17/2023, 6:28 AM","1","Hardware",
379,"critical","Environment","Fan Failure (Fan 9, Location System)","06/17/2023, 6:28 AM","1","Hardware",
378,"critical","Environment","Fan Failure (Fan 8, Location System)","06/17/2023, 6:28 AM","1","Hardware",
377,"critical","Environment","Fan Failure (Fan 7, Location System)","06/17/2023, 6:28 AM","1","Hardware",
376,"critical","Environment","Fan Failure (Fan 6, Location System)","06/17/2023, 6:28 AM","1","Hardware",
375,"critical","Environment","Fan Failure (Fan 5, Location System)","06/17/2023, 6:28 AM","1","Hardware",
374,"critical","Environment","Fan Failure (Fan 4, Location System)","06/17/2023, 6:28 AM","1","Hardware",
373,"critical","Environment","Fan Failure (Fan 3, Location System)","06/17/2023, 6:28 AM","1","Hardware",
372,"critical","Environment","Fan Failure (Fan 2, Location System)","06/17/2023, 6:28 AM","1","Hardware",
371,"critical","Environment","Fan Failure (Fan 1, Location System)","06/17/2023, 6:28 AM","1","Hardware",
358,"critical","System Error","Server critical Fault (Service Information: Power On Fault, System Board, P12V Main/AUX Regulators (04h)) ","06/15/2023, 12:31 PM","1","Other",
357,"critical","Environment","Apollo Chassis Controller unresponsive","06/14/2023, 10:07 PM","1","Firmware",
356,"critical","System Error","Server critical Fault (Service Information: Runtime Fault, System Board, AUX/Main EFUSE (10h)) ","06/14/2023, 7:20 PM","1","Other",
339,"critical","Environment","Apollo Chassis Controller unresponsive","[Not Set]","2","Firmware",
312,"critical","Environment","Apollo Chassis Controller unresponsive","06/13/2023, 12:30 PM","2","Firmware",
278,"critical","Environment","Apollo Chassis Controller unresponsive","06/07/2023, 5:45 PM","4","Firmware",
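One way to look for patterns in a log like the one above is to count the "Fan Failure" entries per timestamp. The following is only a minimal sketch in Python: the sample lines are copied from the log above, but the meaning of the columns (description in the fourth field, timestamp in the fifth) is an assumption, since the export has no header row, and `fan_failures_per_timestamp` is a hypothetical helper name.

```python
import csv
from collections import Counter
from io import StringIO

# A few entries copied from the log above. Column meanings are assumed:
# field 4 = description, field 5 = timestamp.
SAMPLE = '''\
464,"critical","Environment","Insufficient Fan Solution","06/23/2023, 9:19 AM","2","Cooling",
463,"critical","Environment","Fan Failure (Fan 10, Location System)","06/23/2023, 9:19 AM","2","Hardware",
462,"critical","Environment","Fan Failure (Fan 9, Location System)","06/23/2023, 9:19 AM","2","Hardware",
357,"critical","Environment","Apollo Chassis Controller unresponsive","06/14/2023, 10:07 PM","1","Firmware",
312,"critical","Environment","Apollo Chassis Controller unresponsive","06/13/2023, 12:30 PM","2","Firmware",
'''

def fan_failures_per_timestamp(text):
    """Count 'Fan Failure' entries per timestamp string."""
    counts = Counter()
    for row in csv.reader(StringIO(text)):
        if len(row) < 5:
            continue  # skip blank or truncated lines
        description, timestamp = row[3], row[4]
        if description.startswith("Fan Failure"):
            counts[timestamp] += 1
    return counts

counts = fan_failures_per_timestamp(SAMPLE)
for timestamp, n in counts.items():
    print(f"{timestamp}: {n} fan failure entries")
```

Run against the full log, a count of 10 for a single timestamp would mean every fan was reported as failed in the same minute, which is a different pattern from one bad fan and may be worth mentioning when asking for help.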

 

ServerHome-Max
Frequent Visitor

Re: HPE Apollo 4510 multiple errors

Does anyone have a suggestion?

 

We have received the 1600W PSUs now and will install them later today to see if that fixes the issue.

Vinky_99
Esteemed Contributor

Re: HPE Apollo 4510 multiple errors

@ServerHome-Max 

If you have already ruled out the power supply as the cause of the shutdown issue by swapping PSUs, there could be other factors contributing to the problem. Here are a few suggestions on where to look for different causes:

* Ensure that the firmware and drivers for the HPE Apollo 4510 and XL450 node are up to date.

* Check the system event logs and error logs for any indications of errors or warnings that might help identify the cause of the shutdown. Look for any recurring patterns or specific error messages that might point to a particular component or software issue.

* Monitor the temperature and cooling system of the HPE Apollo 4510. Overheating can cause the system to shut down to protect the hardware. Make sure the server is adequately cooled, and all fans are functioning properly. Check for any obstructions to airflow or dust accumulation on the cooling components.

* Run hardware diagnostics tools provided by HPE to perform a comprehensive check of the server's components, including memory, CPU, and storage. These tools can help identify any faulty hardware that might be causing the shutdowns.

* Review the power management settings in the server's BIOS or UEFI firmware. Ensure that there are no settings causing the server to shut down automatically after a specific period of inactivity or under heavy workloads.

* Check the operating system and Veeam backup software configuration for any specific settings or configurations that might be causing the shutdown. Ensure that all the recommended settings and optimizations are in place for the backup environment.
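As a rough illustration of the fan monitoring suggestion above, fan status can be read from a Redfish Thermal resource and filtered for anything that is not healthy. This is only a sketch: the payload below is a made-up, trimmed example in the general shape of the Redfish Thermal schema, the resource path on this particular system (often something like `/redfish/v1/Chassis/1/Thermal`) is an assumption, and `unhealthy_fans` is a hypothetical helper name.

```python
import json

# Hypothetical, trimmed payload in the general shape of a Redfish Thermal
# resource. Real responses contain many more fields.
SAMPLE_THERMAL = json.loads('''
{
  "Fans": [
    {"Name": "Fan 1", "Reading": 45, "Status": {"Health": "OK", "State": "Enabled"}},
    {"Name": "Fan 2", "Reading": 0, "Status": {"Health": "Critical", "State": "Enabled"}},
    {"Name": "Fan 3", "Status": {"State": "Absent"}}
  ]
}
''')

def unhealthy_fans(thermal):
    """Return (name, health) for every present fan whose health is not OK."""
    bad = []
    for fan in thermal.get("Fans", []):
        status = fan.get("Status", {})
        if status.get("State") == "Absent":
            continue  # an empty fan bay is not a failure
        health = status.get("Health")
        if health != "OK":
            bad.append((fan.get("Name", "unknown"), health))
    return bad

print(unhealthy_fans(SAMPLE_THERMAL))
```

In practice the payload would come from an authenticated request to the management controller rather than from a string, and polling it on a schedule would show whether the fans degrade gradually or all drop out at once.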

I hope this helps! Let me know...

These are my opinions, so use them at your own risk.
Tam92
HPE Pro

Re: HPE Apollo 4510 multiple errors

Hello @ServerHome-Max,

 

Please let us know if the above suggestions helped to resolve the issue.

 

Thanks,

TAM



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
ServerHome-Max
Frequent Visitor

Re: HPE Apollo 4510 multiple errors

Hi,

Yes, we already tried those steps before I created the post and they didn't work. We have added 1600W PSUs and the system has been in operation since Friday afternoon without shutting down. So hopefully the PSUs will prove to have been the cause.

I'll keep you guys updated!

Tam92
HPE Pro

Re: HPE Apollo 4510 multiple errors

Hello,

 

Any updates?

 

Thanks,

TAM



ServerHome-Max
Frequent Visitor
Solution

Re: HPE Apollo 4510 multiple errors

Hi Tam92,

So it's been almost a week and so far there have been no new errors and no sudden shutdowns. So it seems like it worked. But the strange thing is that the 800W PSUs we tried didn't match the advisory information. So the advisory shouldn't be limited to serial numbers containing 8J, nor to the specified PSU model. Other PSUs can also cause the problem.

For now it seems like the problem has been solved!

Max

Sunitha_Mod
Honored Contributor

Re: HPE Apollo 4510 multiple errors

@ServerHome-Max 

Hello Max,

Perfect! 

We are glad to know the problem has been resolved, and we appreciate you keeping us updated.