Re: C7000 / BL460c Gen8 System Power Fault Detected - Insertion and/or Removal of Blades

 
sav2880
Collector

C7000 / BL460c Gen8 System Power Fault Detected - Insertion and/or Removal of Blades

I'm running into a very strange issue and am curious whether anyone else has seen it. In our data center, when a BL460c Gen8 blade is inserted into or removed from our c7000 enclosure, 2 and sometimes 3 other blades power off with a critical power event.

Resetting the eFuse on the affected blades always brings them back up, but needless to say this causes us quite a bit of anguish whenever it happens and hampers our ability to move things around.

The IML log events always look like this:

Power 07/28/2021 11:02 07/28/2021 11:02 1

System Power Fault Detected (XR: 10 20 MID: FF CD FC D6 03 13 13 AA 00 00 00 EE 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)

... or something very similar to it. 
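To compare entries like this across blades, it can help to split the event text into its parts. Below is a minimal sketch that does so, assuming the text always has the form `message (XR: <hex bytes> MID: <hex bytes>)` as in the entry quoted above. The function name `parse_power_fault` is made up for illustration, and the sketch does not try to interpret what the XR and MID bytes mean, since that isn't documented in this thread.

```python
import re

# Event text copied from the IML entry quoted above.
SAMPLE = (
    "System Power Fault Detected (XR: 10 20 MID: FF CD FC D6 03 13 13 AA "
    "00 00 00 EE 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)"
)

# Assumed layout: "<message> (XR: <hex bytes> MID: <hex bytes>)".
_PATTERN = re.compile(
    r"^(?P<msg>.+?)\s*\(XR:\s*(?P<xr>[0-9A-Fa-f ]+?)\s*"
    r"MID:\s*(?P<mid>[0-9A-Fa-f ]+?)\s*\)$"
)


def parse_power_fault(line):
    """Split an IML power fault line into message, XR bytes and MID bytes.

    Returns None when the line does not match the assumed layout.
    """
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "message": match.group("msg"),
        # bytes.fromhex ignores the spaces between byte pairs.
        "xr": bytes.fromhex(match.group("xr")),
        "mid": bytes.fromhex(match.group("mid")),
    }


event = parse_power_fault(SAMPLE)
```

With the parsed values in hand, two events can be compared field by field (for example `event_a["mid"] == event_b["mid"]`) instead of eyeballing long hex strings.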

So far, I've tried switching the power redundancy mode from AC Redundant (3 active, 3 passive) to Power Supply Redundant (5 active, 1 passive), since power capacity was very close to the maximum in AC Redundant mode. This didn't seem to help at all.
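For anyone wondering why the mode change adds headroom: usable capacity is roughly the number of active supplies times the rating of one supply. Here is a rough sketch of that arithmetic. The 2400 W rating is a placeholder I picked for illustration, not the rating of our supplies, so substitute the real figure from the OA.

```python
def usable_capacity_watts(active_supplies, watts_per_supply):
    """Capacity available to the enclosure: only active supplies count."""
    return active_supplies * watts_per_supply


# Hypothetical per-supply rating; replace with the value the OA reports.
PSU_WATTS = 2400

# AC Redundant: 3 active, 3 held in reserve.
ac_redundant = usable_capacity_watts(3, PSU_WATTS)

# Power Supply Redundant: 5 active, 1 held in reserve.
psu_redundant = usable_capacity_watts(5, PSU_WATTS)

# Extra headroom gained by switching modes.
headroom_gain = psu_redundant - ac_redundant
```

Whatever the real rating is, moving from 3 to 5 active supplies raises capacity by two supplies' worth, so running close to the limit should no longer be a factor after the switch.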

iLO on the blades is 2.77; they're all iLO 4. The BIOS is somewhat old, a 2015 revision. The OA version is 4.95.

The CPU on the blades is an Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz. I know there was an advisory for a different CPU in the family, but this CPU does not match, and the BIOS, while not new, is new enough not to have that particular problem.

Has anyone run into this before, and what was your solution? I'm working remotely, so I don't have direct physical access to these blades, but hopefully I can translate any ideas into something our remote hands can do.

Thanks!

SanjeevGoyal
HPE Pro

Re: C7000 / BL460c Gen8 System Power Fault Detected - Insertion and/or Removal of Blades

Hello,

 

Please follow the steps below.

1. Manually reset the eFuse on the affected blades.
2. Reset the blade if the issue persists.
3. Update the server to the latest firmware.
4. Apply the RBSU settings below.

CQHPCC Collaborative_Power_Control Enabled
CQHPWR HP_Power_Profile Maximum_Performance
CQHPER HP_Power_Regulator HP_Static_High_Performance_Mode
CQHPCKG Intel_Minimum_Processor_Idle_Power_Package_State No_Package_State
CQHPER Intel_Minimum_Processor_Idle_Power_State No_C-States

5. Replace the part below if the issue persists after performing the above steps.

RESOLUTION
The retention force of the latch in the server blade allows for slight horizontal movement of the servers when another blade is inserted into the c7000 enclosure. This movement can momentarily disconnect the signal that tells the enclosure a blade is installed. The momentary disconnect causes an immediate shutdown of the server and the logging of the IML power fault event. The latches on all BL460c Gen8 server blades in the c7000 enclosure should be replaced with a new version of the latch, spare part number 688895-001.


Part Number: 688895-001
Part Description: Blade release lever kit. Includes server release lever assembly, server release lever bracket, and T-10 screws (4)
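When several blades need checking, the RBSU settings listed in the steps above can be kept as a simple checklist. The sketch below compares a blade's current values against that list and reports what still needs changing. The setting names and values are copied from the list above; the `example_blade` values are hypothetical, and how the current values are collected from a blade is left out on purpose.

```python
# Recommended values, copied from the RBSU settings listed above.
RECOMMENDED = {
    "Collaborative_Power_Control": "Enabled",
    "HP_Power_Profile": "Maximum_Performance",
    "HP_Power_Regulator": "HP_Static_High_Performance_Mode",
    "Intel_Minimum_Processor_Idle_Power_Package_State": "No_Package_State",
    "Intel_Minimum_Processor_Idle_Power_State": "No_C-States",
}


def settings_to_change(current):
    """Return {name: (current_value, recommended_value)} for each mismatch.

    A setting missing from `current` is reported with a current value of None.
    """
    return {
        name: (current.get(name), wanted)
        for name, wanted in RECOMMENDED.items()
        if current.get(name) != wanted
    }


# Hypothetical values for one blade, for illustration only.
example_blade = {
    "Collaborative_Power_Control": "Enabled",
    "HP_Power_Profile": "Balanced_Power_and_Performance",
    "HP_Power_Regulator": "HP_Static_High_Performance_Mode",
    "Intel_Minimum_Processor_Idle_Power_Package_State": "No_Package_State",
}

pending = settings_to_change(example_blade)
```

An empty result means the blade already matches the recommended settings.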

Regards,


I am an HPE employee
