BladeSystem - General

N+N Power redundancy issue?

chuckk281
Trusted Contributor


Eric is working a customer question:

 

************

 

No one has explained how the c7000 enclosure handles the situation where there is not enough power for all the blades.

Does the OA shut down the complete enclosure?

Does the OA shut down some blades, using a random algorithm to choose them?

Does the enclosure dedicate power supplies to zones (as described by Thao): PSU 1 and 4 for zone 1, PSU 2 and 5 to add zone 2, and PSU 3 and 6 to add the 8 additional blades?

 

“No one within HP knows how your own product works?” is the feedback from my customer, as I’m not able to give a definitive answer.

 

Currently my response is to explain that reaching this situation means a double failure has occurred (at least 2 PSU failures, which is not a normal condition), but surely something has been developed in the OA to manage this kind of situation… no?

 

*************

 

Reply from Dan:

 

**************

 

The only time a blade will be shut down is when there is insufficient power to run the load of all the blades.

In this scenario a brown-out will occur on ALL blades and likely ALL blades will go down.

Since your customer had only blade 9 go down, it is most likely human action that caused this.

A button press is likely not logged by AHS or the iLO log. Check the server's own logs (syslog, since it is Linux).

 

Once blade 9 was powered down, with a power supply offline, there was insufficient power ALLOCATION, which is what is used to determine whether it is safe to power on a blade.

Power ALLOCATION is based on the worst-case power usage scenario and is thus very conservative.

If you have Redundancy mode set to AC Redundant (N+N) and you lose 1 PSU, you now have 2 x PSU Size for Power Allocation, because the OA can no longer guarantee 3 x PSU Size.

If you are already over the 2 x PSU Size allocation, no blades that are newly inserted or powered off may be powered on, for safety.
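
As a rough illustration of that arithmetic, here is a minimal Python sketch. It is not OA code. The function names are hypothetical, the 2650W PSU size is just the one mentioned later in this thread, and the "half of the working PSUs, rounded down" rule is an assumption that merely reproduces the two cases described above (6 PSUs give 3 x PSU Size, 5 PSUs give 2 x PSU Size). The real OA logic may differ, for example by looking at which bays are populated.

```python
PSU_WATTS = 2650  # example PSU size from this thread; other sizes exist


def ac_redundant_capacity(working_psus, psu_watts=PSU_WATTS):
    """Watts available for allocation in AC Redundant (N+N) mode.

    Assumption: half of the working PSUs, rounded down, count toward
    allocation. This matches the 6 -> 3x and 5 -> 2x cases in the thread.
    """
    return (working_psus // 2) * psu_watts


def can_power_on(allocated_watts, blade_request_watts, capacity_watts):
    """Power-on is gated on ALLOCATION, not on measured usage."""
    return allocated_watts + blade_request_watts <= capacity_watts


healthy = ac_redundant_capacity(6)   # 3 x PSU Size
degraded = ac_redundant_capacity(5)  # 2 x PSU Size after losing one PSU

# Hypothetical enclosure: 5400W already allocated, blade asks for 450W.
print(can_power_on(5400, 450, degraded))  # already over 2 x PSU Size
print(can_power_on(5400, 450, healthy))   # fits once the PSU is back
```

With these illustrative numbers the blade is refused while one PSU is down and accepted once all six PSUs are healthy again, which is the behaviour described for blade 9.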

 

Once the PSU is fixed, the OA determines that you have 3 x PSU Size available again, because you are back to 3+3 mode.

Now there is enough Power Allocation room to power on Blade 9.

 

 

An additional way to fix this problem on the fly with the customer:

Drop the Power Redundancy mode to PSU Redundant (N+1) while a single PSU is failed.

The system now gives you 4 x PSU Size because you have 5 working PSUs (5 minus 1 for PSU redundancy mode = 4).

Once Blade 9 is powered back on, you can change Redundancy mode back to AC Redundant.

Note that there is some risk here: I have not tried this process myself, and I do not know whether HP Support would advise this method.
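
To see why the mode change frees up room, the two modes can be compared side by side. This is only a sketch of the arithmetic in this reply: the mode labels and the function name are made up for illustration, the 2650W PSU size is an example, and the N+N formula is an assumption fitted to the 6 PSU and 5 PSU cases discussed here.

```python
def allocatable_watts(mode, working_psus, psu_watts=2650):
    """Watts the enclosure could allocate under a given redundancy mode.

    "AC_REDUNDANT" and "PSU_REDUNDANT" are hypothetical labels for the
    N+N and N+1 modes; they are not real OA setting names.
    """
    if mode == "AC_REDUNDANT":
        # Assumption: half of the working PSUs, rounded down.
        return (working_psus // 2) * psu_watts
    if mode == "PSU_REDUNDANT":
        # One working PSU is held back as the spare.
        return max(working_psus - 1, 0) * psu_watts
    raise ValueError(f"unknown mode: {mode}")


# One PSU failed, five still working.
for mode in ("AC_REDUNDANT", "PSU_REDUNDANT"):
    print(mode, allocatable_watts(mode, working_psus=5))
```

With five working PSUs this gives 2 x PSU Size in N+N and 4 x PSU Size in N+1, so the allocation ceiling doubles for as long as the enclosure stays in the less protective mode.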

 

 

Just remember these rules:

  • Power Usage is NOT the same as Power Allocation.
  • Usage is the real work and draw from the wall.
  • Allocation is the theoretical maximum if all devices in the c7000 went to 100% load at the same time.
  • Power DOWN of servers only happens when USAGE exceeds available power.
    • Very rare, and causes a brown-out condition across the entire c7000.
  • Power UP of servers is based on available ALLOCATION.
    • “Not enough power” messages are becoming more and more common, since each BL460c can have an allocation of 500W depending on configuration.
    • 2650W PSUs are designed to give extra room for 16 x 500W configs.
      • Check the FAN rules for your OA version. You might be able to pull 1 or 2 fans to get enough room for blade 16 and then put the fans back, as long as the 8-fan rule supports 16 blades.
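
The difference between the two checks in these rules can be sketched as follows. Everything here is illustrative: the `Blade` class, the function names, and the wattage figures (500W allocation, 250W measured usage per blade, a 5300W ceiling) are assumptions chosen for the example, and fans and interconnects are ignored.

```python
from dataclasses import dataclass


@dataclass
class Blade:
    bay: int
    allocation_watts: int  # worst-case figure reserved for the blade
    usage_watts: int       # what the blade is actually drawing
    powered_on: bool


def may_power_on(blades, candidate, capacity_watts):
    """Power UP check: compares summed ALLOCATION against capacity."""
    allocated = sum(b.allocation_watts for b in blades if b.powered_on)
    return allocated + candidate.allocation_watts <= capacity_watts


def usage_exceeds_power(blades, available_watts):
    """Power DOWN condition: real USAGE exceeds what can be supplied."""
    used = sum(b.usage_watts for b in blades if b.powered_on)
    return used > available_watts


# Ten blades running, each allocated 500W but drawing only 250W.
running = [Blade(bay, 500, 250, True) for bay in range(1, 11)]
blade_11 = Blade(11, 500, 0, False)
capacity = 2 * 2650  # 2 x PSU Size, as in the degraded N+N case

print(may_power_on(running, blade_11, capacity))  # allocation check
print(usage_exceeds_power(running, capacity))     # usage check
```

In this example the eleventh blade is refused because allocation would reach 5500W against a 5300W ceiling, while actual usage is only 2500W, so nothing that is already running is at risk.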

 

 

 

PS: This process of Power Allocation, PSU redundancy, and what happens when a PSU is lost has been discussed on these PDLs MANY times. Is there a better way for us to document this information so that HP staff and customers would understand it?

Personally I want a cartoon on YouTube with a voice-over by Monty from the Blade Engineering team. I can hear his frustration in my head now.

 

****************

 

Comments?