
Critical Error Redundancy Lost

 
rgol
New Member

Hello,

We just started to get a critical error complaining about redundancy lost in the power subsystem on a c7000 chassis. As far as we can tell all the power supplies are working fine.

I've attached a screenshot of what we can see in the OA.

Any ideas on what might be causing this problem?

Thanks!
JKytsi
Honored Contributor

Re: Critical Error Redundancy Lost

What version is the OA firmware?
You can find me from Twitter @JKytsi
rgol
New Member

Re: Critical Error Redundancy Lost

The OA firmware is 2.30. We have 2 other c7000 chassis running the same firmware without any problems.
Raghuarch
Honored Contributor

Re: Critical Error Redundancy Lost

Can you execute the show power command from the OA CLI and check the Power Allocation details?
Can you also post the syslog of the Active OA?
rgol
New Member

Re: Critical Error Redundancy Lost

Thanks for your responses, guys. We finally contacted HP support and they said that the power is fine and the problem is with the Onboard Administrator (OA) firmware. They recommended downgrading to 2.25.
Raghuarch
Honored Contributor

Re: Critical Error Redundancy Lost

rgol
New Member

Re: Critical Error Redundancy Lost

They said that 2.31 and 2.32 have the same problem.

Just because they recommend downgrading doesn't mean we will :-)

We will stay with 2.30 and live with this non-error.

Thanks again for your help.

morrisosu
Advisor

Re: Critical Error Redundancy Lost

We too are having the same issue. Just out of the blue we received this error, even though all appears to be functioning properly with the PSUs. We are on 2.32 as well.
weghn0
New Member

Re: Critical Error Redundancy Lost

We also have the same problem (OA firmware 2.32). I now have an additional problem: I can't boot a blade server. The error is "Power Allocation Request Insufficient enclosure power".
Patrick G.
Advisor

Re: Critical Error Redundancy Lost

I have the same error with OA firmware 2.41. The attached picture shows the allocated power. Only the BL680c in Bay 6 is powered on.

OA 2.41
ILO2 1.70
ROM I17 09/23/2008

Any ideas?
Patrick G.
Advisor

Re: Critical Error Redundancy Lost

It looks like Bay 6 is the key. If the Bay 6 blade is the last one I insert into the enclosure, after all other blades have powered on without any failure, it powers on successfully. The OA only reports messages like "Redundancy lost" and "Power allocated: 44868 Watt DC".

Maybe this helps.
Patrick G.
Advisor

Re: Critical Error Redundancy Lost

And bay 7 is responsible for that problem too.
weghn0
New Member

Re: Critical Error Redundancy Lost

We solved the "Power Allocation Request Insufficient enclosure power" error by rebooting the bays where the power allocation was too high. We had more than one bay at about 3000 watts; after a reboot they were under 200 watts.
Hope this helps.
Patrick G.
Advisor

Re: Critical Error Redundancy Lost

Unfortunately not.
darnoc
New Member

Re: Critical Error Redundancy Lost

Hi All,

HP is right: downgrading the OA firmware to 2.25 resolves the Critical Error Redundancy Lost issue.

The error also reduces the power available to the servers. Once you shut down a server, it may not power back on because of insufficient power, and you will get a blinking red LED instead.

We've applied this solution just now as we experienced the problem this morning.


The Brit
Honored Contributor

Re: Critical Error Redundancy Lost

Is this issue being actively worked on at HP? And how long (how many versions) does it take to get a resolution?

I am currently at version 2.32, and I have not seen the problem. I was going to upgrade, but I think I will wait, since the problem has been seen to still exist with version 2.41. I don't want to risk activating a problem that currently seems to be dormant at my place.

Dave.
saks5th
Occasional Advisor

Re: Critical Error Redundancy Lost

Hi Everyone,

I just started having the same issue in one of our c-Class enclosures.

OA Firmware: 2.32
iLO2 Firmware: 1.70
ROM: I15 11/02/2008

We are running mostly BL460c. As previously noted, it does not look like any of the power supplies is the root cause of the issue... I would really hate reverting back to 2.25 unless I REALLY have to. Has anyone heard anything from HP?

Thanks.
Jesper
John Moorhead_2
Advisor

Re: Critical Error Redundancy Lost

Hi folks. We have a C7000 enclosure loaded with 2 VC 10Gb OA blades in the rear, 2 BL680C-G5 in front Bays 1 and 2, 4 BL460C in Bays 3, 4, 11, and 12, and BL220C-G5 in Bays 5, 6, 7, 8, 13, 14, 15, and 16. Prior to setting up server instances, we performed a full firmware update; both OA blades are at version 2.32, and all server instances are at 1.70. All server blade instances except the BL460C in Bay 12 had been installed and operational, running RedHat 4 Linux Nahant 7, kernel rev 2.6.9-78.0.13.

For the last few days, we've had degraded status on the BL680C-G5 in Bay 2, showing the same power issues discussed. However, that server has been operational the entire time and performance has not been affected as far as we can tell. We are running continuous ASIC simulation jobs on it with no issues.

Our original Bay 12 blade was bad; we just received the replacement. Upon insertion of the replacement, we got an error indicating that insufficient power is available, and it refuses to power up at all. We are currently in "Not Redundant" power supply mode, but were previously at "Power Supply Redundant" when the issue first showed up. Our "Power Limit" is 60024 watts, with only 3886 watts "present power". The "enclosure total" shows 13,500 watts.
saks5th
Occasional Advisor

Re: Critical Error Redundancy Lost

Hi again,

Just as an FYI, I just opened up an HP case regarding this issue. I'll let you know if/when I hear anything.

Thanks.
Jesper
John Moorhead_2
Advisor

Re: Critical Error Redundancy Lost

Update after placing call to HP Response Center:

1) ALL blades MUST be at latest firmware revision. Under Device Bays, click on + next to each numbered Bay, then under Information tab, see ROM Version. Compare this with latest firmware versions as seen on HP's web site.

2) Click on the image of each blade, or under Device Bays, click on each blade instance in turn. Compare figures for "Power Allocated". Look for one blade that shows significantly higher watts (for instance, I had one blade showing 43,850 watts as compared to between 250 to 500 watts for each of the others).

3) Remove that blade from the cabinet. Remove cover, and remove thin circular cache battery. Replace after one minute (make sure polarity stays the same!). Re-insert the blade.

What's going on is that power consumption values got corrupted during a prior firmware upgrade; cache needs to be cleared after the firmware upgrade to present default values at power-up.
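
Step 2 above (spotting the one blade whose "Power Allocated" figure is far out of line) can be sketched in a few lines of Python if you have already copied the per-bay figures out of the OA by hand. This is only an illustration: the function name, the bay numbers, and the "10 times the median" threshold are assumptions, not anything provided by the OA or recommended by HP. The wattages echo the figures mentioned in this post.

```python
from statistics import median


def find_power_outliers(allocated_watts, factor=10.0):
    """Return the bays whose allocation exceeds `factor` times the median.

    `allocated_watts` maps bay number -> "Power Allocated" in watts,
    as read manually from the OA. The threshold is an arbitrary assumption.
    """
    if not allocated_watts:
        return []
    typical = median(allocated_watts.values())
    return sorted(
        bay for bay, watts in allocated_watts.items() if watts > factor * typical
    )


# Illustrative values only: bay numbers are made up, wattages echo the post
# (one blade at 43,850 W, the others between 250 and 500 W).
sample = {1: 480, 2: 43850, 3: 250, 4: 310, 5: 500}
print(find_power_outliers(sample))  # prints [2]
```

Using the median rather than the mean keeps a single corrupted value from dragging the "typical" figure upward and hiding itself.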
Patrick G.
Advisor

Re: Critical Error Redundancy Lost

"...Remove that blade from the cabinet. Remove cover, and remove thin circular cache battery. Replace after one minute (make sure polarity stays the same!). Re-insert the blade. ..."

It doesn't help.
Joshua Oswald
New Member

Re: Critical Error Redundancy Lost

We have been ignoring these error messages for a couple months as well, until yesterday when it prevented a blade from powering on.

Per the instructions provided, we found one blade requesting 45kW. We removed the CMOS battery for a couple of minutes and that "solved" the problem... for now.
Joshua Oswald
New Member

Re: Critical Error Redundancy Lost

...and after a reboot, the blade is again asking for 44kW.
John Moorhead_2
Advisor

Re: Critical Error Redundancy Lost

Follow-up to my earlier post:

The procedure I outlined above did solve the problem for me, for 1.5 weeks. Then I started getting "Degraded status" again on the same blade.

Interestingly enough, this time the "Power Allocated" figure for this blade is right where it should be (541 watts), and none of the blades in the cabinet show abnormal readings, so the root cause is different this time. Note that between my first post and this one, I have NOT performed any firmware updates.

Pulling the blade out and re-seating it cleared the status for this second event. The IML log did not contain any info on this event. I also looked at the Enclosure Information/Enclosure Settings/Configuration Scripts/Current Inventory script, which contains a lot of good info (I highly recommend that you run this script and paste it into a file you can refer to later) including the results of "SHOW SYSLOG OA 1". That log indicates events a few days ago that I had missed:

Apr 5 23:17:42 OA: Blade 2 is reporting degraded health status.
Apr 5 23:17:42 OA: Blade in bay #2 status changed from OK to Degraded

Note the times; they are within the same second! Something about this blade is not happy.
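
If you save the inventory script output to a file as suggested, status changes like the ones above are easy to miss by eye. Below is a minimal Python sketch for pulling them out. It assumes the syslog lines look exactly like the two quoted here; the regular expression and the function name are illustrative guesses based on those two lines only, not a documented OA log format.

```python
import re

# Pattern inferred from the two log lines quoted in this post (an assumption).
STATUS_CHANGE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}) OA: "
    r"Blade in bay #(?P<bay>\d+) status changed from (?P<old>\w+) to (?P<new>\w+)"
)


def status_changes(log_text):
    """Return (timestamp, bay, old_status, new_status) for each status change."""
    events = []
    for line in log_text.splitlines():
        match = STATUS_CHANGE.match(line.strip())
        if match:
            events.append(
                (match["stamp"], int(match["bay"]), match["old"], match["new"])
            )
    return events


sample_log = """Apr 5 23:17:42 OA: Blade 2 is reporting degraded health status.
Apr 5 23:17:42 OA: Blade in bay #2 status changed from OK to Degraded"""

print(status_changes(sample_log))
# prints [('Apr 5 23:17:42', 2, 'OK', 'Degraded')]
```

Only the "status changed" lines are captured; the "is reporting degraded health status" line is skipped on purpose, since it carries no old or new state.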
Joshua Oswald
New Member

Re: Critical Error Redundancy Lost

This is anecdotal at this point, so take it for what it's worth.

We upgraded the firmware on the blade (BL680c) that was exhibiting the power issues to a version (2009.2.24) that was released just a few days ago and we did not experience the power errors. Immediately after we reverted back to the previous firmware (for other reasons), the power errors started again. I didn't notice anything in the release notes that indicated this specific bug was resolved.