Re: Critical Error Redundancy Lost

The Brit · ‎04-17-2009

Slight modification to my post above: I noticed that the 2.24 version referred to in earlier posts was for BL680c blades. That version was released on April 2nd this year.

There was a ROM flash released for BL465c G5 blades dated April 3rd (version 2009.03.12), which I hope is the equivalent. The "Fixes" section for this release states:

"Resolved an issue where the Low Power Halt State (AMD C1 Clock Ramping) option in the ROM Based Setup Utility (RBSU) does not properly disable this power management feature for AMD Opteron 2300-series processors. Previous revisions of the System ROM would not disable this functionality even when this option was configured for Disabled. This fix is not required for systems configured with older versions of AMD processors or if the Low Power Halt State was enabled (which is the default state)."

"Resolved an extremely intermittent issue that could cause the system to encounter a system boot hang with a red screen."

Maybe one of these is the fix we are looking for, although neither seems to reference it explicitly.

Dave.

John Moorhead_2 · ‎04-17-2009

My observations addressing Brit's questions:

1) In my case, no. The whacked-out "Power Allocated" figure on the one BL680c-G5 that caused our problem did not occur at power-up; it occurred several days later. The first indication was an amber alert on the blade with an associated "Degraded Status". Once this issue occurred, and for as long as it existed, any blades that had been powered down prior to the event could not be powered up (because at that point the cabinet thinks there is insufficient power). Any blades that were running fine when the issue occurred and were later powered off would not power back on, for the same reason. As long as all blades in the cabinet had been running prior to the event, there has been no observed performance degradation or loss of server capabilities, provided they stay powered up. But you're playing with fire here: you are running in a crippled mode and run the risk that a critical production server could go down for some other reason, and then you would not be able to bring it back up without powering off the root-cause blade as well.

2) In my experience, it did not actually affect operational performance of a running server at all, as long as that server stayed powered up. As in 1 above though, once the event occurs, you cannot power down another blade in the same cabinet and then power it back up without shutting down/upgrading firmware/resetting cache on the root-cause blade first.

3) My blades are all x86-based, so I have no experience with Itanium.

4) I upgraded my blades to 2.25 and did not see a recurrence of exactly the same problem. However, with 2.25 I DID have a degraded status event on the BL680c-G5 1.5 weeks after the upgrade, but without the associated "Power Allocated" issue. For this reason I consider this to be a different event with a different root cause, as yet undetermined. Note that with this issue on 2.25, I was able to power down a blade and power it back up again without any problems.

I've had very long discussions with the HP Support folks and I know that this issue has been getting a lot of scrutiny both at the Response Center and in the Labs.
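For readers trying to picture why one bad "Power Allocated" figure blocks every later power-on while leaving running blades untouched, here is a minimal sketch of a power-budget admission check. It is only an illustration of the behaviour described in points 1 and 2, not HP's Onboard Administrator logic: the `Blade` class, the `can_power_on` function, and all wattage figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Blade:
    name: str
    allocated_watts: int   # power the enclosure believes it has reserved (hypothetical figures)
    powered_on: bool = True


def can_power_on(blades, capacity_watts, request_watts):
    """Allow a new power-on only if the reserved total leaves enough headroom.

    Assumption: the check runs only when a blade asks to power on,
    so blades that are already running are never re-evaluated.
    """
    reserved = sum(b.allocated_watts for b in blades if b.powered_on)
    return reserved + request_watts <= capacity_watts


# Hypothetical enclosure: two healthy blades and one reporting a bogus allocation.
CAPACITY_WATTS = 5000
blades = [Blade("bay1", 400), Blade("bay2", 400), Blade("bay3", 9000)]

# The bogus figure alone exceeds the budget, so any new power-on is refused.
print(can_power_on(blades, CAPACITY_WATTS, 400))

# Once the root-cause blade reports a sane figure again, power-on is allowed.
blades[2].allocated_watts = 450
print(can_power_on(blades, CAPACITY_WATTS, 400))
```

In this model the running blades keep running because nothing re-checks them, which matches the observation that servers were unaffected as long as they stayed powered up.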

Nikola Mrdja · ‎06-10-2009

There are newer OA fw versions (they appeared at the end of May 2009), which should resolve those "status: degraded" problems. The latest version is 2.51 (May 28, 2009), and I'm about to install it in my blade enclosure soon.

Has anyone tried this fw version already?

Patrick G. · ‎06-10-2009

Regarding my comment from Apr 16, 2009 08:40:05 GMT:

"ROM Version "I17 02/24/2009" for "ProLiant BL680c G5" solved the problem in our case"

That ROM did solve the power allocation issue, but it introduced a new failure: both ports of the HBA get the same WWPN. HP has since removed this ROM.
The power allocation issue occurs on only two BL680c blades, regardless of which enclosure they are plugged into.

The new ROM version "I17 05/10/2009" solved the power allocation issue in my case and gives the HBA ports different WWPNs.
