Re: Critical Error Redundancy Lost
04-10-2009 07:05 AM
Re: Critical Error Redundancy Lost
04-10-2009 07:13 AM
Re: Critical Error Redundancy Lost
I just wanted to give an update on this issue. I opened a case with HP, and the response was basically that you need to install the latest firmware on your server, iLO2 and OA. In our case we already had the latest installed (BL460c G1; ROM at 11/02/2008, iLO2 at 1.70) except for the OA, which was at 2.32. After updating to the latest OA version (2.41) we were still seeing the power allocation problems, but after rebooting each of the individual blade servers the error went away and we have not seen it recur. Hope this helps...
Jesper
04-16-2009 12:40 AM
Re: Critical Error Redundancy Lost
04-16-2009 03:19 AM
Re: Critical Error Redundancy Lost
Unfortunately, for us, that has brought a new host of problems.
Apparently, this ROM enumerates physical devices to the OS differently, which means our HP Teaming definitions are no longer valid. I suspect (although I haven't tested it) that removing and reinstalling the NCU would resolve this, except that our blades are running Server Core, which has no mechanism for uninstalling the NCU. The only option is to rebuild the server.
So, since these blades are not yet in production, we bit the bullet, upgraded the ROM and reinstalled the OS. Then we discovered that this ROM also breaks dynamic WWN assignment for HBAs: both HBA ports get assigned the same WWN. If we revert to the previous ROM, the WWNs are assigned correctly, but we (again) have invalid Teams and we're back to the original power redundancy error.
I have a case open with HP regarding the WWN assignment problem. The last suggestion was to revert our VC firmware version and recreate the domain.
04-17-2009 05:06 AM
Re: Critical Error Redundancy Lost
Just to clear up a few things, and maybe focus the discussion: the real question is whether the OA firmware is reading the wrong information from the power supplies, or whether the interaction between the blade and the power supply at power-up is causing the power supply to report incorrectly to the OA.
1. Does the initial occurrence of the problem ALWAYS happen when a blade is being powered up?
2. Has the problem ever affected a RUNNING server, i.e. caused a crash or shutdown?
3. Have there been any cases where this issue was FIRST SEEN when powering up an Itanium blade?
4. It is being implied that ProLiant ROM F/W 2.24 fixes the problem, so has any ProLiant running 2.24 exhibited the problem?
Also, although the emphasis seems to be moving away from the OA F/W, and notwithstanding that HP is still recommending a downgrade to 2.25 for those experiencing the problem, I am surprised to see recommendations in the most recent "Alerts" e-mail to upgrade to OA version >2.25 for what appear to be fairly trivial issues that can be resolved manually via the OA CLI.
Since the cause of the PS issue is not fully resolved, it seems a little reckless to be making this recommendation and risk invoking a much more serious problem (in my opinion).
It makes me wonder whether anyone at HP is paying attention to the problems this PS issue is causing. I am running OA 2.32 and have not experienced the problem at my site. However, I have to admit to a certain level of paranoia, which makes me fearful of performing many simple functions.
1. I am concerned about power cycling any of my servers since that may initiate the problem in my enclosures.
2. I am holding off with my intended upgrade of the OA firmware (to 2.41) because the problem has also been reported with this FW level, and again, I don't want to risk initiating the problem at my site.
Some official (HP) statement or comment, or reassurance would be appropriate at this point.
Dave.
04-17-2009 06:47 AM
Re: Critical Error Redundancy Lost
There was a ROM flash released for BL465c G5 blades dated April 3rd (version 2009.03.12), which I hope is the equivalent. The "Fixes" section for this release states:
"Resolved an issue where the Low Power Halt State (AMD C1 Clock Ramping) option in the ROM Based Setup Utility (RBSU) does not properly disable this power management feature for AMD Opteron 2300-series processors. Previous revisions of the System ROM would not disable this functionality even when this option was configured for Disabled. This fix is not required for systems configured with older versions of AMD processors or if the Low Power Halt State was enabled (which is the default state)."
"Resolved an extremely intermittent issue that could cause the system to encounter a system boot hang with a red screen."
Maybe one of these is the fix we are looking for although neither seems to reference it explicitly.
Dave.
04-17-2009 08:52 AM
Re: Critical Error Redundancy Lost
1) In my case, no. The whacked-out "Power Allocated" figure on the one BL680c-G5 that caused our problem did not appear at power-up; it occurred several days later. The first indication was an amber alert on the blade with an associated "Degraded Status". Once this issue occurred, and for as long as it persisted, any blades that had been powered down before the event could not be powered up (because the enclosure then believes there is insufficient power). Any blades that were running fine when the issue occurred, and were later powered off, would not power back on for the same reason. As long as all blades in the enclosure had been running before the event, we observed no performance degradation or loss of server capabilities, provided they stayed powered up. But you're playing with fire here: you are running in a crippled mode, and if a critical production server goes down for some other reason, you will not be able to bring it back up without also powering off the root-cause blade.
2) In my experience, it did not affect the operational performance of a running server at all, as long as that server stayed powered up. As in 1 above, though, once the event occurs you cannot power down another blade in the same enclosure and power it back up without first shutting down, upgrading the firmware on, or resetting the cache on the root-cause blade.
3) My blades are all x86-based, so I have no experience with Itanium.
4) I had upgraded my blades to 2.25 and did not see a recurrence of exactly the same problem. However, with 2.25 I DID have a degraded status event on the BL680c-G5 1.5 weeks after the upgrade, but without the associated "Power Allocated" issue. For this reason I consider it a different event with a different, as-yet undetermined root cause. Note that with this issue on 2.25, I was able to power a blade down and back up again without any problems.
I've had very long discussions with the HP Support folks and I know that this issue has been getting a lot of scrutiny both at the Response Center and in the Labs.
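The behavior in points 1 and 2 (running blades unaffected, but no blade can be powered back on) is what you would expect if the enclosure only checks its power budget when a blade asks to power on. The sketch below is a simplified, hypothetical model of that admission check, not HP's actual OA logic; the `Blade` class, `can_power_on` function and all wattage figures are made up for illustration, and the real OA also factors in redundancy mode and power supply configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Blade:
    bay: int
    allocated_watts: int  # hypothetical: watts the OA believes this blade has reserved
    powered_on: bool


def can_power_on(blades: List[Blade], requested_watts: int, budget_watts: int) -> bool:
    """Hypothetical power-on admission check.

    Sums the allocations of blades that are currently on and allows a new
    power-on only if the request still fits within the enclosure budget.
    Blades that are already running are never re-checked, so a bogus
    allocation on one blade blocks power-ons without affecting running servers.
    """
    in_use = sum(b.allocated_watts for b in blades if b.powered_on)
    return in_use + requested_watts <= budget_watts


healthy = [Blade(1, 400, True), Blade(2, 400, True)]
corrupted = [Blade(1, 400, True), Blade(2, 1500, True)]  # bay 2 reports a bogus figure
root_cause_off = [Blade(1, 400, True), Blade(2, 1500, False)]

print(can_power_on(healthy, 400, 2000))         # True  (800 + 400 <= 2000)
print(can_power_on(corrupted, 400, 2000))       # False (1900 + 400 > 2000)
print(can_power_on(root_cause_off, 400, 2000))  # True  (400 + 400 <= 2000)
```

In this model, powering off (or resetting) the root-cause blade releases its inflated allocation, which matches the observation that other blades can only be brought back up once that blade is dealt with.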
06-10-2009 02:51 AM
Re: Critical Error Redundancy Lost
Has anyone tried this firmware version yet?
06-10-2009 05:09 AM
Re: Critical Error Redundancy Lost
"ROM Version "I17 02/24/2009" for "ProLiant BL680c G5" solved the problem in our case"
This ROM solved the power allocation issue but introduced a new failure: both ports of the HBA get the same WWPN. HP has since removed this ROM.
The power allocation issue occurs on only two of our BL680c blades. It doesn't matter which enclosure they are plugged into.
In my case, the new ROM version "I17 05/10/2009" solved the power allocation issue and assigns different WWPNs to the HBA ports.