03-30-2009 12:46 AM
Re: Critical Error Redundancy Lost
It doesn't help.
04-08-2009 04:13 AM
Re: Critical Error Redundancy Lost
Per the instructions provided, we found one blade requesting 45kW. We removed the CMOS battery for a couple of minutes and that "solved" the problem... for now.
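A check like the one described above can be sketched in a few lines of Python. This is only an illustration: the bay numbers, the 541 W readings, and the 2000 W plausibility limit are assumptions, and the figures would have to be copied by hand from the OA's power allocation display.

```python
def find_power_outliers(allocated_watts, max_plausible_watts=2000):
    """Return the bays whose requested power exceeds a plausibility limit.

    allocated_watts maps bay number -> requested watts.
    The 2000 W default is an illustrative assumption, not an HP figure.
    """
    return {bay: watts for bay, watts in allocated_watts.items()
            if watts > max_plausible_watts}


# Hypothetical readings; only the 45 kW value mirrors the post above.
readings = {1: 541, 2: 45000, 3: 541}
print(find_power_outliers(readings))  # prints {2: 45000}
```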
04-08-2009 05:54 AM
Re: Critical Error Redundancy Lost
04-09-2009 03:33 PM
Re: Critical Error Redundancy Lost
The procedure I outlined above did solve the problem for me, for 1.5 weeks. Then I started getting "Degraded status" again on the same blade.
Interestingly, this time the "Power Allocated" figure for this blade is right where it should be (541 watts), so the root cause is different this time, and none of the blades in the cabinet show abnormal readings. Note that I have NOT performed any firmware updates between my first post and this one.
Pulling the blade out and re-seating it cleared the status for this second event. The IML log did not contain any info on this event. I also looked at the Enclosure Information/Enclosure Settings/Configuration Scripts/Current Inventory script, which contains a lot of good info (I highly recommend that you run this script and paste it into a file you can refer to later) including the results of "SHOW SYSLOG OA 1". That log indicates events a few days ago that I had missed:
Apr 5 23:17:42 OA: Blade 2 is reporting degraded health status.
Apr 5 23:17:42 OA: Blade in bay #2 status changed from OK to Degraded
Note the times; they are within the same second! Something about this blade is not happy.
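If you save the syslog output to a file, a short Python sketch like the following can pull out the status-change events so they are harder to miss. The regular expression is an assumption based only on the two lines quoted above; other OA firmware versions may word these messages differently.

```python
import re

# Pattern modelled on the "status changed" line quoted above (an assumption,
# not a documented log format).
STATUS_CHANGE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}) OA: "
    r"Blade in bay #(?P<bay>\d+) status changed from "
    r"(?P<old>\w+) to (?P<new>\w+)$"
)


def status_changes(lines):
    """Return (timestamp, bay, old_status, new_status) for each matching line."""
    events = []
    for line in lines:
        match = STATUS_CHANGE.match(line.strip())
        if match:
            events.append((match["stamp"], int(match["bay"]),
                           match["old"], match["new"]))
    return events


sample = [
    "Apr 5 23:17:42 OA: Blade 2 is reporting degraded health status.",
    "Apr 5 23:17:42 OA: Blade in bay #2 status changed from OK to Degraded",
]
for event in status_changes(sample):
    print(event)
```

Only the second sample line matches, because the first one reports health rather than a status transition.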
04-10-2009 06:35 AM
Re: Critical Error Redundancy Lost
We upgraded the firmware on the blade (BL680c) that was exhibiting the power issues to a version (2009.2.24) released just a few days ago, and the power errors stopped. Immediately after we reverted to the previous firmware (for other reasons), the power errors started again. I didn't notice anything in the release notes indicating that this specific bug was resolved.
04-10-2009 07:05 AM
Re: Critical Error Redundancy Lost
04-10-2009 07:13 AM
Re: Critical Error Redundancy Lost
I just wanted to give an update on this issue. I opened a case with HP, and the response was basically that you need to install the latest and greatest firmware on your server, iLO2 and OA. In our case we already had the latest installed (BL460c G1; ROM at 11/02/2008, iLO2 at 1.70) with the exception of the OA, which was at 2.32. After updating to the latest OA version (2.41) we were still experiencing the power allocation problems, but after rebooting all of the individual blade servers the error went away and we have not seen it recur. Hope this helps...
Jesper
04-16-2009 12:40 AM
Re: Critical Error Redundancy Lost
04-16-2009 03:19 AM
Re: Critical Error Redundancy Lost
Unfortunately, for us, that has brought a new host of problems.
Apparently, this ROM enumerates physical devices to the OS differently, which means our HP Teaming definitions are no longer valid. I suspect (although I haven't tested it) that removing and reinstalling the NCU would resolve this, except that our blades are running Server Core, which doesn't have a mechanism for uninstalling the NCU. The only option is to rebuild the server.
So, since these blades are not yet in production, we bit the bullet, upgraded the ROM and re-installed the OS. Then we discovered that this ROM also breaks dynamic WWN assignment for HBAs: both HBA ports get assigned the same WWN. If we revert to the previous ROM, the WWNs get assigned correctly, but we (again) have invalid Teams and we're back to the (original) power redundancy error.
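For anyone wanting to confirm the duplicate-WWN symptom across several blades, a minimal Python sketch follows. The port names and WWN values are placeholders, not taken from real hardware; the actual values would have to be collected from whatever tool reports them at your site.

```python
from collections import defaultdict


def duplicate_wwns(port_wwns):
    """Map each WWN assigned to more than one port to the ports sharing it.

    port_wwns maps a port label -> WWN string; comparison ignores case.
    """
    ports_by_wwn = defaultdict(list)
    for port, wwn in port_wwns.items():
        ports_by_wwn[wwn.lower()].append(port)
    return {wwn: ports for wwn, ports in ports_by_wwn.items()
            if len(ports) > 1}


# Placeholder labels and WWNs for illustration only.
ports = {
    "hba-port-1": "50:00:00:00:00:00:00:01",
    "hba-port-2": "50:00:00:00:00:00:00:01",
}
print(duplicate_wwns(ports))
```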
I have a case open with HP regarding the WWN assignment problem. The last suggestion was to revert our VC firmware version and recreate the domain.
04-17-2009 05:06 AM
Re: Critical Error Redundancy Lost
Just to clear up a few things, and maybe focus the discussion: the real question is whether the OA firmware is reading the wrong information from the power supplies, or whether the interaction between the blade and the power supply at power-up is causing the PS to report incorrectly to the OA.
1. Does the initial occurrence of the problem ALWAYS happen when a blade is being powered up?
2. Has the problem ever affected a RUNNING server, i.e. caused a crash or shutdown?
3. Have there been any cases where this issue is FIRST SEEN when powering up an Itanium blade?
4. It is being implied that ProLiant ROM F/W 2.24 fixes the problem, so has any ProLiant running 2.24 exhibited the problem?
Also, although the emphasis seems to be moving away from OA F/W, and notwithstanding that HP is still recommending a downgrade to 2.25 for those experiencing the problem, I am surprised to see, in the most recent "Alerts" e-mail, recommendations to upgrade to an OA version later than 2.25 for what appear to be fairly trivial issues that can be resolved manually via the OA CLI.
Since the cause of the PS issue is not fully resolved, it seems a little reckless to be making this recommendation and risk invoking a much more serious problem (in my opinion).
It makes me wonder whether anyone at HP is paying attention to the problems this PS issue is causing. I am running OA 2.32 and have not experienced the problem at my site; however, I have to admit to a certain level of paranoia that makes me fearful of performing many simple functions.
1. I am concerned about power cycling any of my servers since that may initiate the problem in my enclosures.
2. I am holding off with my intended upgrade of the OA firmware (to 2.41) because the problem has also been reported with this FW level, and again, I don't want to risk initiating the problem at my site.
Some official (HP) statement or comment, or reassurance would be appropriate at this point.
Dave.