
Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

 
broth-itk
Member

WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

Hello all,

 

Today we experienced several severe server crashes while updating our servers to iLO 3.06 with ilorest.

Only DL380 Gen10+ Servers running VMware 7.0U3 (23307199) were affected.

The problem causes iLO to crash, taking the server down as well.

After logging in to iLO, we find the server powered down, with status and health shown as RED, but with no log information in the iLO Event Log or the Integrated Management Log.

 

This is IMHO a severe bug which needs to be investigated by HPE ASAP!

 

The update should be pulled.

 

Best regards,

Bernhard

 

 

14 REPLIES 14
support_s
System Recommended

Query: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

System recommended content:

1. HPE iLO 5 v3.06 Release Notes | Prerequisites

2. HPE iLO 5 3.06 User Guide | Downloading the Active Health System log (iLOREST)

 


DKC2
Regular Visitor

Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

This happened to me as well. The servers were ProLiant DL380 Gen10 Plus with HPE NS204i-p Gen10+ Boot Controllers, and they had previously been updated to iLO firmware version 3.05 before HPE removed it. The crash/reboot did not happen on the same server model running Windows 2022, but that server did not have the HPE NS204i-p Gen10+ Boot Controller. I powered down two of my other ESXi servers and applied the update without issue.

The fixes for 3.06 include "Fixed a potential random server restart, or Uncorrectable Machine Check Exception (UMCE), when an iLO reset is triggered", so I wonder whether just performing an iLO restart will crash the server. There should be a warning on the download page, because this crash has the potential to create serious problems in production environments running production lines. Here is my event log showing the firmware update and the crash:

 

Error Messages.png

Casper_N
Senior Member

Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

Same problem here on DL360 Gen10+,
going from 3.05 -> 3.06.

It took a whole cluster down with 3 hosts in it.
All 3 hosts have NS204i boot controllers.

After an OS reset the server seems fine.

So it could be an issue with either:
**REMOVED** The Release version of iLO 5 v3.05 has exposed a bug, when the host server is installed with NS204i boot controllers. A new iLO 5 version fixing the issue will be released in the next couple of days.

or 


 Fixed a potential random server restart, or Uncorrectable Machine Check Exception (UMCE), when an iLO reset is triggered.

 

broth-itk
Member

Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

Confirmed: my affected servers all have the NS204 boot controller installed as well.

Sham82
HPE Pro

Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

Hello @broth-itk

Thank you for your post.


Please see the document below for the related issue.
https://support.hpe.com/hpesc/public/docDisplay?docId=a00141858en_us&docLocale=en_US

 

Regards
HPE

 



I work at HPE
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
broth-itk
Member
Solution

Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

Warning: Ranty content!

@HPE:

Are you serious?

Slam, in your face: "Oh, look, there is a service bulletin!"

The least you can do is put a RED, noticeable warning on the download page.

The existing note 

**REMOVED** The Release version of iLO 5 v3.05 has exposed a bug, when the host server is installed with NS204i boot controllers. A new iLO 5 version fixing the issue will be released in the next couple of days.

is a joke.

It makes no mention of a server crash, nor is it clearly visible.

 

3.04 had a fan control bug; customers were begging for a fix, and we were relieved to upgrade to 3.05.

Now 3.05 fixed the fan but introduced a new issue. Wait, "exposed a bug".

 

Is this how you treat your customers?

What about QA?

Maybe it has something to do with the layoffs of the old greybeards from the HP era?

 

You know, problems happen, bugs happen... all human.

But the way you communicate things sounds very whitewashy.

Fool me once, shame on you; fool me twice, shame on me.

Mr_Techie
Trusted Contributor

Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

@broth-itk 

Good day!

It would be good, my friend, if you called support and reported this issue. Since others are also facing it, they might escalate it to the right team and get it fixed ASAP.

 

This is just my opinion. 

broth-itk
Member

Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

@Mr_Techie 

Well, you're welcome to call support and try to explain the issue to 1st level until, at some point (days or weeks later), you get to someone who might be competent enough to tell you: "This issue is known, can't do anything 'cause it's the other team which does the webpage/announcements" blah blah blah

I'd rather go home early and have a nice time with my family than join that BS.

This is just my opinion.

 

FrancWest
Frequent Advisor

Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

We were also hit hard by this bug. It took our entire vSphere cluster down. There isn't any mention of this on the download page. I'm also subscribed to HPE alerts and advisories, but this advisory was never sent out by email. I did receive a notification mail that 3.06 was out, but it didn't mention this bug either. Only line 3 of the fixes tab reads:

Fixed a potential random server restart, or Uncorrectable Machine Check Exception (UMCE), when an iLO reset is triggered.

Even first-line HPE support isn't aware of this. The case had to be escalated to L2 before we were given this advisory and told that it was a known bug.

Why on earth isn't this issue mentioned in big red fonts on the download page???

This cost me an entire Friday evening to get our vSphere cluster up and running again and to do all kinds of checks, since all VMs had an unexpected shutdown.

HPE really let us down with this major incompetence. Unbelievable!

ASS49324
Occasional Advisor

Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

We had the same issue today, but with iLO 3.07 firmware. It took all "ProLiant DL380 Gen10 Plus" servers down. The flash was done via the ilorest tool.

FrancWest
Frequent Advisor

Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

The issue is in 3.05, so any update from this version causes the server crash unless you apply the workaround mentioned in the advisory.
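The pattern reported in this thread (a host currently on iLO 5 v3.05 with an NS204i boot controller) can be turned into a simple pre-flight check before a bulk update. The following is a minimal Python sketch, not HPE tooling: the `Host` record, its field names and the sample inventory are hypothetical, and the rule only encodes what posters here reported. The advisory linked above remains the authoritative description of affected configurations and of the workaround.

```python
from dataclasses import dataclass


# Hypothetical inventory record; the field names are illustrative only.
@dataclass
class Host:
    name: str
    ilo_version: str      # currently installed iLO 5 firmware, e.g. "3.05"
    boot_controller: str  # model string as reported by your inventory tool


def is_at_risk(host: Host) -> bool:
    """True if the host matches the pattern reported in this thread:
    iLO 5 v3.05 installed and an NS204i boot controller present."""
    on_305 = host.ilo_version.strip() == "3.05"
    has_ns204 = "NS204" in host.boot_controller.upper()
    return on_305 and has_ns204


# Made-up sample data, not taken from any real environment.
inventory = [
    Host("esx01", "3.05", "HPE NS204i-p Gen10+ Boot Controller"),
    Host("esx02", "3.05", "HPE Smart Array E208i-a SR Gen10"),
    Host("esx03", "3.04", "HPE NS204i-p Gen10+ Boot Controller"),
]

at_risk = [h.name for h in inventory if is_at_risk(h)]
print(at_risk)  # prints ['esx01']
```

Hosts flagged this way would be candidates for the advisory's workaround or for a planned shutdown before flashing, rather than an online bulk update.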

Casper_N
Senior Member

Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

From my experience, it's only servers with the NS204 boot controller you need to be aware of. 

We have many DL360 Gen10 servers with E208i-a SR Gen10 controllers, on which I upgraded the firmware from 3.05 to 3.06 without a crash.

If you have iLO Amplifier Pack set up, you can create a custom report to get an overview of the servers with that NS204 controller.

This bug has definitely made me reconsider how many servers we upgrade in one bulk iLO update.
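One way to limit the blast radius of a bulk iLO update is to plan it in waves that never touch more than one host per cluster at a time, so a crash like the one described here cannot take a whole cluster down. Below is a minimal Python sketch of that idea. The `plan_waves` helper and the host and cluster names are hypothetical, and it only plans the order; it does not talk to iLO or vCenter.

```python
from collections import defaultdict
from itertools import zip_longest


def plan_waves(hosts):
    """Group (host, cluster) pairs into update waves.

    Each wave contains at most one host from any given cluster, so the
    remaining cluster members stay up if an updated host crashes.
    """
    by_cluster = defaultdict(list)
    for name, cluster in hosts:
        by_cluster[cluster].append(name)

    waves = []
    # zip_longest takes the 1st host of every cluster, then the 2nd, and so on,
    # padding shorter clusters with None, which is filtered out below.
    for column in zip_longest(*by_cluster.values()):
        waves.append([h for h in column if h is not None])
    return waves


# Made-up example: one 3-host cluster and one 2-host cluster.
hosts = [
    ("esx01", "cluster-a"),
    ("esx02", "cluster-a"),
    ("esx03", "cluster-a"),
    ("esx04", "cluster-b"),
    ("esx05", "cluster-b"),
]

waves = plan_waves(hosts)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

In practice you would check each host after its wave (server still powered on, iLO health OK) before starting the next one, and stop the rollout on the first failure.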

LLBS
Senior Member

Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

We have 3 ProLiant DL360 Gen10 servers at a customer's site; one of them was updated to iLO 3.07 via the firmware update in the iLO GUI.

No crash with the following boot controller:

HPE NS204i-p Gen10+ Boot Controller 1.2.14.1013

 

The only problem we have so far is that CheckMK OMD can no longer use the check_mk-hp_proliant_fans check correctly.

The HW FAN1-7 system checks are not working; we get a "check failed" result.

Re: WARNING: iLO 3.06 causes crash of DL380 Gen10 PLUS Server with VMware 7.0U3

We experienced the same problem on 9 out of 10 servers running ESXi 8. All are DL380 Gen11 with the HPE NS204i-u Gen11 Boot Controller.
The iLO firmware on all these machines was 1.60; we updated to 1.62.

On 2 DL385 Gen11 servers, also with the HPE NS204i-u Gen11 Boot Controller, there was no crash after updating the iLO from 1.60 to 1.62.

We have a lot more servers where the update is needed. It will be fun....