ProLiant Servers (ML,DL,SL)

Outage due to self-launched reboots by SPP? / Multiple reboots after SPP

 
Anonymous

Hey,

A few days ago we had an outage because our VMware hypervisors rebooted themselves after an SPP run had "finished".

We had applied the SPP shortly beforehand. After each server booted up again, the host exited VMware maintenance mode and we moved on to the next hosts. However, two Gen10 systems rebooted again on their own a few minutes later (unfortunately, some VMs were already running on them by then).

Has anyone observed this behavior before?
Or is it even intended behavior?

The HPE documentation points out that some components only reach the latest version after the SPP has been applied several times.
Is there perhaps a mechanism that detects this and then automatically retriggers the SPP process/reboot?

How can you be sure that the SPP has really completed?
(We could still see update messages in the iLO after the first reboot.)

Anonymous

Re: Outage due to self-launched reboots by SPP? / Multiple reboots after SPP

At least one person in the VMware community forum has observed this behavior too, and with other operating systems as well, whenever the SPP is installed as an ISO through the iLO:

https://communities.vmware.com/t5/vSphere-Hypervisor-Discussions/Outage-due-to-self-launched-reboots-by-HPE-SPP-Multiple-reboots/td-p/2994176

Suman_1978
HPE Pro

Re: Outage due to self-launched reboots by SPP? / Multiple reboots after SPP

Hi,

SPP is a bundle of drivers and firmware put together.
SPP installs components based on your selection and on whether a driver/firmware update is available for each component.
SPP will not retrigger automatically.
At the end of the SPP installation, it shows Success for each component under Deployment Status.
Refer to this video on YouTube: https://www.youtube.com/watch?v=twNEtYLwCIc

HPE also has several other videos on this.

If you still have questions on several reboots, please log a support ticket with HPE.

Thank You!
I work with HPE but opinions expressed here are mine.

RuneH
Occasional Contributor
Solution

Re: Outage due to self-launched reboots by SPP? / Multiple reboots after SPP

Yes, we had exactly the same problem.

After the SPP had completed a first batch of other firmware updates, it rebooted and installed the SPS 4.1.5.2 update (during POST). The server was then reset again and booted into ESXi as normal. The problem is that the SPP had also queued a "Wait task for Bmc" in the iLO installation queue that lasted 240 seconds, meaning "do nothing for 4 minutes" while the server started as normal. After that, the 6.22 update for "HPE Smart Array P408i-p, P408e-p.." (HPE_SR_Gen10_6.22_A.fwpkg) was queued, and finally a "RESET Task for iLO", which caused the uncontrolled reboot of the server.

From the server reset after the SPS update to the final reset it took over 10 minutes, which is plenty of time for VUM to take the host out of maintenance mode and migrate VMs to it. And that is the point here: when you have to update hundreds of hosts, you want to do it automatically together with the ESXi update, so that you run the correct combination of SPP version and ESXi build according to HPE's "Valid-vLCM-Combos.pdf" document, right?

All of this can be seen in the iLO event log: both how the different steps are queued in the iLO installation queue and how they are executed. We didn't get any real help from HPE support, but solved it ourselves by building our own custom version of the 2023.09.00.00 SPP with the HPE_SR_Gen10_6.22_A.fwpkg file removed from /packages/. Then we made a second custom SPP with only the storage controllers included and ran it on its own to install HPE_SR_Gen10_6.22_A.fwpkg (for some reason we also had to include the BIOS and iLO, or else the SPP wouldn't work). HPE_SR_Gen10_6.22_A is then installed directly while the SPP OS is running instead of being queued in the iLO installation queue, so there is no uncontrolled server reset after ESXi has booted.
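A minimal sketch of the check described above, assuming you can read the iLO installation queue as a list of (task name, state) entries, for example from the iLO web interface or the iLO Redfish API. The `QueueTask` type, the state strings, and both helper functions are hypothetical; compare them with what your iLO actually reports before relying on them. The idea is simply: as long as any queued task is unfinished, and in particular one that looks like a reset, the host should not leave maintenance mode.

```python
from dataclasses import dataclass


# Hypothetical model of one entry in the iLO installation queue.
# Field names and state strings are assumptions, not iLO's real schema.
@dataclass
class QueueTask:
    name: str   # e.g. "Wait task for Bmc", "RESET Task for iLO"
    state: str  # assumed values: "Pending", "InProgress", "Complete", ...


# States assumed to mean "not run yet" or "still running".
OPEN_STATES = {"Pending", "InProgress"}


def pending_tasks(queue):
    """Return the tasks in the queue that have not finished yet."""
    return [task for task in queue if task.state in OPEN_STATES]


def will_reset_host(queue):
    """Return True if an unfinished task looks like it will reset the server.

    Matching "reset" in the task name is a heuristic based on the
    "RESET Task for iLO" entry from the thread, not a documented HPE rule.
    """
    return any("reset" in task.name.lower() for task in pending_tasks(queue))


if __name__ == "__main__":
    queue = [
        QueueTask("Wait task for Bmc", "InProgress"),
        QueueTask("HPE_SR_Gen10_6.22_A.fwpkg", "Pending"),
        QueueTask("RESET Task for iLO", "Pending"),
    ]
    # The host should stay in maintenance mode while this list is non-empty.
    print([task.name for task in pending_tasks(queue)])
    print(will_reset_host(queue))  # prints True
```

Matching on task names is fragile; if your iLO exposes a more explicit task type field, keying on that would be the safer choice.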

LuisSoares
Occasional Advisor

Re: Outage due to self-launched reboots by SPP? / Multiple reboots after SPP

Wow! I was just hit by this issue.

Beware of this… To avoid unexpected outages, do the following:
1. Once the SPP is "finished" and ESXi is up and running, keep ESXi in maintenance mode for at least 5 minutes, and watch the iLO for messages about additional updates being triggered automatically.
2. Once the additional updates are completed, the server will hard-reboot! Once it comes back up, wait another 5 minutes, shut down ESXi, power it back up, and wait another 5 minutes before taking it out of maintenance mode. If more updates were triggered during the 5-minute wait, repeat steps 1 and 2 until no more firmware updates get triggered…
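The waiting rule in these steps (stay in maintenance mode until no new firmware task has appeared for at least 5 minutes) could be automated roughly as follows. This is a sketch under assumptions: `count_pending` is a hypothetical callable you would implement yourself against your iLO, for example by counting unfinished entries in the installation queue. The 30-second poll interval and 4-hour timeout are arbitrary choices. The shutdown and power cycle in step 2, and the final exit from maintenance mode, are not covered.

```python
import time

QUIET_WINDOW_S = 5 * 60  # "at least 5 minutes" from the steps above
POLL_INTERVAL_S = 30     # assumed polling interval


def wait_until_quiet(count_pending, quiet_window=QUIET_WINDOW_S,
                     interval=POLL_INTERVAL_S, clock=time.monotonic,
                     sleep=time.sleep, timeout=4 * 3600):
    """Block until no firmware task has been pending for `quiet_window` seconds.

    `count_pending` is a caller-supplied (hypothetical) function that returns
    how many installation-queue tasks are still open on the iLO.
    Returns True once the host has been quiet long enough, False on timeout.
    """
    start = clock()
    quiet_since = None
    while clock() - start < timeout:
        if count_pending() > 0:
            quiet_since = None           # new work appeared: restart the window
        elif quiet_since is None:
            quiet_since = clock()        # first quiet observation
        elif clock() - quiet_since >= quiet_window:
            return True                  # quiet long enough to leave maintenance
        sleep(interval)
    return False
```

Passing `clock` and `sleep` in as parameters keeps the loop testable without real waiting; in practice you would leave the defaults. Resetting the window whenever a task shows up mirrors the "keep doing step 1 and step 2" advice.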