
Does OneView and VLCM actually work?

 
T_1_6
Regular Advisor

Running vCenter 7.02, DL360 Gen10 hosts with ESXi 7.0.1, trying to upgrade to 7.0.2d:

Set up OneView, OV4VC, and the SPP, and registered vLCM; all working fine. Created a cluster-wide image with the HPE Custom add-on and selected the correct 2021 SPP in vLCM; all looked 100% fine.

Followed the odd instructions in the HPE docs to set a firmware baseline in OneView, with all the server profiles/templates set to "firmware only using Smart Update Tools" and "immediate" for the other option. This then started to actually install the firmware, which I thought was odd, as surely that should be done via vLCM?

Then when it comes to remediation, everything falls flat, and it's the worst experience I have ever seen.

It will reboot a host, then says "Started Hardware Update action: 'POST_IMAGE_UPDATE' for host 'xxxx.xxxx.xxxx'." and that is that. Nothing happens; it stays there for hours. Then it finally reboots and fails with:

'xxxx.xxxx.xxxx' - Failed to remediate host
  •  Timed out waiting for Hardware Support Manager tasks

This is about as useful as a chocolate teapot, and I have to keep manually retrying the remediation for each host until it finally succeeds, after about 4 hours. A total waste of time! I have spent 2 days on this already and only have one cluster half upgraded. Sometimes we also get:

  • 12/28/2021, 2:23:30 AM:Hardware update action: 'POST_IMAGE_UPDATE' for host 'xxxx.xxxx.xxxx' 'FAILED' on HSM 'HPE OneView for VMware vCenter'.
  • xxxx.xxxx.xxxx - Failed to remediate host
    •  "One of the iLO task is in Pending or Running state, so UPDATE can not proceed for host host-70"

Does anyone have any idea what the issue could be, or how I can even start to figure this out?
Full vLCM steps below, it works fine until it gets to the HPE part of things!

 

  • 12/28/2021, 12:48:19 AM:Started Hardware Update action: 'POST_IMAGE_UPDATE' for host 'xxxx.xxxx.xxxx'.
  • 12/28/2021, 12:48:19 AM:Completed rebooting host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:45:29 AM:Started to reboot host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:45:29 AM:Completed installing the image on host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:44:34 AM:Started to install the image on host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:44:34 AM:Completed hardware update action: 'PRE_IMAGE_UPDATE' for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:43:30 AM:Started Hardware Update action: 'PRE_IMAGE_UPDATE' for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:43:30 AM:Completed entering maintenance mode for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:42:55 AM:Started to enter maintenance mode for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:42:55 AM:Completed hardware update action: 'STAGE_UPDATE' for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:42:52 AM:Started Hardware Update action: 'STAGE_UPDATE' for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:42:52 AM:Completed hardware update action: 'UPDATE_PRE_CHECK' for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:42:19 AM:Started Hardware Update action: 'UPDATE_PRE_CHECK' for host 'xxxx.xxxx.xxxx'.
  • 12/28/2021, 12:42:19 AM:Starting remediation of the host 'xxxx.xxxx.xxxx'
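
One way to start figuring this out is to measure where the time actually goes in the vLCM task log. The following is a minimal Python sketch, not an HPE or VMware tool. It assumes the log is pasted newest-first in exactly the format shown above and that timestamps parse under an English locale; the names LOG, parse_events, and step_gaps are made up for illustration.

```python
import re
from datetime import datetime

# Log lines copied from the post (newest first); host name redacted as in the thread.
LOG = """\
  • 12/28/2021, 12:48:19 AM:Started Hardware Update action: 'POST_IMAGE_UPDATE' for host 'xxxx.xxxx.xxxx'.
  • 12/28/2021, 12:48:19 AM:Completed rebooting host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:45:29 AM:Started to reboot host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:45:29 AM:Completed installing the image on host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:44:34 AM:Started to install the image on host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:44:34 AM:Completed hardware update action: 'PRE_IMAGE_UPDATE' for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:43:30 AM:Started Hardware Update action: 'PRE_IMAGE_UPDATE' for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:43:30 AM:Completed entering maintenance mode for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:42:55 AM:Started to enter maintenance mode for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:42:55 AM:Completed hardware update action: 'STAGE_UPDATE' for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:42:52 AM:Started Hardware Update action: 'STAGE_UPDATE' for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:42:52 AM:Completed hardware update action: 'UPDATE_PRE_CHECK' for host 'xxxx.xxxx.xxxx'
  • 12/28/2021, 12:42:19 AM:Started Hardware Update action: 'UPDATE_PRE_CHECK' for host 'xxxx.xxxx.xxxx'.
  • 12/28/2021, 12:42:19 AM:Starting remediation of the host 'xxxx.xxxx.xxxx'
"""

# Optional bullet/whitespace, then "M/D/YYYY, H:MM:SS AM|PM", a colon, then the message.
LINE_RE = re.compile(r"^\W*(\d{1,2}/\d{1,2}/\d{4}, \d{1,2}:\d{2}:\d{2} [AP]M):(.*)$")


def parse_events(text):
    """Return (timestamp, message) pairs in chronological order."""
    events = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank or unrelated lines
        stamp = datetime.strptime(match.group(1), "%m/%d/%Y, %I:%M:%S %p")
        events.append((stamp, match.group(2).strip()))
    # Input is newest-first; reversing before a stable sort keeps entries that
    # share a timestamp in their real order.
    return sorted(reversed(events), key=lambda event: event[0])


def step_gaps(events):
    """Return (elapsed, earlier_message, later_message) for consecutive events."""
    return [(later[0] - earlier[0], earlier[1], later[1])
            for earlier, later in zip(events, events[1:])]


events = parse_events(LOG)
gaps = step_gaps(events)
total = events[-1][0] - events[0][0]
longest = max(gaps, key=lambda gap: gap[0])
print(f"logged steps span {total}")
print(f"longest gap {longest[0]}: {longest[1]!r} -> {longest[2]!r}")
```

On the log above, the logged steps span only 6 minutes, and the longest gap is the host reboot at just under 3 minutes. The hours of waiting all fall after POST_IMAGE_UPDATE starts, so that is the phase to dig into.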
14 Replies
T_1_6
Regular Advisor

Re: Does OneView and VLCM actually work?

Just to add, every host I have tried does get this error in vCenter:

A general system error occurred: Timed out waiting for Hardware Support Manager tasks

It is like the hosts are not being rebooted properly by OneView/vLCM, and after some firmware is updated, they just dumbly sit there waiting to time out.
Repeatedly remediating each host until it goes through seems to work for the moment, but as I am doing the hosts one by one, it will take days or weeks.

 

PriitP
Frequent Advisor

Re: Does OneView and VLCM actually work?

I can claim it really worked, and worked very well, until the latest SPP, 2021.10.0. That SPP has iLO 5 firmware 2.55, and once this gets installed you start getting exactly the same issues you described. I have around 100 DL360/380 Gen10 servers with Mellanox 10/25G NICs and all of them have this problem. Oddly, this issue does not arise on DL380 Gen10+ servers with Mellanox NICs. So I'm also waiting for a solution to this issue from HPE support.

T_1_6
Regular Advisor

Re: Does OneView and VLCM actually work?

I am glad you replied to this, as I found some random document online from HPE, which might suggest the latest 2021.05 SPP could be an issue....

SPP-HPE_Custom-Image-vibsdepot-mapping-Gen9-later.pdf

If you go to the section: SPP 2021.05.0

It says this: 
Not supported with VMware Life Cycle Manager (vLCM) and the HPE Hardware Support Manager (HSM), HPE vLCM Plug-ins.

So could that be it? SPP 2021.05.0 is what I was using, and it flatly refused to work, well, I ended up remediating/rebooting each host about 6 times, and it finally sorted itself out.

For section 2021.10.0:

This SPP supports use with vLCM and the OneView and iLO Amplifier HSM with ESXi 7.0 U3 and matches the content in the listed ESXi 7.0 U3 October
26 2021 HPE Custom Image. This SPP also supports use with vLCM and the OneView HSM with ESXi 7.0 U2 to update the SW/Driver/FW recipe.

Edit: I was using the 2021.10.0 SPP, and still had the issue... ah well.

 

 

AmRa
HPE Pro

Re: Does OneView and VLCM actually work?

Make sure the iSUT service is set to AutoDeploy on the host.

1. Log in to ESXi host.
2. Set the iSUT mode to AutoDeploy Mode
# sut -set mode=AutoDeploy

3. Additionally, the following command will show you how iSUT is currently configured:

# sut -status
Or
# sut -exportconfig

What is iSUT:
Integrated Smart Update Tools (iSUT) for ESXi is a utility that assists in performing online software and firmware updates on a system that uses tools such as HPE OneView, iLO Amplifier Pack, or SUM via the iLO management network. iSUT has 4 modes:

OnDemand
AutoStage
AutoDeploy
AutoDeployReboot


The default install mode is OnDemand mode. In OnDemand mode, iSUT does not perform any updates automatically. When set to AutoDeploy, iSUT runs as a service and polls the iLO for requested updates from external tools. Once detected, iSUT can perform the following actions:

Just stage the updates
Stage and deploy
Stage, deploy and reboot


With iSUT set to AutoDeploy, vSphere Lifecycle Manager can ensure consistency of ESXi, vendor addon, and firmware at a cluster level using the desired image. You can find the latest documentation for iSUT in HPE’s SUT information Library website at http://www.hpe.com/info/sut-docs.
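
To make the mode list concrete, here is a small Python sketch that pairs each mode with the automatic actions it implies and builds the `sut -set mode=...` command shown earlier. The pairing of the four modes with the three action lists is an assumption based on the order of the lists above and should be checked against the iSUT documentation. ISUT_MODE_ACTIONS and set_mode_command are hypothetical names, and the sketch only builds the command string; it does not run anything on a host.

```python
# Assumed pairing of iSUT modes with the automatic actions listed above.
ISUT_MODE_ACTIONS = {
    "OnDemand": (),                                    # nothing happens automatically
    "AutoStage": ("stage",),                           # just stage the updates
    "AutoDeploy": ("stage", "deploy"),                 # stage and deploy
    "AutoDeployReboot": ("stage", "deploy", "reboot"), # stage, deploy and reboot
}


def set_mode_command(mode):
    """Build the ESXi shell command that switches iSUT to the given mode."""
    if mode not in ISUT_MODE_ACTIONS:
        raise ValueError(f"unknown iSUT mode: {mode!r}")
    return f"sut -set mode={mode}"


for mode, actions in ISUT_MODE_ACTIONS.items():
    summary = ", ".join(actions) if actions else "no automatic actions"
    print(f"{mode}: {summary}")

print(set_mode_command("AutoDeploy"))
```

The last line prints the same command as step 2 above, which is the mode vLCM remediation expects.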

I am an HPE Employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]

MCSAP
Frequent Advisor

Re: Does OneView and VLCM actually work?

I was wondering if iSUT was still a requirement. That is a major flaw in HPE's "seamless" management of servers using OneView. You are still required to install and configure iSUT on every single ESXi host, yet the installation and configuration haven't been integrated into the OneView for vCenter plug-in.

It would be great if the OneView for vCenter plug-in could work with vLCM to install the iSUT VIB and mark the option.

I don't believe the HPE Custom ESXi image has the iSUT VIB already added. Maybe it does and I'm mistaken.

I really don't know why it is still required... OneView talks to the iLO and pre-downloads the FW, vCenter talks to ESXi for maintenance/reboot status, and OneView for vCenter + vLCM patches the ESXi host. iSUT should no longer be required. Can we make it so!?!?

Doug de Werd
HPE Pro

Re: Does OneView and VLCM actually work?

Yes, iSUT and AMS are required for vLCM.

Starting with OV4VC 10.3, there is a verification check available to make sure that iSUT and AMS are installed, running, and configured properly - and if not, there is a remediation button to help fix it.

We check whether AMS is running. If it is not, we will start it.

For iSUT, we check the running state and configuration mode. During remediation we will start SUT and set it to “AutoDeploy” mode.

iSUT and AMS are included and installed by default with the HPE custom image. OV4VC does not install either of them - it only checks for configuration.

iSUT and AMS Verification for vLCM.jpg
I am an HPE employee
PriitP
Frequent Advisor

Re: Does OneView and VLCM actually work?

For me the last working SPP, well actually VUP, was this 

VMware ESXi 7.0 U2 Upgrade Pack

https://support.hpe.com/hpesc/public/swd/detail?swItemId=MTX_3762cce274214ab8acb4a5dd9c#tab2

With this one uploaded to OneView and used with the cluster image, patching servers was working. SPP 2021.05.0 didn't work, and the newest SPP 2021.10.0, although claimed by HPE to be fully supported, didn't work either. The reason it didn't work was not iSUT or any other component being misconfigured, but the combination of iLO 5 firmware 2.55 and Mellanox cards in the server. With this combo, updates always time out. So HPE changed something in iLO 5 firmware 2.55 and newer. I have a support case open on this as well, but no solution yet. Without Mellanox cards in the server, the latest SPP 2021.10.0 works OK.

T_1_6
Regular Advisor

Re: Does OneView and VLCM actually work?

Our iSUT and AMS checks were all green for all our clusters.

I think it MUST have been the iLO 2.55 firmware, as sometimes when it failed, it failed with that obscure iLO error I posted above.

After forcing remediation and rebooting endless times, it eventually worked, so fortunately for now we are as up to date as we want to be. As 7.03 from VMware is an absolute disaster, we will be staying on 7.02d for at least 6 months!

I do find the procedure in the HPE document odd though: as soon as you set the firmware settings to iSUT and "immediate" as recommended, all the firmware starts installing, and I thought that would happen during vLCM and not before. Maybe the recommendation is there so that everything is staged and ready for when you pull the trigger with vLCM.

T_1_6
Regular Advisor

Re: Does OneView and VLCM actually work?

It's definitely included in either the custom image or the add-on in the vLCM cluster image.

I have never manually added iSUT, and originally we started with a manually installed HPE custom image, then moved to vLCM for the updates with the HPE custom add-on, and iSUT has always been there.

So props to HPE for this, it was seamless for us!

Doug de Werd
HPE Pro

Re: Does OneView and VLCM actually work?

Regarding the settings in the OneView Server Profile Template...

Make sure you are setting Activate Firmware Immediately in the Template, and not in each Server Profile. You can actually ignore the specific SPP in the Template if you want.

The way vLCM works is that we grab each server, one at a time (serially) and set to maintenance mode, then do the ESXi and Add-on (driver/SW) updates - these first two parts are handled by vCenter. vCenter then passes control over to OV4VC to do the final part, which is the FW update from the SPP/VUP.

OV4VC goes to the server's Server Profile (not Template) and updates the FW baseline, and since it is set to immediate, it starts.  When it is done, it moves on to the next server.  So OV4VC basically ignores the Server Profile Template settings for the firmware baseline and works directly with the Server Profile.

If you are using OneView for FW and Server Profile compliance checking, you can update the Server Profile Template after vLCM has done its update (it really doesn't matter whether you do the template before or after). Once the vLCM update is done, you would get compliance errors in OneView because the SP doesn't match the SPT (from the FW baseline perspective). You can then just go into OneView and update the SPT to whatever was used as the FW baseline in vLCM, and the compliance errors will automatically clear.
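
The compliance drift described here can be pictured with a few lines of Python. This is only an illustration with made-up data, not the OneView API: the host names, the baseline strings, and the function noncompliant_profiles are all hypothetical.

```python
def noncompliant_profiles(template_baseline, profile_baselines):
    """Return names of Server Profiles whose FW baseline differs from the template's."""
    return sorted(name for name, baseline in profile_baselines.items()
                  if baseline != template_baseline)


# Hypothetical state: vLCM (via OV4VC) has already updated two Server Profiles
# directly, while the Server Profile Template still names the old baseline.
spt_baseline = "SPP 2021.05.0"
profiles = {
    "host-01": "SPP 2021.10.0",
    "host-02": "SPP 2021.10.0",
    "host-03": "SPP 2021.05.0",  # not remediated yet
}
before = noncompliant_profiles(spt_baseline, profiles)
print(before)  # ['host-01', 'host-02']

# Setting the template to the baseline used in vLCM clears those two mismatches;
# only the host that has not been remediated yet still differs.
spt_baseline = "SPP 2021.10.0"
after = noncompliant_profiles(spt_baseline, profiles)
print(after)  # ['host-03']
```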

Doug

I am an HPE employee
T_1_6
Regular Advisor

Re: Does OneView and VLCM actually work?

Well, it's still broken with the latest 2022 SPP, OneView 7.00, and Gen10 servers.

When remediating the cluster, every single host fails with:

"one of the ILO task is in Pending or Running state, so UPDATE can not proceed for host"

Then the whole cluster remediation halts. Repeating the process over and over and over eventually all the hosts get patched and updated, but it takes hours and hours.

This is a really tedious experience, it was supposed to be one reboot, (or two I guess at a push) and easy and simple. I have yet to use vLCM with HPE servers and have it actually work.

Such a waste of time!

 

UserName1
Frequent Advisor

Re: Does OneView and VLCM actually work?

We have the same painful process. One click process with about 50 clicks to it.

We have worked out all the problems with it now, so we have a methodology that works for us. I have found that servers with a particular network card cause it to fail, Secure Boot also causes it to fail, Quick Boot in VMware causes it to fail, and the default SPP causes it to fail; we have to use a custom baseline...

And don’t get me started on the resetting of iLO names when it updates them to the new single image baseline firmware, which also causes it to fail, but which HPE refuse to fix after they broke that feature, as it’s too difficult to fix!
T_1_6
Regular Advisor

Re: Does OneView and VLCM actually work?

Wow, Quick Boot causes it to fail? Any details on that? I have it enabled. When observing the test process on a single host, I did notice that Quick Boot worked well while VMware was updating ESXi: the host was super rapid to come back and skipped the long-winded Gen10 POST.

I think for us we are going to have to go back to basics and apply firmware updates via OneView "manually", so to speak, then simply use vLCM to apply the ESXi updates, ideally using Quick Boot as above, so it can rattle through larger clusters faster.

We do run Intel 10GbE cards, and I see updates for these in the 2022 SPP, and also when watching the firmware patching process from the iLO console.

Even so, regardless of any of this, vLCM was supposed to handle all this, and make it simple for folks, and in reality it has actually made things harder.

It seems (when doing a manual test) that the 2022 SPP needed literally about 6 reboots for us to apply any firmware outside of vLCM itself, but the vLCM integration is a pure mess.
I realise things can get complicated when patching servers, but what I do not understand is why the server cannot simply reboot, patch what needs to be patched, soft reboot however many times it needs to, and THEN, when all is done, boot back into ESXi.

Right now things time out for vLCM as it waits for stuff. Last round, iLO patching was broken, it seems; this time round iLO looks OK, but now it's NIC firmware and other components causing problems. I cannot keep up; maybe in 2025 this might be ready for production?

T_1_6
Regular Advisor

Re: Does OneView and VLCM actually work?

I will also add, this is using OneView and the SUT integration, so that when the firmware is applied via the template, it's activated immediately, and everything goes to staging and whatever else afterwards for all servers.
So SUT has done its thing and is "ready", or as ready as it can be, and I manually wait a good while for SUT to settle before starting anything in Lifecycle Manager.

So even giving it the best possible shot to succeed, it still fails miserably!