HPE OneView

Mike1295
Frequent Advisor

Replace VC Blade

Hello,

I recently updated my Synergy 12000 with the most recent SSP for the Gen10 blades, SY-2022.08.01.

The firmware install went fine all the way up to upgrading the interconnect firmware.  The message prior to the install said it should take only around 25 minutes to complete, but when I checked 3 hours later it was still running on the first interconnect.  It eventually finished/failed.  It stated that it was running 1.9 firmware and that the other interconnect was on 1.8, so I had a mismatch.  I forced the second one to update and it completed and came up fine.  There doesn't appear to be an issue with the firmware itself, only that the install hung for some reason.

After the second interconnect finished I began receiving error messages stating that several of the downlink ports on the first IC were down.  They showed "disabled" and I could not enable them. 

I opened a ticket with HPE but so far, they haven't fixed the issue.  Their last message was that I should attempt a reset of the VC mezz card but that I should wait for them to put together a plan for me.  That was 2 days ago.

Other than backing up the enclosure configs, what should I be cautious of?  I figured the process was: press the button, wait for the mezz card to initialize, and then reapply the configurations.  Does that sound correct?

-Mike

DanCernese
HPE Pro

Re: Replace VC Blade

What's the case#?  If I were you.. I would do a 'reset' on each VC module in sequence first.  You did not describe what is running on the compute modules-- did you also update the compute module mezz card firmware to match the infrastructure?  ...because if it's too far off (like a lot), they may not be able to stay connected; the OS drivers also have a strong play in that space too.  It could be the downlinks are down because the drivers are not compatible.  



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Mike1295
Frequent Advisor

Re: Replace VC Blade

Thanks for the quick response!

You are correct, I glossed over a lot of the fine details.

HPE Support Case 5366626248

We are running VMware ESXi 7.0.3 on the blades and OneView 6.6 on the appliance.  We have Composer 1's, so 6.6 is as high as we can go for now.  The blades are all SY480 Gen10's.

I understand your point on the driver support, but one Interconnect is fine and the guests and hosts respond to pings, etc., so it looks as though the drivers work ok.  We are using the QLogic FCoE's with qfle3f drivers.

qlnativefc: QLogic Fibre Channel HBA Driver 4.1.34.0-1OEM.700.1.0.15843807
QLogic NetXtreme II 10 Gigabit Network/iSCSI/FCoE E3 drivers for VMware ESXi 3.0.157.0-1OEM.700.1.0.15843807

If these drivers aren't the best ones, please let me know and I can update them to see if it helps.

We're sort of stuck with updates at the moment, though.  With only 1 mezz card working, it will take a little planning to install, as this has some production machines that can't be migrated off outside of a maintenance window.

-Mike

Mike1295
Frequent Advisor

Re: Replace VC Blade

Oh, and when you mentioned that I should "reset" my VC's, do you mean perform a software reset, or a hardware reset?

I have used OneView to reset my problem VC several times.  I've reapplied the configs and reapplied the firmware.  I have done everything (as far as I know) short of powering off, or performing a hardware reset and reapplying the config.  This is where we (HPE and I) have ended up.  I have also rebooted and reapplied my server profiles and firmware, etc.

As I mentioned, HPE support is suggesting a hardware reset of the VC and then (I am guessing here as they haven't responded in a couple of days with a plan) reapplying the config and seeing if that helps?

-Mike

 

DanCernese
HPE Pro

Re: Replace VC Blade

Quick answer (haven't read the case yet): in OneView there is a 'reset' for the interconnect module.  If that doesn't clear them, there is also a "Power off" option; that will attempt to preserve/roll the logs before powering off.  I don't know what they mean by 'hardware reset'.  Reapplying the config is useful after the power cycle: if nothing changed it does nothing, but if something was out of sync, it will fix it.  If these are multiple frames, a reset of the ILMs can clear things up too.

If all that does nothing-- the next level of support will need to consult on the case and advise a plan of action; that might be what's taking so long.  There are situations where clearing the config completely (factory default) and reapplying the config can clear things up (that is most easily done by swapping the modules but can also be done remotely by the advanced support team; we don't do that a lot).

On the other hand.. ..without analyzing why the ports are down my suggestions are just off the cuff; I think your drivers are not too far off but not matching the firmware.  You mention 4.1.34 and the comparison chart shows 4.1.35 (with the other matching ones I don't know what they are).  There is an HPE-supplied ESXi custom ISO image with the matching drivers already built ready for download (sorry, I'm short on links right now).



Mike1295
Frequent Advisor

Re: Replace VC Blade

I appreciate your time and your advice.  I also appreciate the behind-the-scenes knowledge that lets me know why my support tech might be taking a while to get back to me.

No worries about the links, I can dig them up.  I might actually have the updated image downloaded as I figured that it might come in handy.  I'll give that a shot as it's non-invasive.  I can re-image a blade and not affect anyone.  Maybe updating the drivers is all that it will take?  I'll update this conversation as I go.

I believe by "hardware reset" they mean the button on the mezz card itself that you have to push with a paperclip.  Just a guess though.

-Mike

DanCernese
HPE Pro

Re: Replace VC Blade

I scanned the case-- I read it as a factory default reset of the troubled VC interconnect module, then reapplying its config.  Doing that reset requires a command line session for which second-level support has to provide a challenge/response key (unless I'm mistaken).  I sent an internal message asking them to move along with their plan of action.  If you can reimage a compute module with the newest image ( https://techhub.hpe.com/eginfolib/synergy/sw_release_info/ESXi_V70_U3_Aug.html ), that would be a safe bet to sync up with SY-2022.08.01.



Mike1295
Frequent Advisor

Re: Replace VC Blade

@DanCernese 

I re-imaged one of the compute modules and after doing a stare and compare I found that it upgraded the firmware/drivers on 12 different components.  Those upgraded components were mostly to do with NIC's so we're at least this far.  

Still no change with the VC's though.

I tried re-enabling the Downlink Ports but they are still showing "disabled".

Thanks a lot for the help that you've provided.  It is appreciated.

-Mike

 

Mike1295
Frequent Advisor

Re: Replace VC Blade

HP Support got back to me and provided this as a way to factory reset my VC and then reapply the configuration:

Open the OneView GUI on Firefox browser (as it has built-in REST API client)

Press Shift+Ctrl+i
or
Click on the three horizontal lines at the top right of the browser (it should say 'open application menu' when you hover over it).
Click on 'More Tools' down below & then click on 'Web Developer Tools' ...

Now, you get the page divided in to 2x horizontal views ... The top section is the usual GUI view of OneView & now we have the extra bottom section which has the 'Web developer Tools'
(You may want to expand the bottom section so that we have a better view)
Click on the tab that says 'Network' ...
Now wait for a few seconds and you should see some GET methods being executed ... These are done by the OneView GUI to refresh the data being presented on the screen

Click on any one of the GET methods and now the bottom section of the page will get further divided in to 2x sections ...
Now, click and drag on the line that divides the 2x bottom sections and make them equal size.

Select the headers tab and scroll down a little to check the 'request header' section ... Ensure we have an entry here called auth:xxxxxxxxxx (this is the session authentication token ID)... If this is missing, then select a different GET method from the left until we find one that has an auth token in the 'request header' section ...
- Now, click on the 'edit and resend' button on the top right of this section ... It might just say 'resend', and once you click on the 2x dots there you will get the option 'edit and resend'

We are now ready to edit this existing command and perform the task we intend to do ...
- Let's take a look at the syntax as per OneView 6.60.00

Now, for example let's do a factory reset of VC ICM at bay3 ! Select the desired VC ICM at Bay3 & take a note of the UUID for it ...

Copy the UUID : it starts right after rest/interconnects/ and ends right before the question mark

Replace the Method from 'GET' to 'PATCH' ...
- Update the URL to https://<OneView IP>/rest/interconnects/<UUID of Interconnect>/
- Edit the Query String & make it blank (it will ideally disappear once the method is changed to 'PATCH' & the URL is updated)
- Edit the Request headers to leave only these following 3x entries & delete the rest
Content-Type: application/json
Auth: abcdefghijklmnopqrstuvwxyz012345
X-Api-Version: 3800
- Copy paste the following in the 'Request Body' section :
(Note : do copy along with the spaces at the beginning, else it will not work)
[
{ "op": "replace", "path": "/factoryResetState", "value": "ReapplyConfiguration" }
]

Verify everything is correct and click on 'send' ...
watch the task on the OneView GUI !

also at the bottom left see that the PATCH command returned a status code of 202 which means 'accepted' !

Now, wait for the task to complete on OneView (this might take several minutes)...

- If the REST API command fails and you get any response code other than 202, then select that command and click on the 'response' tab to see what went wrong
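
For anyone who would rather script this than hand-edit a request in the browser, the same PATCH can be sketched in Python. This is only a rough sketch, not something HPE provided: the function names, the example host and the token are made up for illustration, while the URL shape, the three headers, the API version and the request body are copied straight from the steps above and have not been checked against anything beyond this thread. It builds the request but does not send it.

```python
import json
import urllib.request
from urllib.parse import urlsplit

# Value quoted in the steps above for OneView 6.60.00 (taken from the thread).
API_VERSION = "3800"


def interconnect_id_from_url(url):
    """Return the ID that follows /rest/interconnects/ and precedes any '?'.

    Hypothetical helper: mirrors the 'copy the UUID' step above.
    """
    path = urlsplit(url).path  # urlsplit separates the query string from the path
    marker = "/rest/interconnects/"
    if marker not in path:
        raise ValueError(f"not an interconnect URL: {url}")
    return path.split(marker, 1)[1].strip("/")


def build_factory_reset_request(oneview_host, interconnect_id, auth_token):
    """Build (but do not send) the PATCH request described in the steps above.

    Hypothetical helper: the body and headers are the ones quoted in the thread.
    """
    body = [
        {
            "op": "replace",
            "path": "/factoryResetState",
            "value": "ReapplyConfiguration",
        }
    ]
    return urllib.request.Request(
        url=f"https://{oneview_host}/rest/interconnects/{interconnect_id}/",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Auth": auth_token,  # session token copied from an existing GET
            "X-Api-Version": API_VERSION,
        },
    )


if __name__ == "__main__":
    # Placeholder values only; nothing is sent to any appliance here.
    uid = interconnect_id_from_url(
        "https://oneview.example/rest/interconnects/1234-abcd?fields=name"
    )
    req = build_factory_reset_request("oneview.example", uid, "session-token")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would actually send it, so only do that when you intend to reset the module. As in the browser steps, a 202 status means the request was accepted and the task should then show up in the OneView GUI.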

Mike1295
Frequent Advisor
Solution

Re: Replace VC Blade

In my case, the factory/hardware reset didn't work.  Well, it did, partially.

The downlink ports went from all red to all yellow but still would not enable.  I communicated my results to HPE Support but they haven't gotten back to me as of this writing.  It's been 5 days. 

The day after performing the reset procedure I received a notification that one of my blades had dropped (SY 480 Gen10) and had powered off.  The error message was that the BIOS/Firmware was incompatible.  I am running the most up to date firmware from HPE.

I couldn't just power on the blade, I had to reapply the profile and then it would allow me to power it on.  It booted normally but half of the VC's weren't available which was expected.

I opened a new case with HPE Support and they determined that the Mezzanine card was bad and that a new one would be sent to me.  It should arrive today.  I'll update this discussion with my results.

Mike1295
Frequent Advisor

Re: Replace VC Blade

The end fix was that I pulled a known, good interconnect module from another chassis and replaced the problematic one in the "broken" chassis.  After applying the configuration, all of the downlink ports came up enabled and green.

I put in another ticket with HP and they agreed to replace the interconnect.

All in all, it took three different support tickets and exactly one month to get this pushed through.  It seems that the technicians at HPE Support all wanted to focus on the compute modules as being the problem and appeared to overlook that I was experiencing this issue on every compute module.  It was a confusing case.  The support techs couldn't find any errors in the logs that would point to the interconnect as the culprit.

In any case, everything is up and running so I'm good.