Interconnect Failure after OA Firmware Upgrade

Hi, we have just upgraded the OA firmware from 2.32 to 2.60 (we have 2 OAs in a c7000), and after the upgrade we somehow lost ALL communication with the interconnect modules at the back of the enclosure (4x Cisco 3120G and 2x Brocade SAN switches). This caused a major panic in our data center and major downtime (over 100 virtual servers went down without notice).

This is the second time it has happened in 48 hours. We resolved it last Sunday by pulling all the switches out of the back and then reseating them one per minute. Everything came back online.

Fearing it was related to the OA version, I went ahead and upgraded the OA to version 3.0 (this one went in without any problems, as usual), but now I have management breathing down my neck and preventing me from going to OA 3.11, which is labeled critical.

Has anyone experienced total backplane communication problems like this because of an OA firmware update? Will staying on OA 3.0 for some time cause more problems?

Thanks.
12 REPLIES
juan quesada
Respected Contributor

Re: Interconnect Failure after OA Firmware Upgrade

I have, when the components inside the blade system were very outdated. For example:
OA 3.11 with server BIOS and iLO firmware from 2008.

Regards,

Re: Interconnect Failure after OA Firmware Upgrade

Thanks, but we had actually just upgraded all the BIOS and iLO firmware before upgrading the OA.

What I have just discovered is that our network team had put new Cisco firmware on one of our four 3120G switches. They tell me the firmware is fine and was downloaded directly from Cisco. Could this be the root cause? I find it strange that one switch could somehow bring down the entire backplane. The switches all communicate with the OA over an internal Ethernet port, but for one of them to bring everything crashing down is somewhat surprising.
Jan Soska
Honored Contributor

Re: Interconnect Failure after OA Firmware Upgrade

Hello,
Very strange. We have not seen such behavior in our whole history (3.5 years on c7000) with Brocade SAN switches and Cisco 3020s.
But I have never jumped more than one OA firmware generation; I try to keep it at the most current level (about 1 month after release).

What does HP support say?

Jan

Re: Interconnect Failure after OA Firmware Upgrade

I was instructed by HP support to upgrade to OA 3.11 and then call back if the problem occurs again.

We tend to be somewhat slow at applying firmware updates soon after release, since it involves powering down and moving a lot of virtual servers around. It is time consuming, especially because every ESX server has to be booted from a virtual CD through iLO; unfortunately, updating the BIOS is more work on ESX than on Windows platforms. With 30+ blades it is about one week of full-time work, and many machines have to be done at night too. That is why it is faster to update after a few releases. We then tend to stick with that version for a while unless we notice it is not 'bulletproof', and then the cycle starts all over again.
Fred Dy
Occasional Visitor

Re: Interconnect Failure after OA Firmware Upgrade

Have you upgraded your Virtual Connect firmware as well? Generally, an OA firmware upgrade needs a corresponding Virtual Connect firmware upgrade, unless the compatibility matrix says there will be no issues with the versions you are running.
gregersenj
HPE Pro
Solution

Re: Interconnect Failure after OA Firmware Upgrade

Well, it's not the midplane that fails.
The midplane has no active components.

But communication can fail between the devices; it could be the OA, etc.

I was reading some release notes last week,
and I think I saw something like that.
I tried to find it again, but without success.

Reading the release notes and fixes, it should be OK to stay on 3.00.

You should check the compatibility matrix to ensure that things work together properly.
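As a rough illustration of that check (not an HP tool), an inventory of installed firmware could be compared against one row of the compatibility matrix with a few lines of Python. This is only a sketch: the function names are made up for this example, and every version number below is a placeholder that must be replaced with values from the real matrix and from your own enclosure.

```python
# Hypothetical sketch: the version numbers below are placeholders,
# NOT values taken from any HP compatibility matrix.

def parse_version(text):
    """Turn a dotted version such as '3.11' into (3, 11) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def find_mismatches(installed, minimums):
    """Return (component, installed, required) for every component that is
    below the required minimum or missing from the inventory."""
    problems = []
    for component, required in minimums.items():
        current = installed.get(component)
        if current is None:
            # Firmware level not known, e.g. nobody can log in to the switch.
            problems.append((component, "unknown", required))
        elif parse_version(current) < parse_version(required):
            problems.append((component, current, required))
    return problems


# Placeholder inventory and placeholder matrix row.
installed = {"OA": "3.00", "iLO": "1.80"}
minimums = {"OA": "3.00", "iLO": "1.82", "Interconnect": "2.0"}

for component, current, required in find_mismatches(installed, minimums):
    print(f"{component}: installed {current}, matrix requires {required}")
```

Anything reported as "unknown" is worth chasing first, since a component whose firmware level nobody can read cannot be checked against the matrix at all.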

And yes, there are many points of view on whether you should go by "if it ain't broke, don't fix it" or upgrade uncritically.

I prefer to check the release notes and make up my mind from that.

PS. You can upgrade multiple servers from the same image simultaneously. But you do need to shut down or move several virtual servers.
Beware that non-virtual servers also need a reboot to activate new firmware, though not for all firmware.

BR
/jag

Re: Interconnect Failure after OA Firmware Upgrade

RE: Fred Dy

We do not have any Virtual Connect in this c7000, we have 4 Cisco 3120G and 2 Brocade SAN switches only.


RE: gregersenj

I know the midplane is all passive traces, but there could still be problems with the actual signals between the OAs and the various networking and SAN equipment connected through the midplane. I think it's a kind of freak accident that happened twice in 48 hours but that we had never seen before in the 2 years we've had the c7000.

As for upgrading multiple servers at the same time, yes, it's a good idea. We'll try to negotiate some downtime; right now we have a 30-minute window each month, which is not enough.
gregersenj
HPE Pro

Re: Interconnect Failure after OA Firmware Upgrade

As I wrote:

I have read about something similar, and I think it was in some release notes, but I can't remember which and I can't find it.

My guess is that the reboot of the OA after the firmware upgrade caused the switches to reboot as well.
That's my guess, and I could be very wrong.

I would consider upgrading the switches too,
or at least following the support matrix.

You might need some service windows for that.

BR
/jag
cjb_1
Trusted Contributor

Re: Interconnect Failure after OA Firmware Upgrade

Nice standard response regarding upgrading firmware. There is a fix in v3.11 that helps with these communication issues, and there is a workaround for v3.00. Details are in the release notes.

In answer to your question, I think a number of people have seen issues like yours. Check out the thread in this forum from the guy who has lost his SAN!

Re: Interconnect Failure after OA Firmware Upgrade

gregersenj ::: Thanks for the reply, but we could not reproduce the problem when we applied the 3.11 OA firmware over 3.00 ten days ago.

As for upgrading our Cisco 3120G switches in the c7000, we lost control and maintenance of them a couple of months ago, so we do not even have web access (from the OA) anymore and I have no way of knowing their current firmware. Could you point me to a link or document I could forward to my new network admin? Thanks.


cjb ::: Yes, I have gone back through this forum, as far as 2-3 years, but could not find a single post (I might have skipped some, though) where someone lost all midplane communication and the only way to bring back both LAN and SAN communication was to unseat and reseat all SAN and LAN switches. If anyone could point me to such a post I'd be happy. Thanks.
cjb_1
Trusted Contributor

Re: Interconnect Failure after OA Firmware Upgrade

Frederic

Similar issues here. Loss of communication between the OA and the modules (in our case VC) ends with having to reseat/reset everything. I saw it on a firmware upgrade a while back, but these events now occur infrequently, at random times, and with differing severities. I understand they are not exactly the same scenarios, as we all run different environments.

If you are staying on v3.0, check out this customer advisory.
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02499458

I'd recommend asking HP to review your firmware and tell you what you should be running.

Good luck.
trilee2
Visitor

Re: Interconnect Failure after OA Firmware Upgrade

I also had the problem when upgrading from 3.00 to 3.21 on our c7000 with one OA. It caused a lot of panic in our data center too, and has everyone concerned, especially since we have so many. Everything restarted on its own without any reseating.

Interconnect Devices:
4 - Cisco 3120 network switches, 3 restarted, 1 did not.
2 - Cisco MDS 9124e Fabric Switches did not restart


We had another incident a couple of months ago in which we lost access to the OA web interface (3.11 advisory). We had to reseat the OA to regain connectivity, and the interconnect devices dropped. We then had to reseat the interconnect devices to bring them back online.