BladeSystem - General

FREDERIC COLLIN
Advisor

Interconnect Failure after OA Firmware Upgrade

Hi, we have just upgraded the OA firmware from 2.32 to 2.60 (we have 2 OAs in a c7000), and after the upgrade we somehow lost ALL communication across the backplane (4x Cisco 3120G and 2x Brocade SAN switches). This caused a major panic in our data center and major downtime (over 100 virtual servers down without notice).

This is the second time it has happened in 48 hours. Last Sunday we resolved it by pulling all the switches out of the back and then reseating them one per minute, after which everything came back online.

Fearing it was related to the OA version, I went ahead and upgraded the OA to version 3.0 (this one went in without any problems, as usual), but now I have management breathing down my neck and preventing me from going to OA 3.11, which is labeled critical.

Has anyone experienced total backplane communication problems like this because of an OA firmware update? Will staying on OA 3.0 for a while cause more problems?

Thanks.
juan quesada
Respected Contributor

Re: Interconnect Failure after OA Firmware Upgrade

I have, when the components inside the blade system are very outdated, for example OA 3.11 with server BIOS/iLO firmware from 2008.

regards,
FREDERIC COLLIN
Advisor

Re: Interconnect Failure after OA Firmware Upgrade

Thanks, but we had actually just upgraded all the BIOS and iLO firmware before upgrading the OA.

What I have just discovered is that our network team had loaded a new Cisco firmware onto one of our four 3120G switches. They told me the firmware is OK, even though it was downloaded directly from Cisco. Could this be the root cause? I find it strange that a single switch could bring the entire backplane down. They all communicate with the OA over an internal Ethernet port, but for everything to come crashing down is surprising.
Jan Soska
Honored Contributor

Re: Interconnect Failure after OA Firmware Upgrade

Hello,
Very strange. We have not noticed such behavior in our whole history (3.5 years on c7000) with Brocade SAN switches and Cisco 3020s.
But I have never jumped more than one OA firmware generation; I try to keep it at the most current level (about one month after release).

What does HP support say?

Jan
FREDERIC COLLIN
Advisor

Re: Interconnect Failure after OA Firmware Upgrade

I was instructed by HP support to upgrade to OA 3.11 and then call back if the problem came back.

We tend to be somewhat slow at applying firmware updates soon after release, since it involves powering down and moving a lot of virtual servers around. It is also time-consuming, especially having to boot every ESX server from a virtual CD through iLO; unfortunately, updating BIOSes is more work with ESX than on Windows platforms. At 30+ blades it is about one week of full-time work, and many machines have to be done at night too. That is why it is faster to update after a few releases; we then tend to stick with that version for a while unless we notice it is not 'bulletproof', and then the cycle starts all over again.
Fred Dy
New Member

Re: Interconnect Failure after OA Firmware Upgrade

Have you upgraded your Virtual Connect firmware as well? Generally, upgrading the OA firmware requires a corresponding Virtual Connect firmware upgrade, unless the compatibility matrix guide says there will be no issues with the versions you are running.
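The compatibility check suggested here can be sketched as a small script. This is a minimal sketch, assuming you transcribe the relevant rows of the vendor's compatibility matrix into a dictionary yourself. The component names and version numbers below are hypothetical placeholders, not values from HP's actual matrix, and the parser assumes versions are plain dotted integers such as 3.11.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn a dotted version string such as '3.11' into (3, 11) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def check_compatibility(
    oa_version: str,
    installed: Dict[str, str],
    matrix: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return a list of problems; an empty list means every component meets the
    minimum listed for this OA version in the (user-supplied) matrix."""
    # Normalize matrix keys so '3.0' and '3.00' refer to the same OA release.
    rows = {parse_version(key): row for key, row in matrix.items()}
    row = rows.get(parse_version(oa_version))
    if row is None:
        return [f"OA {oa_version} is not listed in the matrix"]
    problems = []
    for component, minimum in row.items():
        current = installed.get(component)
        if current is None:
            problems.append(f"{component}: installed version unknown")
        elif parse_version(current) < parse_version(minimum):
            problems.append(f"{component}: {current} is below the listed minimum {minimum}")
    return problems


# Placeholder values for illustration only; NOT taken from HP's compatibility matrix.
example_matrix = {
    "3.00": {"server_bios": "1.0", "ilo": "1.5", "ethernet_switch": "2.0"},
}
installed = {"server_bios": "1.0", "ilo": "1.4", "ethernet_switch": "2.1"}

for problem in check_compatibility("3.0", installed, example_matrix):
    print(problem)
```

Keeping the matrix as data rather than hard-coding it means the same check can be rerun whenever a new matrix revision is published.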
gregersenj
Honored Contributor
Solution

Re: Interconnect Failure after OA Firmware Upgrade

Well, it's not the midplane that fails.
The midplane has no active components.

But communication between the devices can fail; it could be the OA, etc.

I was reading some release notes last week, and I think I saw something like that.
I tried to find it again, but without success.

Reading the release notes/fixes, it should be OK to stay on 3.00.

You should check the compatibility matrix to ensure that things work together properly.

And yes, there are many points of view on whether you should follow "if it ain't broke, don't fix it" or upgrade uncritically.

I prefer to check the release notes and make up my mind from that.

PS: You can upgrade multiple servers from the same image simultaneously, but you do need to shut down or move several virtual servers.
Beware that non-virtual servers also need to reboot to activate new firmware (not all firmware, though).
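The idea of updating several servers in parallel can be made concrete with a rough planning sketch. It assumes a hypothetical limit on how many blades can be offline at once (the remaining hosts must absorb the migrated virtual machines) and a hypothetical duration per batch. The blade names and timings are placeholders; only the "30+ blades" and the 30-minute monthly window come from this thread.

```python
import math
from typing import List


def plan_batches(blades: List[str], max_offline: int) -> List[List[str]]:
    """Split blades into consecutive batches of at most max_offline hosts,
    so each batch can be evacuated and updated together."""
    if max_offline < 1:
        raise ValueError("max_offline must be at least 1")
    return [blades[i:i + max_offline] for i in range(0, len(blades), max_offline)]


def windows_needed(batch_count: int, minutes_per_batch: int, window_minutes: int) -> int:
    """Number of maintenance windows required if batches run back to back inside a window."""
    batches_per_window = window_minutes // minutes_per_batch
    if batches_per_window == 0:
        raise ValueError("a single batch does not fit in one window")
    return math.ceil(batch_count / batches_per_window)


# Hypothetical inventory: 32 blades named blade01..blade32.
blades = [f"blade{n:02d}" for n in range(1, 33)]
batches = plan_batches(blades, max_offline=4)

print(len(batches), "batches")
print(windows_needed(len(batches), minutes_per_batch=30, window_minutes=30), "windows of 30 minutes")
print(windows_needed(len(batches), minutes_per_batch=30, window_minutes=120), "windows of 2 hours")
```

Under these assumptions, 32 blades in batches of 4 give 8 batches: eight 30-minute monthly windows, or two 2-hour windows, which is why negotiating a longer window matters more than the per-batch speed.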

BR
/jag

FREDERIC COLLIN
Advisor

Re: Interconnect Failure after OA Firmware Upgrade

RE: Fred Dy

We do not have any Virtual Connect in this c7000, we have 4 Cisco 3120G and 2 Brocade SAN switches only.


RE: gregersenj

I know the midplane is all passive traces, but there could still be problems with the actual signals passing through it between the OAs and the various networking and SAN equipment. It seems like a freak accident: it happened twice in 48 hours, yet we had never seen it before in the 2 years we've had the c7000.

As for upgrading multiple servers at the same time, yes, it's a good idea. We'll try to negotiate some downtime; right now we have a 30-minute window each month, which is not enough.
gregersenj
Honored Contributor

Re: Interconnect Failure after OA Firmware Upgrade

As I wrote.

I have read about something similar, I think in some release notes, but I can't remember where and I can't find it.

My guess is that the reboot of the OA after the firmware upgrade caused the switches to reboot as well.
That's my guess, and I could be very wrong.

I would consider upgrading the switches as well,
or at least following the support matrix.

You might need some service windows for that.

BR
/jag

cjb_1
Trusted Contributor

Re: Interconnect Failure after OA Firmware Upgrade

Nice standard response regarding upgrading firmware. There is a fix in v3.11 that helps with these comms issues, but there is also a workaround for v3.00. Details are in the release notes.

In answer to your question, I think a number of people have seen issues like yours. Check out the thread in this forum from the guy who lost his SAN!