BladeSystem - General

OA firmware upgrade disrupts fibre channel

 
Andrew C. Brown
Occasional Advisor

Yesterday I upgraded the iLO firmware and then the OA firmware in a c7000 blade enclosure. I upgraded the OA firmware through the browser interface, so the standby and active OAs were upgraded one right after the other.

I had thought this process was supposed to be non-disruptive to the blade servers, but each of the (SUSE Linux Enterprise 10) servers logged the following set of messages:

Nov 26 16:28:46 w3b7 kernel: lpfc 0000:10:00.0: 0:1305 Link Down Event x2 received Data: x2 x20 x110
Nov 26 16:28:46 w3b7 kernel: lpfc 0000:10:00.0: 0:1303 Link Up Event x3 received Data: x3 x0 x10 x2
Nov 26 16:28:46 w3b7 kernel: lpfc 0000:10:00.1: 1:1305 Link Down Event x2 received Data: x2 x20 x110
Nov 26 16:28:47 w3b7 kernel: lpfc 0000:10:00.0: 0:0203 Nodev timeout on WWPN 50:6:e:80:14:43:f0:64 NPort x11200 Data: x8 x7 x0
Nov 26 16:28:47 w3b7 kernel: lpfc 0000:10:00.1: 1:1303 Link Up Event x3 received Data: x3 x0 x10 x2
Nov 26 16:28:47 w3b7 kernel: lpfc 0000:10:00.1: 1:0203 Nodev timeout on WWPN 50:6:e:80:14:43:f0:74 NPort x11200 Data: x8 x7 x0

The timing is the same down to the second across every blade, and the time corresponds with the OA upgrade. Based on this, it appears that some aspect of the OA upgrade caused the Brocade 4/24 FC switches to drop their links to every server in the enclosure. Has anyone else seen anything like this? I've searched on the HP website but can't find any reference to it. It seems unbelievable to me that an OA upgrade would have this effect, but I can't come up with any other explanation.
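
For anyone who wants to check the same correlation on their own enclosure, here is a minimal sketch that pulls the lpfc link events out of syslog lines and groups the Link Down events by timestamp. It assumes the log format shown above (message codes 1303 and 1305); the function names are made up for illustration, and other lpfc driver versions may word these messages differently.

```python
import re
from collections import defaultdict

# Matches lines in the format quoted above, for example:
# Nov 26 16:28:46 w3b7 kernel: lpfc 0000:10:00.0: 0:1305 Link Down Event x2 received ...
# Assumption: codes 1303 (Link Up) and 1305 (Link Down), as seen in this thread.
LINK_EVENT = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"lpfc\s+(?P<pci>\S+):\s+\d+:(?P<code>1303|1305)\s+"
    r"Link\s+(?P<state>Up|Down)\s+Event"
)


def parse_link_events(lines):
    """Return (timestamp, host, pci_device, state) for each lpfc link event."""
    events = []
    for line in lines:
        m = LINK_EVENT.match(line.strip())
        if m:
            events.append(
                (m.group("ts"), m.group("host"), m.group("pci"), m.group("state"))
            )
    return events


def hosts_by_second(events):
    """Group Link Down events by timestamp: {timestamp: set of hostnames}."""
    grouped = defaultdict(set)
    for ts, host, _pci, state in events:
        if state == "Down":
            grouped[ts].add(host)
    return dict(grouped)
```

Feed `parse_link_events` the combined /var/log/messages lines from every blade, then pass the result to `hosts_by_second`. If one timestamp lists every blade in the enclosure and it lines up with the OA reboot time, you are probably looking at the same thing.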
20 REPLIES
James ~ Happy Dude
Honored Contributor

Re: OA firmware upgrade disrupts fibre channel

Hello Andrew,
What versions of the iLO and OA firmware were you running before, and which versions did you upgrade to?

Regards.
Andrew C. Brown
Occasional Advisor

Re: OA firmware upgrade disrupts fibre channel

We updated the iLO processors on the blades from 1.29 to 1.35. Once that was done, we updated the OAs from 1.30 to 2.02. It was during the latter upgrade that the problem occurred.
Sarah Nordstrom
Frequent Advisor

Re: OA firmware upgrade disrupts fibre channel

We just experienced the same event upgrading from 2.25 to 2.31 on a c7000. All blades in the chassis flapped their fibre links while the OA upgrade/reset was occurring.
Andrew C. Brown
Occasional Advisor

Re: OA firmware upgrade disrupts fibre channel

We've seen this on three separate occasions now, but still don't have an answer to the problem.
Adrian Clint
Honored Contributor

Re: OA firmware upgrade disrupts fibre channel

I've been told by HP that the OA firmware upgrade is non-intrusive so this is concerning.
Has anyone raised a support call for this?
Sarah Nordstrom
Frequent Advisor

Re: OA firmware upgrade disrupts fibre channel

I did. Their answer was "After the firmware update is done into the Onboard Administrator its necessary a quick reboot. It should take just some seconds." Not very comforting, since we had been told it would be non-disruptive beforehand.
Andrew C. Brown
Occasional Advisor

Re: OA firmware upgrade disrupts fibre channel

If that's what the tech told you, it doesn't sound like they really grasp what is going on. Yes, like basically any other device with firmware, the OA requires a reboot for a firmware upgrade to take effect.

However, the entire point of iLO is that it is an out-of-band management system. By definition, an out-of-band management system should never insert itself into production behavior unless you tell it to, e.g. when you're using it to power cycle a locked-up server. If the tech thinks that it's normal for an OA reboot to cause FC links to flap on the production side, I would advise escalating ASAP.

I just spoke with an associate who's handling our current support ticket and he's basically getting nowhere. The first tech apparently tried to close the ticket by misrepresenting a manager's willingness to apply the latest firmware upgrade, and when it was reassigned to a new tech and kicked up to engineering they refused to even look at it as it was a "one-off event." As I stated before, this is not a one-off event for us.

Their current advice (predictably) is to update to the latest iLO, although they can't point to a specific bugfix in that version that will fix this issue. Since we've seen this happen four times on at least three different firmware versions, I'm skeptical that the latest iLO will fix a problem that's persisted so long across so many versions.
Adrian Clint
Honored Contributor

Re: OA firmware upgrade disrupts fibre channel

And I fail to see what good upgrading the iLO firmware will do when it seems to be an issue with the OA and a SAN switch.
I've got some of the latest upgrades to apply to equipment that is being configured this week.
I'll try to replicate the SAN link loss on it.
Andrew C. Brown
Occasional Advisor

Re: OA firmware upgrade disrupts fibre channel

It may be an interaction between the OA and a SAN switch, but it's important to note that we see no evidence that the SAN switch is doing anything unusual.

The SAN switch doesn't reboot, but it logs simultaneous link down/up messages from every active blade in the enclosure at the same time the OA reboots. We don't see this every time, but we've seen it as part of a reboot on a firmware upgrade and we've seen it when the active OA crashed due to a firmware bug. It appears that the OA reboot is interrupting the connections between the HBA in the blade and the SAN switch in the back of the enclosure.

In the interest of full disclosure (and possibly eliminating a variable), we're using the Brocade 4Gb SAN switches in our enclosures. If Sarah is watching this thread and doesn't mind posting what model of SAN switch she's using, it may be helpful (especially if it's a different type of SAN switch).