OA firmware upgrade disrupts fibre channel

 
Andrew C. Brown
Occasional Advisor

OA firmware upgrade disrupts fibre channel

Yesterday I upgraded the iLO firmware and then the OA firmware in a c7000 blade enclosure. I upgraded the OA firmware using the browser interface, so the standby and active OAs were upgraded one right after the other.

I had thought this process was supposed to be non-disruptive to the blade servers, but the logs on each of the (SUSE Enterprise 10) servers contained the following set of messages:

Nov 26 16:28:46 w3b7 kernel: lpfc 0000:10:00.0: 0:1305 Link Down Event x2 received Data: x2 x20 x110
Nov 26 16:28:46 w3b7 kernel: lpfc 0000:10:00.0: 0:1303 Link Up Event x3 received Data: x3 x0 x10 x2
Nov 26 16:28:46 w3b7 kernel: lpfc 0000:10:00.1: 1:1305 Link Down Event x2 received Data: x2 x20 x110
Nov 26 16:28:47 w3b7 kernel: lpfc 0000:10:00.0: 0:0203 Nodev timeout on WWPN 50:6:e:80:14:43:f0:64 NPort x11200 Data: x8 x7 x0
Nov 26 16:28:47 w3b7 kernel: lpfc 0000:10:00.1: 1:1303 Link Up Event x3 received Data: x3 x0 x10 x2
Nov 26 16:28:47 w3b7 kernel: lpfc 0000:10:00.1: 1:0203 Nodev timeout on WWPN 50:6:e:80:14:43:f0:74 NPort x11200 Data: x8 x7 x0

The timing is the same down to the second across every blade, and the time corresponds with the OA upgrade. Based on this, it appears that some aspect of the OA upgrade caused the Brocade 4/24 FC switches to drop their links to every server in the enclosure. Has anyone else seen anything like this? I've searched on the HP website but can't find any reference to it. It seems unbelievable to me that an OA upgrade would have this effect, but I can't come up with any other explanation.
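
For anyone who wants to check the same thing on their own blades, here is a minimal sketch of how the per-second comparison could be scripted. It assumes syslog lines in the same format as the excerpt above and only looks at the lpfc 1305 (Link Down) and 1303 (Link Up) messages. The function names are made up for illustration and are not part of any HP or Emulex tool.

```python
import re
from collections import defaultdict

# Matches syslog lines shaped like the excerpt above, e.g.:
# Nov 26 16:28:46 w3b7 kernel: lpfc 0000:10:00.0: 0:1305 Link Down Event x2 received ...
LINK_EVENT = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"lpfc\s+(?P<pci>\S+):\s+\d+:(?:1305|1303)\s+Link\s+(?P<state>Down|Up)\s+Event"
)


def link_events(lines):
    """Yield (timestamp, host, pci_device, state) for each lpfc link event line."""
    for line in lines:
        m = LINK_EVENT.match(line)
        if m:
            yield m["ts"], m["host"], m["pci"], m["state"]


def hosts_down_by_second(lines):
    """Map each timestamp to the set of hosts that logged a Link Down in that second."""
    by_ts = defaultdict(set)
    for ts, host, _pci, state in link_events(lines):
        if state == "Down":
            by_ts[ts].add(host)
    return dict(by_ts)
```

Feeding it the combined /var/log/messages from every blade should show one timestamp (or two adjacent ones) listing every host if the flap really was enclosure-wide.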
James ~ Happy Dude
Honored Contributor

Re: OA firmware upgrade disrupts fibre channel

Hello Andrew,
What were the iLO and OA versions before the upgrade, and which versions did you upgrade to?

Regards.
Andrew C. Brown
Occasional Advisor

Re: OA firmware upgrade disrupts fibre channel

We updated the iLO processors on the blades from 1.29 to 1.35. Once that was done, we updated the OAs from 1.30 to 2.02. It was during the latter upgrade that the problem occurred.
Sarah Nordstrom
Frequent Advisor

Re: OA firmware upgrade disrupts fibre channel

We just experienced the same event upgrading from 2.25 to 2.31 on a c7000. All blades in the chassis flapped their fibre links while the OA upgrade/reset was occurring.
Andrew C. Brown
Occasional Advisor

Re: OA firmware upgrade disrupts fibre channel

We've seen this on three separate occasions now, but still don't have an answer to the problem.
Adrian Clint
Honored Contributor

Re: OA firmware upgrade disrupts fibre channel

I've been told by HP that the OA firmware upgrade is non-intrusive so this is concerning.
Has anyone raised a support call for this?
Sarah Nordstrom
Frequent Advisor

Re: OA firmware upgrade disrupts fibre channel

I did. Their answer was "After the firmware update is done into the Onboard Administrator its necessary a quick reboot. It should take just some seconds." Not very comforting, since we had been told it would be non-disruptive beforehand.
Andrew C. Brown
Occasional Advisor

Re: OA firmware upgrade disrupts fibre channel

If that's what the tech told you, it doesn't sound like they really grasp what is going on. Yes, like basically any other device with firmware, the OA requires a reboot for a firmware upgrade to take effect.

However, the entire point of iLO is that it is an out-of-band management system. By definition, an out-of-band management system should never insert itself into production behavior unless you tell it to, i.e. you're using it to power cycle a locked-up server. If the tech thinks that it's normal for an OA reboot to cause FC links to flap on the production side, I would advise escalating ASAP.

I just spoke with an associate who's handling our current support ticket and he's basically getting nowhere. The first tech apparently tried to close the ticket by misrepresenting a manager's willingness to apply the latest firmware upgrade, and when it was reassigned to a new tech and kicked up to engineering they refused to even look at it as it was a "one-off event." As I stated before, this is not a one-off event for us.

Their current advice (predictably) is to update to the latest iLO, although they can't point to a specific bugfix in that version that will fix this issue. Since we've seen this happen four times on at least three different firmware versions, I'm skeptical that the latest iLO will fix a problem that's persisted so long across so many versions.
Adrian Clint
Honored Contributor

Re: OA firmware upgrade disrupts fibre channel

And I fail to see what good upgrading the iLO firmware will do when it seems to be an issue with the OA and a SAN switch.
I've got some latest upgrades to do on some equipment in config this week.
I'll try and replicate the SAN link loss on this.
Andrew C. Brown
Occasional Advisor

Re: OA firmware upgrade disrupts fibre channel

It may be an interaction between the OA and a SAN switch, but it's important to note that we see no evidence that the SAN switch is doing anything unusual.

The SAN switch doesn't reboot, but it logs simultaneous link down/up messages from every active blade in the enclosure at the same time that the OA reboots. We don't see this every time, but we've seen it as part of a reboot on a firmware upgrade and we've seen it when the active OA crashed due to a firmware bug. It appears that the OA reboot is interrupting the connections between the HBA in the blade and the SAN switch in the back of the enclosure.
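
A rough sketch of how that correlation could be checked from logs, assuming you have the OA reboot time and a list of link-event timestamps in syslog format. Syslog stamps carry no year, so the year is passed in. The 60-second window and the function names are arbitrary choices for illustration, and `%b` parsing assumes an English locale.

```python
from datetime import datetime, timedelta


def parse_syslog_time(stamp, year):
    """Parse a syslog stamp such as 'Nov 26 16:28:46'; the year must be supplied."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def flaps_near_reboot(event_stamps, reboot_stamp, year, window_seconds=60):
    """Return the event stamps that fall within window_seconds of the OA reboot."""
    reboot = parse_syslog_time(reboot_stamp, year)
    window = timedelta(seconds=window_seconds)
    return [
        stamp
        for stamp in event_stamps
        if abs(parse_syslog_time(stamp, year) - reboot) <= window
    ]
```

If every link down/up stamp from the switch or the blades lands inside the window around each OA reboot, and none land outside it, that supports the reboot as the trigger.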

In the interest of full disclosure (and possibly eliminating a variable) we're using the Brocade 4Gb SAN switches in our enclosures. If Sarah is watching this thread and doesn't mind posting what model of SAN switch she's using it may be helpful (especially if it's a different type of SAN switch).

Sarah Nordstrom
Frequent Advisor

Re: OA firmware upgrade disrupts fibre channel

I am watching this thread. All of our SAN switches are Brocade 4Gb 4/24. We have one chassis with FC pass-thrus instead of SAN switches, but it's a critical production system so I can't risk upgrading the OA on it right now.
Andrew C. Brown
Occasional Advisor

Re: OA firmware upgrade disrupts fibre channel

Another possible factor: Most recently we had two enclosures exhibit this behavior about an hour or so apart. The enclosures were chained together via the link cable, which may explain how the problem jumped but is hardly a smoking gun.

Interestingly, another enclosure in the same stack had no issues. Of note is the fact that the two affected enclosures are both OA hardware rev A1. None of our other enclosures have rev A1 OAs in both slots, although one enclosure has a rev A1 in the secondary slot.

Sarah: Regarding your production enclosure, I would try to upgrade it sooner rather than later. The reason for our 2.25 upgrade was an iLO bug that caused the OAs in the two aforementioned enclosures to spontaneously reboot, which in turn caused the FC issue. This bug is supposedly fixed in 2.25.

Of course I should point out that we haven't yet upgraded the iLO of the second affected enclosure, because when we upgraded the first enclosure to 2.25, the problem happened AGAIN. :P
Sarah Nordstrom
Frequent Advisor

Re: OA firmware upgrade disrupts fibre channel

The chassis I experienced it on has a hardware rev. A1 OA in the left slot, and a rev. A0 OA in the right slot. I've upgraded two chassis so far (one c7k and one c3k) but only one has SAN switches in it, and it experienced the issue. All ours are linked with the link cable. We have seen the issue you're talking about in 2.25 also (the OA spontaneously rebooting), but in our case it hasn't caused the FC flap issue (yet).
Sarah Nordstrom
Frequent Advisor

Re: OA firmware upgrade disrupts fibre channel

Actually, upon rereading your post it looks like I misread. We are already running 2.25 on all c-class chassis, and still experience the spontaneous reboot. We are upgrading to 2.31 to attempt to solve that as well as problems when linking c7000s with c3000s via the Enclosure-Link.
Sarah Nordstrom
Frequent Advisor

Re: OA firmware upgrade disrupts fibre channel

After going back and forth a few times all we're getting out of HP is that their techs are aware of this but don't know why it happens yet. We've been told to do all OA updates in maintenance windows now.
Andrew C. Brown
Occasional Advisor

Re: OA firmware upgrade disrupts fibre channel

Sarah,

Yes, I suspect HP is aware that our cases are related, as they told us that they don't know what's going on yet but that they knew of one other customer who was affected.

I know you're using the Brocade 4/24 switches. Can you tell me what software version you're running on the problem enclosure(s)? We're at 5.3.0a right now, and I'm wondering if that might be a factor. Enclosures at another datacenter are running 6.1.0a and I don't believe they've been affected by this issue.
Sarah Nordstrom
Frequent Advisor

Re: OA firmware upgrade disrupts fibre channel

Ours is also at 5.3.0a.
Raghuarch
Honored Contributor

Re: OA firmware upgrade disrupts fibre channel

Sarah,

Can you try updating your SAN switch firmware to 6.1.0 and check whether you still see the problem?
I don't think you can go straight to 6.1.0; you may need to update to 6.0.1a first.
The SAN switch firmware may be the problem.

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=3185341&prodTypeId=3709945&prodSeriesId=3185340&swLang=8&taskId=135&swEnvOID=54
Sarah Nordstrom
Frequent Advisor

Re: OA firmware upgrade disrupts fibre channel

The only other chassis I have available won't be able to be upgraded until at least March, unfortunately. I can't risk another production outage at this time.
Andrew C. Brown
Occasional Advisor

Re: OA firmware upgrade disrupts fibre channel

I can confirm that Brocade OS 5.3.0a is the issue. I successfully replicated the problem on a test enclosure here by downgrading the Brocade switches to 5.3.0a and then performing a firmware upgrade on the OAs to force a reboot. I did the upgrade/reboot twice and the link flaps occurred each time.

I then upgraded to 6.0.1a and repeated the test. No link flaps.
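
As a pre-flight check before an OA update, something along these lines could flag switches that should be upgraded first. This is only a sketch. The test above established that 5.3.0a flaps and 6.0.1a does not; treating every version below 6.0.1a as suspect is my assumption, not something that was tested. The function names are hypothetical, and how you obtain the version string from the switch is left out.

```python
import re


def parse_fos_version(text):
    """Turn a version string such as '5.3.0a' or 'v6.0.1a' into a comparable tuple."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)([a-z]?)", text.strip())
    if not m:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch, letter = m.groups()
    return (int(major), int(minor), int(patch), letter)


# Lowest version that showed no link flaps in the test described above.
TESTED_GOOD = parse_fos_version("6.0.1a")


def should_upgrade_switch_first(version):
    """True if the switch firmware is older than the version tested without flaps."""
    return parse_fos_version(version) < TESTED_GOOD
```

With the versions mentioned in this thread, `should_upgrade_switch_first("5.3.0a")` is true, while "6.0.1a" and "6.1.0a" both come back false.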
Sarah Nordstrom
Frequent Advisor

Re: OA firmware upgrade disrupts fibre channel

Thanks for the information! I'll make sure to bring the SAN switches up to date before my next OA update.