Re: Problems with GbE2c routing/ARP, upgraded to v2.0.60

BarnabyArnott · ‎10-22-2009

We have had two instances where upgrading the GbE2c firmware caused networking problems. We also have an existing issue at a third site that has upgraded through the firmware revisions and still has problems.

Case #1:
We had an issue with the blade switch operation back in June this year (something to do with the GUI interface not reporting correct information).
We decided to upgrade from 2.0.0 to 2.0.60, including boot code.
An hour after the upgrade, another engineer told us of a routing problem: we couldn't connect to other branches on our OneOffice network. We fixed this by reorganizing network connections to direct L2 switches.
But two days later, the problem recurred after rebooting servers for patching.
The only solution was to restore the original boot code and firmware - just selecting the 2.0.0 image did NOT boot to that version as advertised.
I forget if we tried v2.0.50 during our troubleshooting (relevant to case #2).

All routing at our HQ branch is performed by this GbE2c.
Network segments include a LAN (trunked connection), OneOffice WAN, and internet and DMZ VLANs via a virtual firewall device.

Equipment:
Internal blade servers run a mix of physical Windows and ESX3.5 host servers.
c3000
OA f/w: 2.52
Single GbE2c switch config: management IP hardset to LAN address, external ports serve different purposes, some ports used as a basic trunk to an external switch:
/c/port 19
dis
/c/port 20
pvid 200
/c/port 21
tag ena
/c/port 22
tag ena
/c/port 23
tag ena
/c/port 24
tag ena
/c/l3/rtrid 10.35.13.35
if 256 undefined - no EBIPA
Numerous VLAN L3 interfaces and static routes defined

Case #2:
I recently completed a full firmware upgrade for a customer with a c3000 chassis and a GbE2c. The GbE2c was upgraded from 2.0.50 f/w and boot code to 2.0.60 f/w and boot code. This upgrade was performed separately from any other changes, and networking through the virtual firewall started to have problems immediately.
Again, details are a little fuzzy, but in both cases I recall a difference in traceroute results from LAN clients versus from a telnet session to the switch.
The resolution was to select the 2.0.50 image (without reverting the 2.0.60 boot code) and reboot. The switch rebooted successfully to f/w 2.0.50 and the routing issues were resolved.
I repeated the upgrade by selecting the 2.0.60 image and rebooting. This reproduced the problems. A subsequent downgrade fixed it again.
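
One way to pin down the traceroute difference mentioned above is to record the hop list from a LAN client and from the switch console, then find the first hop where the two paths part ways. The following is only a minimal sketch: the function name and the hop addresses are hypothetical placeholders, not values captured from either site.

```python
from itertools import zip_longest


def first_divergence(hops_a, hops_b):
    """Return the 1-based hop number where two paths first differ, or None."""
    for hop_number, (a, b) in enumerate(zip_longest(hops_a, hops_b), start=1):
        if a != b:
            return hop_number
    return None


# Hypothetical hop lists, typed in by hand from two traceroute runs.
from_client = ["192.0.2.1", "192.0.2.254", "198.51.100.9"]
from_switch = ["192.0.2.1", "198.51.100.9"]

divergence = first_divergence(from_client, from_switch)
print("Paths first differ at hop:", divergence)
```

Running the same comparison before and after the firmware change would show whether the client path picks up an extra or different hop only under 2.0.60.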

Equipment:
Blade servers running ESX4 and SB40c storage blades
c3000
OA f/w: 2.60
Single GbE2c switch config: management IP hardset to LAN address, external ports serve different purposes, some ports used as a basic trunk to an external switch:
/c/port 20
pvid 120
/c/port 20/gig
speed 1000
mode full
/c/port 21
tag ena
tagpvid dis
/c/port 21/gig
speed 1000
mode full
/c/port 22
tag ena
tagpvid dis
/c/port 22/gig
speed 1000
mode full
/c/port 23
tag ena
tagpvid dis
/c/port 23/gig
speed 1000
mode full
/c/port 24
tag ena
tagpvid dis
/c/port 24/gig
speed 1000
mode full
/c/l3/rtrid 192.168.1.15
/c/l3/if 256
dis
addr 192.168.1.100
No static routes defined but numerous VLAN L3 interfaces defined

Case #3:
The third issue to report is a bit different: the firmware has been upgraded to 2.0.60 OK, and the ports are all used as a single trunk to an L3 switch. No L3 routing is performed by the GbE2c switches.
However, there is an ongoing issue related to PXE booting: when NIC #1 is set to be the PXE boot target, it sometimes fails and the BIOS has to be accessed to change to using NIC #2 for PXE boot.
Sometimes we also see networking failure when the W2K3 OS is built and the teaming software is applied.
We have found the only sure way to solve these issues is to pull out the blade server and reinsert it. The blade server BIOS, iLO, and PMC firmware are all at the latest levels.

Equipment:
Internal blade servers run a mix of physical Windows and ESX3.5 host servers.
2x c7000
Dual OAs f/w: 2.52
Six GbE2c switches per chassis, config management IP hardset to LAN address, external ports serve different purposes, some ports used as a basic trunk to an external switch:
/c/port 20
name "Uplink1 - Spare"
dis
tag ena
tagpvid dis
/c/port 20/gig
mode full
/c/port 21
name "Uplink2 to ITSX908"
tag ena
pvid 254
media/copper
/c/port 21/gig
mode full
/c/port 22
name "Uplink3 to ITSX908"
tag ena
pvid 254
media/copper
/c/port 22/gig
mode full
/c/port 23
name "Uplink4 to ITSX908"
tag ena
pvid 254
media/copper
/c/port 23/gig
mode full
/c/port 24
name "Uplink5 to ITSX908"
tag ena
pvid 254
media/copper
/c/port 24/gig
mode full
/c/l3/rtrid 172.16.254.20 (example of one of 12)
EBIPA address also assigned on a different subnet

In no cases was UFD enabled.
In all three cases, some external ports were trunked.
In all three cases, multiple VLANs were in use.
In both cases where routing failed after f/w upgrade to 2.0.60, some external ports are used for trunking, yet another port is assigned to a different VLAN.
There are differences in the management port assignments and in how EBIPA is disabled (and thus in what could happen with STP), as well as in how the switch is used in the network at L3. In all cases, though, the common element is the GbE2c switch, and especially f/w version 2.0.60.

It appears there is a problem with the switch's VLAN traffic routing algorithm or ARP table management after upgrading to f/w v2.0.60. But perhaps it is a simple configuration issue. Can anyone spot the issue? Perhaps compare to similar working configurations. Or else confirm others are seeing the same problems.
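
For anyone wanting to compare against a working configuration, here is a rough sketch of how two config dumps in the format shown above could be diffed section by section. It assumes each dump is plain text where a line starting with `/` opens a section (for example `/c/port 20`) and the lines that follow are that section's settings. The function names are made up for illustration, and the two sample dumps are trimmed excerpts of the case #1 and case #2 listings.

```python
def flatten(text):
    """Turn a config dump into a set of (section, setting) pairs."""
    pairs = set()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            current = line
            pairs.add((current, ""))  # record that the section itself exists
        elif current is not None:
            pairs.add((current, line))
    return pairs


def diff_configs(first, second):
    """Return (only_in_first, only_in_second) as sorted lists of pairs."""
    a, b = flatten(first), flatten(second)
    return sorted(a - b), sorted(b - a)


# Trimmed excerpts of the case #1 and case #2 port settings.
case1 = """
/c/port 20
pvid 200
/c/port 21
tag ena
"""
case2 = """
/c/port 20
pvid 120
/c/port 20/gig
speed 1000
mode full
/c/port 21
tag ena
tagpvid dis
"""

only_case1, only_case2 = diff_configs(case1, case2)
for section, setting in only_case1:
    print("case 1 only:", section, setting)
for section, setting in only_case2:
    print("case 2 only:", section, setting)
```

Site-specific values such as PVIDs and router IDs will always show up as differences, so the output is only a starting point for spotting settings like `tagpvid dis` that exist on one switch and not the other.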

Best regards, Barnaby

Stefan Wehinger · ‎01-18-2010

Hi Barnaby!

I know I'm a little late, but I still want to answer you. We have a similar problem, currently using firmware 5.1.3 (I still don't get that version jump).

We have a bladecenter with a few blades hosting xenserver, about 30 different VLANs, most of which get routed through the switch. This works as it should except when a link on one of the blades changes - even for a short time. Or any other network link for that matter. The result is network packets looping through the network until their TTL expires; after a few minutes everything gets back to normal.

The interesting (and probably new) thing here is that if you ping a few addresses directly from the switch console, those addresses work immediately, whereas others still need time (and a few minutes is just too much if a simple client goes down or loses the link...)

I'm gonna try to revert back to firmware 2.0.50 to see if we still have that problem if I do. Very annoying - have you learned anything new about this yet?

PS: Sry for my english, is 1:30am ;)

Stefan Wehinger · ‎01-18-2010

Update: Try disabling BOOTP relaying on all the interfaces - preliminary tests look very promising - now we just need to adapt our dhcp to have a foot in every vlan where possible...

BarnabyArnott · ‎02-04-2010

We also found testing pings directly from the switch console would work, where a client on the local subnet would fail. Clearly a bug in the firmware, but HP has not provided any valuable assistance. Being a live production environment, we can't afford to test so have kept our lower firmware level.
Interesting what you mention about disabling BOOTP - has this proven effective as a permanent workaround, or did you also back out to an earlier firmware?
Cheers, Barnaby

cresch · ‎01-12-2011

Was there ever any resolution on this? We appear to be experiencing a similar issue after upgrading to v5.1.3...
