Network Dropouts on BL460c NIC teams running W2k3

 
neilmc
Occasional Advisor

Hi,

I am having an issue where, about every 4 hours, routing to our
BL460c G6 blades fails.

The blades themselves can ping and connect to everything within their own vlan.
They can't ping the gateway and can't be contacted from outside their own vlan.
This happens to multiple blades at once; however, ESX service consoles and Windows
2008 servers seem to be OK. All other 2RU servers are fine; it is only the blades.
The error corrects itself after about 5 minutes; however, clearing the ARP cache on
the core switches fixes it instantly.

c7000 chassis
2 x Flex-10 Virtual Connect modules using 1Gb SFPs (firmware upgraded to the latest; the issue happened both before and after the upgrade),
connected to 2 x Cisco 3750s in a stack running EtherChannel, with 4 vNets assigned. vNet1 and vNet4 are
presented to the Windows servers in the profile. There are 8 NICs visible in Windows, but only 2 show
as having media connected.

No port errors on the 3750 stack or on the core switches.
Error logs on the blades show media disconnected errors at the same time the issue occurs.
The issue occurs with SmartLink both enabled and disabled.

I have updated the drivers and the Network Configuration Utility versions on the blades.

Any help would be greatly appreciated.
10 REPLIES
HEM_2
Honored Contributor

Re: Network Dropouts on BL460c NIC teams running W2k3

When the problem occurs, what does the core switch ARP cache look like before clearing it? Is it pointing to the secondary MAC addresses of the teams or to something else entirely?

What teaming mode?

Is TCP Offload Engine turned off on the NICs?
neilmc
Occasional Advisor

Re: Network Dropouts on BL460c NIC teams running W2k3

Team mode is Network Fault Tolerance.
TOE is on.
The ARP cache is pointing to the team MAC address.

I should probably elaborate more. The core switches are a pair of Cisco 6509s that can't do EtherChannel across them. The blades were plugged directly into the core switches until this issue started. We thought it might be a vNet/EtherChannel issue, so we put a pair of 3750s between the blades and the core. The issue continues.
HEM_2
Honored Contributor

Re: Network Dropouts on BL460c NIC teams running W2k3

I would recommend turning off TOE.

Too many problems with it...
neilmc
Occasional Advisor

Re: Network Dropouts on BL460c NIC teams running W2k3

TOE is off; the problem continues. It appears to happen on the hour, at 10am, 12 am, 4pm and some other random times, always on the hour.
HEM_2
Honored Contributor

Re: Network Dropouts on BL460c NIC teams running W2k3

The problem lasts for about 5 minutes...

That is the default MAC address aging time.

I think maybe the team MAC address is being sent out of the secondary NIC of the team, and then the switch address tables point to the wrong NIC (the secondary).

What is your NCU version?

If you temporarily disable the secondary NIC does the issue not occur?
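
To illustrate the theory above, here is a minimal sketch of a switch learning table with aging. It is only a toy model, not how any particular switch is implemented: the 300-second aging value is assumed to match the roughly 5-minute recovery, and the MAC address and port names are made up for the example.

```python
from dataclasses import dataclass, field

AGING_SECONDS = 300  # assumed aging time, chosen to match the ~5 minute recovery


@dataclass
class MacTable:
    """Toy learning table: MAC address -> (port, time last seen)."""
    aging: int = AGING_SECONDS
    entries: dict = field(default_factory=dict)

    def learn(self, mac, port, now):
        # A frame sourced from `mac` arrived on `port`; (re)point the entry there.
        self.entries[mac] = (port, now)

    def lookup(self, mac, now):
        # Return the learned port, or None if the entry is unknown or aged out.
        entry = self.entries.get(mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen >= self.aging:
            del self.entries[mac]
            return None
        return port


TEAM_MAC = "00-17-a4-77-00-01"  # hypothetical team MAC address

table = MacTable()
table.learn(TEAM_MAC, "primary", now=0)     # normal traffic from the active NIC
table.learn(TEAM_MAC, "secondary", now=10)  # one stray frame from the standby NIC

stale_port = table.lookup(TEAM_MAC, now=60)  # still points at the standby NIC
aged_port = table.lookup(TEAM_MAC, now=310)  # entry has aged out, nothing learned
```

In this model the table keeps sending traffic to the standby port until the entry ages out or a new frame arrives on the primary port, which is the behaviour the theory would predict.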
neilmc
Occasional Advisor

Re: Network Dropouts on BL460c NIC teams running W2k3

I can't check the exact version as I am on a course, but I upgraded it to the latest.
We have dropped a blade down to one NIC on one vNet, not teamed, and the issue still occurs. We have also forced failover with NCU to either NIC in the teams on the other blades, and the issue still occurs.


srussell
Advisor

Re: Network Dropouts on BL460c NIC teams running W2k3

Do these servers have NC532i adapters? Is the non-paged memory quite high, around 180+?
If so, you should ensure that the NICs are at the latest drivers and firmware, and check that the Virtual Bus Drivers are also at the most current driver level. I would also disable RSS (Receive Side Scaling) from the HP NCU.
neilmc
Occasional Advisor

Re: Network Dropouts on BL460c NIC teams running W2k3

It looks like it was a configuration issue in the network. We are running a NAT for a test environment that uses production addresses behind the NAT. It looks like running the tagged VLAN traffic past the Windows 2003 blades, with 2 different MAC addresses, is the issue. For some reason the ARP cache picks up the MAC of the router running the NAT about every 4 hours, and once the ARP cache times out it reverts to the correct MAC for the production IP. We still haven't found the root cause, but we are going to apply a VLAN allowed list for now until we work out what's going on. Thanks for all your helpful suggestions.
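
For anyone chasing something similar, a rough sketch of how the flip could be spotted from periodic ARP table samples. The function name, the sample format (timestamp, IP, MAC) and all the addresses below are made up for illustration; how the samples are collected from the switch or blade is left out.

```python
from collections import defaultdict


def find_mac_changes(observations):
    """Report every time an IP's MAC differs from the previous sample.

    observations: iterable of (timestamp, ip, mac) tuples in time order.
    Returns {ip: [(timestamp, old_mac, new_mac), ...]} for IPs that changed.
    """
    last_mac = {}
    changes = defaultdict(list)
    for timestamp, ip, mac in observations:
        mac = mac.lower()  # normalise case so the same MAC always compares equal
        previous = last_mac.get(ip)
        if previous is not None and previous != mac:
            changes[ip].append((timestamp, previous, mac))
        last_mac[ip] = mac
    return dict(changes)


# Hypothetical samples: 10.0.0.10 briefly resolves to a second MAC, then reverts.
samples = [
    ("10:00", "10.0.0.10", "00-17-A4-77-00-01"),
    ("10:01", "10.0.0.10", "00-1B-54-00-00-99"),
    ("10:06", "10.0.0.10", "00-17-a4-77-00-01"),
    ("10:06", "10.0.0.20", "00-17-a4-77-00-02"),
]

flaps = find_mac_changes(samples)
for ip, events in flaps.items():
    for timestamp, old_mac, new_mac in events:
        print(f"{timestamp} {ip}: {old_mac} -> {new_mac}")
```

An IP that shows up here with two entries a few minutes apart (changed, then changed back) would match the pattern described above.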

HEM_2
Honored Contributor

Re: Network Dropouts on BL460c NIC teams running W2k3

Sounds like proxy ARP is enabled on your router...
neilmc
Occasional Advisor

Re: Network Dropouts on BL460c NIC teams running W2k3

Nope. Proxy ARP is not turned on. It looks like the Windows 2003 TCP/IP stack is picking up tagged traffic from the pre-prod VLAN on the blades. The Flex-10 switch might be stripping the tag, picking up the MAC address for the wrong VLAN and passing that to the ARP table on the blade. Not sure. We are not seeing the problem anywhere else on the network; then again, we aren't passing tagged traffic past any other NICs except the ESX hosts.