BladeSystem - General

Occasional network dropouts between specific machines

We have a c7000 chassis with Virtual Connect interconnect modules. Three of the blades form a vSphere 4 cluster, and the rest run Windows 2008. When we vMotion virtual machines within the cluster, a VM will often lose connectivity to ONE of the physical blades, seemingly at random, while retaining connectivity to the other blades and the outside world. All servers are on the same IP subnet, so the traffic does not leave the Virtual Connect modules. I can start 4 or 5 ping sessions in one of the virtual machines to different physical servers on the same subnet, then vMotion the virtual machine, and quite consistently one of the ping sessions (which one varies) will change to "destination host unreachable" while the rest continue fine. The only thing that resolves the issue is disconnecting and reconnecting the virtual NICs in the virtual machine.
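For anyone who wants to reproduce the test above without juggling several console windows, here is a minimal Python sketch that pings a list of hosts in a loop and prints a line whenever one of them changes state. It is only an illustration under some assumptions: the function names are my own, the flags passed to `ping` are the Windows and Linux ones, and a reply is only counted as good if the output contains a TTL value, because Windows `ping` can return exit code 0 for "Destination host unreachable".

```python
import subprocess
import sys
import time
from datetime import datetime


def reply_ok(returncode, output):
    """True only for a real echo reply.

    Windows ping can exit with 0 on "Destination host unreachable",
    so the exit code alone is not trusted; a genuine reply line also
    carries a TTL value.
    """
    return returncode == 0 and "ttl=" in output.lower()


def ping_once(host, timeout_s=1):
    """Send one ICMP echo using the system ping command."""
    if sys.platform.startswith("win"):
        # Windows: -n is the count, -w the timeout in milliseconds
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Linux: -c is the count, -W the timeout in seconds
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    proc = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
    return reply_ok(proc.returncode, proc.stdout)


def detect_changes(previous, current):
    """List (host, was_up, is_up) for every host whose state flipped."""
    return [(h, previous[h], up) for h, up in current.items()
            if h in previous and previous[h] != up]


def monitor(hosts, interval_s=1.0, rounds=None):
    """Ping all hosts repeatedly and print a line on each state change."""
    state = {}
    done = 0
    while rounds is None or done < rounds:
        current = {h: ping_once(h) for h in hosts}
        for host, was_up, is_up in detect_changes(state, current):
            stamp = datetime.now().strftime("%H:%M:%S")
            print(f"{stamp} {host}: {'up' if was_up else 'DOWN'} -> {'up' if is_up else 'DOWN'}")
        state = current
        done += 1
        time.sleep(interval_s)
```

Run inside the VM, something like `monitor(["10.0.0.11", "10.0.0.12", "10.0.0.13"])` (placeholder addresses) started before the vMotion would log which physical blade dropped out and when.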

I've run Wireshark on the source and destination servers while the "destination host unreachable" condition was occurring, and determined the issue to be MAC / ARP related. I can see the ARP request leave the VM and arrive at the physical host, and I can see the physical host's reply leave that host, but the reply never arrives back at the VM.
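To compare the captures from both ends, it helps to line up the MAC and IP fields of each ARP frame. The sketch below decodes a raw, untagged Ethernet frame carrying ARP (the layout from RFC 826) using only the Python standard library. The function and key names are my own, and it assumes IPv4 over Ethernet with no VLAN tag, which matches the setup described here.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ARP_OPCODES = {1: "request", 2: "reply"}


def format_mac(raw):
    """Render 6 raw bytes as aa:bb:cc:dd:ee:ff."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_arp(frame):
    """Decode an untagged Ethernet frame carrying IPv4 ARP.

    Returns a dict of the interesting fields, or None if the frame
    is too short or is not ARP.
    """
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP body
        return None
    eth_dst, eth_src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ETHERTYPE_ARP:
        return None
    (_htype, _ptype, _hlen, _plen, oper,
     sha, spa, tha, tpa) = struct.unpack("!HHBBH6s4s6s4s", frame[14:42])
    return {
        "eth_src": format_mac(eth_src),
        "eth_dst": format_mac(eth_dst),
        "op": ARP_OPCODES.get(oper, str(oper)),
        "sender_mac": format_mac(sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_mac": format_mac(tha),
        "target_ip": socket.inet_ntoa(tpa),
    }
```

For the reply that goes missing, the fields worth comparing are `eth_dst` and `target_mac` in the physical host's reply against the MAC address the VM actually used as `sender_mac` in its request.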

The vSphere hosts are doing NIC teaming (no EtherChannel, no trunking, no VLAN tagging), with one NIC in each team on each of the Virtual Connect modules. Each Virtual Connect module has a 10 Gb link back to the core that is in its own network profile.

We have another c7000 chassis with a VMware cluster that is not exhibiting the problem. The configurations are almost identical, except that the other chassis is using VLAN tagging.

We are not on the latest Virtual Connect firmware yet. We are hoping to update it this weekend, but I doubt that will be the solution, since we have another chassis at the same firmware level that is not exhibiting the problem.

We have turned TOE off on all machines involved, but that has not solved the issue. Upgrading VM tools, hardware, drivers, etc. has not fixed the problem either.

Anyone else seen this issue?