BladeSystem Server Blades

Multicast packet loss problem with ESXi 5.0 in a BL460c Gen8 cluster



Hi all,


I'm having a multicast packet loss problem with ESXi 5.0 in a BL460c Gen8 cluster.  I'm using HP's version of ESXi.  The problem seems to be specific to the combination of ESXi 5.0 and the Gen8 hardware.  Here's what's happening:


With two VMs (RHEL 5.5 guest o/s) on the same server blade, a multicast subscriber on one VM successfully receives packets from the sender on the other VM.  If I move one of the VMs to a different blade in the cluster, the multicast subscriber experiences huge packet loss and out-of-order packets.  I've verified this behavior with several software utilities that can send and receive multicast data, and the result is the same.
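For reference, here's a minimal sketch of the kind of test utility I mean: the sender stamps each datagram with a sequence number, and the receiver's sequence log is checked for gaps and reordering.  The group address 239.1.1.1 and port 5007 are just placeholders; use whatever group your real traffic is on.

```python
import socket
import struct

GROUP = "239.1.1.1"  # placeholder test group
PORT = 5007

def make_sender():
    """UDP socket for sending multicast datagrams (TTL 2 to cross the vSwitch)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    return s  # send with: s.sendto(struct.pack("!I", seq), (GROUP, PORT))

def make_receiver():
    """UDP socket bound to PORT and joined to the test group."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return s

def analyze(received_seqs, total_sent):
    """Summarize loss and reordering from the sequence numbers received.

    Returns (packets_lost, out_of_order_count).
    """
    lost = total_sent - len(set(received_seqs))
    out_of_order = sum(
        1 for a, b in zip(received_seqs, received_seqs[1:]) if b < a
    )
    return lost, out_of_order
```

Running the sender on one blade and the receiver on another, `analyze()` on the Gen8→Gen8 path shows both counters climbing; on the G7→Gen8 path both stay at zero.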


Now, if I migrate the multicast sender VM to a BL460c G7 blade in the same chassis as the Gen8s, also running ESXi 5.0, the problem goes away.  If I swap the VMs, i.e. the multicast sender is now on the Gen8 and the receiver is on the G7, the problem returns.  It seems to me that the combination of ESXi 5.0 and the Gen8 is causing some kind of multicast packet send problem.  I've run this experiment on all pair combinations of the four blades in my cluster and the behavior is identical.


I have the latest NIC drivers installed on the Gen8 blades, and the vSwitches and vNICs all appear to be configured correctly.  I'm using e1000 NICs on my VMs.  I have also tried VMXNET3 NICs, but the problem remains.
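In case it helps anyone compare, these are the kinds of ESXi shell commands I used to confirm the driver and vSwitch state (the uplink name vmnic0 is just an example; yours may differ):

```shell
# List physical NICs and the driver each one is bound to
esxcli network nic list

# Driver and firmware versions for a given uplink
ethtool -i vmnic0

# vSwitch layout, uplinks, and port groups
esxcli network vswitch standard list
```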


Has anyone else observed this behavior?


I'm guessing that when the VMs are located on the same blade, the outgoing multicast data is being turned around at the vSwitch, so it never reaches the physical NIC or the physical network switch that my blade chassis connects to.  As soon as I separate the VMs, the data does reach the physical NICs, which is why I'm thinking it may be a physical NIC driver/ESXi 5.0 problem.
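One way to check that theory is to capture on the uplink itself from the ESXi shell and see whether the sender's multicast packets actually leave the Gen8 host intact (again, vmnic0 is just an example interface name):

```shell
# Capture IPv4 multicast UDP frames on the physical uplink
tcpdump-uw -i vmnic0 -n udp and ip multicast
```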


FYI, a few of my colleagues have multicast data flowing fine between two Gen8 blades on bare metal and between two Gen8 blades running ESXi 4.1 U2, so my problem is definitely peculiar to the Gen8/ESXi 5.0 combination.


Any suggestions on what might be the problem would be greatly appreciated.