LAN Routing

Problems in ARP/MAC table -- ProCurve 8212 & VMware vMotion

 
spice2003
Advisor

Problems in ARP/MAC table -- ProCurve 8212 & VMware vMotion

We have a ProCurve 8212 backbone and three VMware ESXi 5 servers in a cluster. VMware's vMotion feature lets us move VMs between hosts, for HA and load balancing.

The problem is in our backbone's ARP table. For example, a Linux VM runs on server01, which is connected to the backbone on port E10; when VMware moves the VM to server02, the VM is suddenly behind port E12.

The vMotion cutover should take one second at most, so we should lose one ping. In reality it takes at least 10 seconds.

60% of the world works with VMware, and a lot of them work with HP switches. A vMotion transfer cannot be allowed to take this long, or the VM loses its storage and its users.

 

If I start the vMotion transfer and then run "clear arp" on my backbone, the process completes fine, in one second. But without my intervention it takes much longer than expected.
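For the record, the manual workaround is just this on the backbone console (the prompt name here is illustrative):

    ProCurve-8212# clear arp
    ProCurve-8212# show arp

After the clear, "show arp" confirms the VM's entry has been relearned, and the pings recover within a second.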

 

I found the command "ip arp-mcast-replies" online and entered it on my backbone. It helped, but only about half the time: sometimes the cutover takes one second, and sometimes 10 seconds.

Can someone help me with that?

I also changed the ARP and MAC table age times to 60 seconds (the minimum).
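Putting it all together, this is roughly what I have configured so far (syntax from memory of the K.15 CLI, so please verify it against your release; on releases where the ARP age is set in minutes, 1 is the closest I can get to 60 seconds):

    ProCurve-8212(config)# ip arp-mcast-replies
    ProCurve-8212(config)# mac-age-time 60
    ProCurve-8212(config)# ip arp-age 1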

 

I am at a loss, please advise. :)

 

Thanks!!

Regards,
MCITP, CCNA, CCNP
7 REPLIES
paulgear
Esteemed Contributor

Re: Problems in ARP/MAC table -- ProCurve 8212 & VMware vMotion

Hi spice2003,

When VMware vMotions a VM from one host to another, its MAC address migrates with it, so the MAC-to-IP mapping doesn't change. ARP commands therefore should not have any effect on your problem.

The only issue should be the switch's MAC table having the MAC on the wrong port. If I remember correctly, VMware sends out a gratuitous ARP on behalf of the VM when it migrates, so the switch's MAC table should update quickly as well.

In my environment, I usually see 3-4 dropped pings, but there's no issue with storage dropping out, because the ESXi host controls the storage, not the VM. What do your switch and VMware network configurations look like? Are you sharing storage and vMotion on the same NIC? How does your system connect to the storage?
Regards,
Paul
spice2003
Advisor

Re: Problems in ARP/MAC table -- ProCurve 8212 & VMware vMotion

Hi

Thank you very much for the answer.

 

We have two NICs, one for vMotion and one for storage, and two VLANs: a vMotion VLAN without any IP address, and a storage VLAN with an IP address. If one of the NICs fails, the second backs it up.

 

I wish I had 3-4 drops. I get 10 drops, sometimes even more, and sometimes just one; I do not know why it is inconsistent. Where is the problem: in my backbone or in my VMware environment?

When vMotion moves the VM, I can see the MAC address move to the new port: "show mac-address" shows it on the new port after 2 seconds, but the VM still drops 8 pings after the MAC is already there.
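For example, I watch the table like this while the vMotion runs (the MAC address here is a placeholder, not the real one):

    ProCurve-8212# show mac-address 0050c2-123456

Two seconds after the migration this already reports the new port, yet the pings keep dropping for another 8 seconds or so.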

I'm really confused

 

Thanks a lot.

Regards,
MCITP, CCNA, CCNP
paulgear
Esteemed Contributor

Re: Problems in ARP/MAC table -- ProCurve 8212 & VMware vMotion

Hi spice2003,

Do you have any monitoring of your switch that shows the utilisation of each port? My guess is that your ESXi systems are using the storage network, because there are no IP addresses defined on the vMotion VLAN. If I understand the VMware documentation correctly, vMotion requires a valid IP address on the vMotion NIC.
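If it helps, you can check what the vMotion vmkernel interface actually has from the ESXi shell. On ESXi 5.x it should be something like this (from memory, so double-check the namespace on your build; the target IP in the vmkping is just an example):

    ~ # esxcli network ip interface ipv4 get
    ~ # vmkping 10.0.0.2

The first command lists the IPv4 settings of each vmkernel interface, and vmkping tests reachability through the vmkernel network stack.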
Regards,
Paul
spice2003
Advisor

Re: Problems in ARP/MAC table -- ProCurve 8212 & VMware vMotion

I have the ProCurve Manager software, but it is not real time; it refreshes every minute.

Maybe I will check the traffic with tcpdump and port mirroring.
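If I go that way, the plan is roughly this (the ports are examples, and the mirror syntax may differ between software releases):

    ProCurve-8212(config)# mirror-port e20
    ProCurve-8212(config)# interface e10 monitor

and then, on a Linux box plugged into e20:

    tcpdump -n -e -i eth0 'arp or icmp'

The -e flag prints the Ethernet headers, so I should be able to see whether a gratuitous ARP arrives when the VM moves.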

I will also add an IP address to the vMotion NICs and VLAN.

 

I forgot to mention a very important thing: all the ESX hosts are directly connected to my ProCurve 8212 backbone. During the vMotion process, pings from the rest of the network to the VM drop 10 times, but a ping directly from the backbone to the VM does not drop even once!

Does that point to something?
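For the direct test I simply ping from the switch console itself (the IP here stands in for the VM's address; the repetitions keyword should be available on this platform, but verify it on your release):

    ProCurve-8212# ping 10.1.1.50 repetitions 100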

 

 

Thanks a lot.

Regards,
MCITP, CCNA, CCNP
Peter_Debruyne
Honored Contributor

Re: Problems in ARP/MAC table -- ProCurve 8212 & VMware vMotion

Hi,

 

You are correct: since the box is both a hardware L2 switch and an L3 router, it needs to update two hardware tables for this change:

* the L2 forwarding table (updated simply from the new source port of the source MAC)

* the L3 forwarding table (learned via ARP)

 

So it seems the L3 ARP entry is not updated quickly enough, which is demonstrated by your fast L2 failover but slow L3 (routed) failover.

I am not aware of a ProVision command to enable/disable gratuitous ARP handling; I always believed ProVision simply accepts gratuitous ARPs (but then you should not have this problem, of course).

 

Which software version is running on the box?

 

If possible, you can do this validation:

 

Introduce a server access switch, connect two ESX boxes to it, then connect the access switch to the 8200.

When a vMotion runs between those two ESX boxes, the 8200's L3 next-hop interface does not change, so it should be fast.

When you vMotion to another box, the 8200's L3 next-hop interface changes, so that case should be slow.
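While the test runs, you can watch the routed entry on the 8200 to see how long it keeps pointing at the old port. Something like this, with the VM's IP as a placeholder (output filtering with "| include" depends on the software release):

    ProCurve-8200# show arp | include 10.1.1.50

Comparing that with the MAC table timing you already measured should show exactly which table is lagging.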

 

Best regards,
Peter

 

 

spice2003
Advisor

Re: Problems in ARP/MAC table -- ProCurve 8212 & VMware vMotion

Hi

Thanks for the answer.

 

I tried your idea and it works great: I connected the VMware hosts to an HP 2510 switch and then connected that switch to the backbone, bypassing the backbone's big ARP/MAC tables. But this is just a workaround for debugging; I need my backbone to do that job.

How can I fix this problem in my backbone? We are running version K.15.07.0008 on it.

It does not make sense that a big HP backbone can't handle this process. :(

Regards,
MCITP, CCNA, CCNP
C0LDWiR3D
Frequent Advisor

Re: Problems in ARP/MAC table -- ProCurve 8212 & VMware vMotion

Hi.

 

Did you manage to solve this?

And identify the cause?

 

Are you using distributed trunking (LAGs)?