vMotion timeout with new "Switch Redundancy" implementation: vSphere4.1, EMC SAN

Michael Weinbergs
Occasional Visitor

Hello everyone,
I have a problem: when I move servers from one VMware host to another (via vMotion), the servers are off the air for 30-60 seconds, causing downtime/timeouts. This happens when the server has moved to the new switch we have, which is trunked.
I suspect this has something to do with the "edge port" settings on the switch, but aside from having spanning tree on, I'm not sure what I have to set to make the switch update its cache faster.

I have the following config:
2x ProCurve 2510G-24.
2x IBM vSphere4 hosts (with 6 interfaces each)
1x EMC NX4 SAN.
(PDF attached)

Originally everything was connected to one switch. I bought another for redundancy and am now trying to configure them. However, when I vMotion a virtual machine from one switch to the other, traffic to its IP address goes to the bit bucket until (it appears) 30 seconds later, when the switches learn that it has moved.

The config:
Spanning Tree ON

VLAN (default) for all ports
VLAN 726 (iSCSI) on ports 1-3 (esx1), ports 5-7 (esx2), and ports 11-14 (NX4).
(There are other VLANs, but I will leave these out for now.)
I have trunked ports 17-20 (as trk1)

** This is all working OK, EXCEPT when I move the VMs from one switch to another.

I suspect that this is related to some type of "fastport"/"edgeport" setup (foreign to me, but I read about it in a forum somewhere).
Can someone give me some advice on how I can get the switch to forget ARPs "quickly", so that in the event of a switch failure the other one kicks in quickly? (I'm happy with a couple of seconds of downtime, but at 30 seconds we are getting timeouts.)
3 REPLIES
Tore Valberg
Trusted Contributor

Re: vMotion timeout with new "Switch Redundancy" implementation: vSphere4.1, EMC SAN

Hi Michael

Check whether the ports in question are set to admin-edge. An admin-edge port goes straight to forwarding when its link comes up, instead of waiting out the STP listening and learning delays.

Command example
"spanning-tree 4 admin-edge-port"
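
A minimal sketch of how that might look on the setup described above, assuming ports 1-3 and 5-7 are the ESX-facing ports (taken from the VLAN 726 listing in the original post; the hosts' other uplinks are not listed there) and that the firmware accepts the admin-edge-port keyword. Ports 17-20 (trk1) link the two switches and should stay non-edge. Lines starting with ";" are annotations, not commands.

```
; assumption: 1-3 and 5-7 are host-facing ports; adjust to the real uplinks
configure
spanning-tree 1-3,5-7 admin-edge-port
; check the resulting per-port spanning-tree state
show spanning-tree 1-3,5-7
; save the running config
write memory
```

The same change would be needed on both switches for whichever ports actually face the hosts.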

Tore
Richard Brodie_1
Honored Contributor

Re: vMotion timeout with new "Switch Redundancy" implementation: vSphere4.1, EMC SAN

Unless you are running old-fashioned (802.1D) STP, 30s is way too long for a spanning-tree timer. The 2510s are non-routing, so it is probably the MAC cache, not the ARP cache.

This article appears to confirm it:
http://www.dedicatedit.com/blog/very-technical-how-to/xenmotion-hp-procurve-switch-mac/

If you're running firmware older than Y.11.08, that looks like your problem.
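
One way to check both points from the switch CLI, as a rough sketch: show version reports the running firmware release, and show mac-address reports which port the switch currently has a given MAC learned on. The MAC below is a made-up placeholder for the VM's address. Lines starting with ";" are annotations, not commands.

```
; running firmware release (compare against Y.11.08)
show version
; where the switch thinks the VM lives; 005056-aabbcc is a hypothetical VM MAC
show mac-address 005056-aabbcc
```

Running the second command on both switches right after a vMotion should show whether one of them still has the VM on its old port.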
Michael Weinbergs
Occasional Visitor

Re: vMotion timeout with new "Switch Redundancy" implementation: vSphere4.1, EMC SAN

Thanks for the tip on the firmware revisions, I didn't think of that. I'll upgrade that switch as soon as I can organise downtime (hopefully this weekend) and see if that fixes the problem.

The link you provided clarified a LOT of things... thanks very much.

I am also very interested in the "spanning-tree 4 admin-edge-port" command and will investigate further.

Thanks everyone... will test those and see what happens!
Mike