BladeSystem - General


 
Mike O.
Regular Advisor

Network failover not working with ESXi5, hp Virtual Connect

We have a C7000 with a mix of BL465G5 and BL465G7 blades. We're using the 1/10 Virtual Connect Ethernet interconnect modules (no Flex-10). Our external switches are Cisco 6509 units. The blades are currently running VMWare ESXi 5.

 

We've configured the systems using the shared uplink set and VLAN tagging (basically scenario 1:5 in the HP Virtual Connect Ethernet Cookbook). The uplink set spans two interconnect modules, with each interconnect going to an LACP group on a different 6509 switch, so it has some ports active and others in standby. Except for VMWare updates and hardware firmware updates, this configuration has been essentially unchanged for a couple of years.

 

Recently, while we were doing some other testing, we disconnected three of the four interconnects from the network and the VMWare blades dropped off the network. One interconnect was still connected, so the blades should have stayed connected.

 

After a bunch of testing, configuration changes, etc., it appears that the VMWare NICs aren't failing over when they should. We have "beacon probing" enabled on the vSwitch, but it doesn't seem to detect that there's no communication to the outside.

 

The blades all have four NICs. Our Virtual Connect configuration has two "Uplink Sets" defined with the same VLANs in each. Two NICs go to each uplink set.

 

All four physical vmnics are attached to the vSwitch. vmnic0 and vmnic1 go to "Uplink Set 1", and vmnic2 and vmnic3 go to "Uplink Set 2".

The VMWare "NIC Teaming" failover order is set to vmnic0, vmnic1, vmnic2, vmnic3. For testing, we set "Load Balancing" to "Explicit failover order".

 

If "Uplink set 1" has a connection to the outside world, everything works OK.  If we disconnect the connections on Uplink set 1, and only have Uplink set 2 working, VMWare doesn't detect that vmnic0 and vmnic1 can't communicate and we lose network connectivity.

 

If we change the vmnic failover order to have vmnic2 or vmnic3 first in the list, then it only works if Uplink Set 2 is connected.
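The behaviour above can be pictured with a small hypothetical model (not VMware's implementation). `looks_healthy` stands for whatever the configured failure detection reports for a NIC, and `reaches_outside` for whether that NIC's uplink set is actually connected; both names are invented for this sketch. With explicit failover order, the team simply uses the first NIC that detection still considers healthy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Vmnic:
    name: str
    looks_healthy: bool    # what the vSwitch failure detection reports (hypothetical field)
    reaches_outside: bool  # whether traffic on this NIC can really leave the enclosure


def explicit_failover_choice(order: List[str], nics: Dict[str, Vmnic]) -> Optional[Vmnic]:
    """Return the first NIC in the failover order that detection considers healthy."""
    for name in order:
        if nics[name].looks_healthy:
            return nics[name]
    return None  # no usable NIC at all


# Uplink Set 1 disconnected: vmnic0/vmnic1 can no longer reach the outside,
# but nothing marks them unhealthy, so the team keeps using vmnic0.
nics = {
    "vmnic0": Vmnic("vmnic0", looks_healthy=True, reaches_outside=False),
    "vmnic1": Vmnic("vmnic1", looks_healthy=True, reaches_outside=False),
    "vmnic2": Vmnic("vmnic2", looks_healthy=True, reaches_outside=True),
    "vmnic3": Vmnic("vmnic3", looks_healthy=True, reaches_outside=True),
}

order = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]
chosen = explicit_failover_choice(order, nics)
print(chosen.name, chosen.reaches_outside)  # vmnic0 False

chosen = explicit_failover_choice(["vmnic2", "vmnic3", "vmnic0", "vmnic1"], nics)
print(chosen.name, chosen.reaches_outside)  # vmnic2 True
```

Reordering the list only moves the dependency to the other uplink set, which matches what was observed.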

 

 

We're going to open a ticket with VMWare, but I was hoping someone out there has had a similar configuration and might have some insight.

 

Mike O.

Hongjun Ma
Trusted Contributor

Re: Network failover not working with ESXi5, hp Virtual Connect

Hi Mike,

 

What firmware version is your VC 1/10 module running?

 

Also, could you post screen captures of the following:

 

1) SUS1 config

2) SUS2 config

3) A server profile config (main screen and Multiple Networks screen)

4) Stacking link status

 

BTW, with an active/standby design, when the active links in SUS1 fail, server traffic should cross the stacking link to reach the newly active uplinks. This happens without any NIC failover.
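One rough way to picture this: the server NIC keeps its link to the module it plugs into, and traffic is carried over stacking links to whichever module currently holds the active uplink. The sketch below is a hypothetical breadth-first search over an assumed four-module stack (bays 1, 2, 5 and 6; internal links 1-2 and 5-6; CX4 links 1-5 and 2-6). It only illustrates why no NIC failover is needed, not how VC actually forwards frames.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Assumed stacking topology, for illustration only.
STACKING: Dict[int, List[int]] = {1: [2, 5], 2: [1, 6], 5: [1, 6], 6: [2, 5]}


def path_to_active_uplink(start_bay: int, active_uplink_bays: Set[int]) -> Optional[List[int]]:
    """Breadth-first search over stacking links from the module a server NIC is
    attached to, to the nearest module that holds an active uplink."""
    queue = deque([[start_bay]])
    seen = {start_bay}
    while queue:
        path = queue.popleft()
        if path[-1] in active_uplink_bays:
            return path
        for nxt in STACKING[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no module has an active uplink


# Active uplinks in bay 1 fail and the standby ones in bay 2 take over:
# a NIC on bay 1 keeps its link and its frames just cross the stacking link.
print(path_to_active_uplink(1, {1}))  # [1]
print(path_to_active_uplink(1, {2}))  # [1, 2]
```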

 

Please take a look at page 23 of this doc:

http://hongjunma.wordpress.com/2011/11/28/hp-virtual-connect-technical-overview-presentation/

 

My VC blog: http://hongjunma.wordpress.com



Mike O.
Regular Advisor

Re: Network failover not working with ESXi5, hp Virtual Connect



I don't have the screen shots available, but here's some info:

 

- We're running 3.18 of the VC firmware.

 

1) SUS1 has bay 1 ports 1, 2, 3 & 4 (LACP group to one switch) and bay 2 ports 1 & 2 (LACP group to a different switch). About 8 VLAN networks are defined in the SUS.

 

2) SUS2 has bay 5 ports 1, 2, 3 & 4 (LACP group to one switch) and bay 6 ports 1 & 2 (LACP group to a different switch). Same VLAN networks as defined for SUS1.

 

 

3) LOM1 and LOM2 go to SUS1; Mezz1 and Mezz2 go to SUS2.

 

The VMWare NIC team has all four vmnics. Load balancing is set to port ID, and failover detection to "beacon".

 

 

4) We have Ethernet interconnects in bays 1, 2, 5, and 6, with a CX4 cable linking 1 & 5 and another between 2 & 6. All vertical and horizontal stacking links show OK.

 

- Within an uplink set (where some ports are active and some standby), we do get the correct behavior when the active ones go down: the standby ones go live and traffic flows with almost no interruption. What we're missing is the case where all the ports in an uplink set (both active and standby) go down. VMWare doesn't pick up that vmnic0 and vmnic1 can't reach the outside, so it keeps using those vmnics instead of failing over to the other vmnics in the team, which go to a different uplink set.

 

Psychonaut
Respected Contributor

Re: Network failover not working with ESXi5, hp Virtual Connect

Do you have Smart Link enabled on the SUS's?
Hongjun Ma
Trusted Contributor

Re: Network failover not working with ESXi5, hp Virtual Connect

Hi Mike,

 

Your last statement helped me understand your problem better. I hadn't realized you were referring to the situation where you lose all uplinks for an SUS.

 

I think this is your problem when you use "beacon" with this topology of four stacked VC modules. Let's say you lose all of your uplinks for SUS1 in modules 1 and 2. Because of your stacking topology (which is set up correctly), a beacon sent from the server reaches VC1, which forwards it over the stacking link to VC5 and then to VC6 through the internal horizontal stacking link. From VC6, it takes the vertical stacking link back to VC2 and then back to your vmnics, so the beacon still arrives even though SUS1 has no working uplinks. Remember, all modules and stacking links carry every vnet you define, even on a module that has no uplink defined for it.
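The loop can be illustrated with a deliberately simplified model of beacon probing in which a NIC is flagged only if no other team member hears its beacons. ESXi's real beacon-probing logic is more involved; the rule, the function names and the SUS-to-vmnic mapping (taken from the configuration described in this thread) are assumptions for the sketch. Because each vnet is stretched across all stacked modules, NICs on the same SUS always hear each other, even with zero live uplinks.

```python
from typing import Set

# Mapping from this thread: LOMs (vmnic0/1) on SUS1, mezzanine ports (vmnic2/3) on SUS2.
NIC_SUS = {"vmnic0": "SUS1", "vmnic1": "SUS1", "vmnic2": "SUS2", "vmnic3": "SUS2"}


def beacon_heard_by(sender: str, live_sus: Set[str]) -> Set[str]:
    """Team members that receive a beacon sent out of `sender`.

    Inside the VC domain a vnet is carried to every stacked module, so NICs on the
    same SUS hear each other whether or not that SUS has uplinks. A beacon reaches
    the other SUS's NICs only through the external switches, which requires both
    SUSs to have at least one live uplink."""
    sender_sus = NIC_SUS[sender]
    heard = set()
    for nic, sus in NIC_SUS.items():
        if nic == sender:
            continue
        same_vnet = sus == sender_sus
        via_external = sender_sus in live_sus and sus in live_sus
        if same_vnet or via_external:
            heard.add(nic)
    return heard


def flagged_by_beacon(live_sus: Set[str]) -> Set[str]:
    """Simplified rule: flag a NIC only when nobody else hears its beacons."""
    return {nic for nic in NIC_SUS if not beacon_heard_by(nic, live_sus)}


# All SUS1 uplinks unplugged: vmnic0 and vmnic1 still hear each other over the
# stacking links, so nothing is flagged and the team keeps using vmnic0.
print(sorted(beacon_heard_by("vmnic0", {"SUS2"})))  # ['vmnic1']
print(flagged_by_beacon({"SUS2"}))                  # set()
```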

 

What's the reason you use "beacon"? Why can't you use "link status" detection on the VMware side?

 

This assumes you DON'T have "smartlink" enabled for all vnets, which is how cookbook scenario 1:5 is configured.

 

Try defining "smartlink" for all vnets to see if it works with both beacon and link status detection. It should work. What "smartlink" does is shut down all downlink ports to the servers if a given vnet loses ALL of its uplinks.
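The Smart Link rule described above can be written as a small predicate. In this hypothetical sketch, uplinks are counted per SUS rather than per vnet, on the assumption that every vnet in an SUS shares that SUS's uplink ports; the function name and data layout are illustrative only.

```python
from typing import Dict

# Mapping from this thread: LOMs (vmnic0/1) on SUS1, mezzanine ports (vmnic2/3) on SUS2.
NIC_SUS = {"vmnic0": "SUS1", "vmnic1": "SUS1", "vmnic2": "SUS2", "vmnic3": "SUS2"}


def server_link_states(live_uplinks: Dict[str, int], smart_link: bool) -> Dict[str, bool]:
    """Link state each server NIC sees on its VC downlink.

    With Smart Link, VC turns the downlink off when the networks behind it have no
    live uplink left; without it, the downlink stays up unconditionally."""
    return {
        nic: (live_uplinks.get(sus, 0) > 0) if smart_link else True
        for nic, sus in NIC_SUS.items()
    }


sus1_down = {"SUS1": 0, "SUS2": 6}  # every SUS1 uplink unplugged, all six SUS2 ports live

# {'vmnic0': True, 'vmnic1': True, 'vmnic2': True, 'vmnic3': True}
print(server_link_states(sus1_down, smart_link=False))
# {'vmnic0': False, 'vmnic1': False, 'vmnic2': True, 'vmnic3': True}
print(server_link_states(sus1_down, smart_link=True))
```

Once vmnic0 and vmnic1 report link down, plain "link status" detection on the vSwitch is enough to move traffic to vmnic2 and vmnic3.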

 

 




Mike O.
Regular Advisor

Re: Network failover not working with ESXi5, hp Virtual Connect



Actually, I was just about to post some more info.  This morning, we tried various combinations of uplink sets, smartlink, and/or beacon probing.

 

What we found was pretty much what you said: beacon probing wasn't working when we had the uplink sets spanning interconnects, and we figured out that it was because of the stacking links. It did work when each uplink set was isolated to a single interconnect bay.

 

We also tried enabling smartlink on the networks in the uplink set, with the uplink set spanning multiple interconnects (with the active/standby ports). This worked perfectly and did exactly what we wanted. If the active ports went down, the standby ones came up and everything kept working. We would lose one ping, but VMWare didn't mark any NICs as down. When we remove all the ports from the uplink set, VMWare sees both NICs go down, and its teaming sends the traffic over the other NICs (attached to the other uplink set). The "outage" is a little longer (two or three ping responses), but certainly acceptable.

 

So with Smartlink enabled, we have full redundancy: as long as we have at least one connection to any of the interconnect modules, we can get network traffic to the VMWare environment.
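The "any one connection is enough" observation can be checked exhaustively under a simplified, hypothetical model. It uses the uplink ports listed earlier in the thread (SUS1 = bay 1 ports 1-4 and bay 2 ports 1-2; SUS2 = bay 5 ports 1-4 and bay 6 ports 1-2), treats any live port in an SUS as usable (the active/standby behaviour reported above), and assumes "link status" detection with failover order vmnic0, vmnic1, vmnic2, vmnic3. The function names are invented for the sketch.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# (SUS, bay, port) for each uplink port described in the thread.
PORTS: List[Tuple[str, int, int]] = (
    [("SUS1", 1, p) for p in range(1, 5)] + [("SUS1", 2, p) for p in range(1, 3)]
    + [("SUS2", 5, p) for p in range(1, 5)] + [("SUS2", 6, p) for p in range(1, 3)]
)
NIC_SUS = {"vmnic0": "SUS1", "vmnic1": "SUS1", "vmnic2": "SUS2", "vmnic3": "SUS2"}
ORDER = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]


def chosen_nic(live_per_sus: Dict[str, int], smart_link: bool) -> Optional[str]:
    """Link-status teaming: first NIC in ORDER whose downlink reports link up."""
    for nic in ORDER:
        # Without Smart Link the downlink never drops, so its link always looks up.
        if not smart_link or live_per_sus[NIC_SUS[nic]] > 0:
            return nic
    return None


def black_hole_cases(smart_link: bool) -> int:
    """Count up/down combinations of all uplink ports where at least one port is up,
    yet the team sends traffic on a NIC whose SUS has no live uplink."""
    bad = 0
    for states in product([False, True], repeat=len(PORTS)):
        live = {"SUS1": 0, "SUS2": 0}
        for (sus, _bay, _port), up in zip(PORTS, states):
            live[sus] += up
        if sum(live.values()) == 0:
            continue  # nothing can reach the outside; not a failover problem
        nic = chosen_nic(live, smart_link)
        if nic is None or live[NIC_SUS[nic]] == 0:
            bad += 1
    return bad


print(black_hole_cases(smart_link=True))   # 0
print(black_hole_cases(smart_link=False))  # 63 (SUS1 fully down, SUS2 partly or fully up)
```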

 

 

What I'm wondering about now is why in the VC Cookbook, under scenario 1:5, it specifically says that "Smartlink should NOT be enabled".  I understand that in a "horizontal" failover with active/standby ports, Smartlink wouldn't be needed, but is there a problem with having it enabled?

 

Since having smartlink enabled seems to solve our issue, and provide us the most redundancy, I'd like to leave it enabled, but I don't want to cause any other issues...

 

 

 

Mike O.

 

By the way, the reason we had been using "beacon" instead of "link status" was the cookbook; it shows beacon in the ESX configuration section of scenario 1:5. That also seemed logical, since with Smartlink disabled (per the cookbook), it seemed we would never see a link failure.

 


 

Mike O.
Regular Advisor

Re: Network failover not working with ESXi5, hp Virtual Connect


@Psychonaut wrote:
Do you have Smart Link enabled on the SUS's?

We did not, per the VC Cookbook scenario 1:5.  However, as part of our testing today we did enable it and the failovers work exactly as we want them to (see my other response).  I'm still not sure why the cookbook specifically says "Smartlink should NOT be enabled".  I can see that it wouldn't help in a failover with the active/standby ports, but will having it enabled cause any problems?

 

Mike O.

Hongjun Ma
Trusted Contributor

Re: Network failover not working with ESXi5, hp Virtual Connect

Hi Mike,

 

Please keep "Smartlink" on; it won't do any harm. I'd say most VC deployments should have smartlink enabled to make sure we are not black-holing traffic.

 

The only case where you don't want "smartlink" is when you have internal communication across blades and still want the server NICs to stay up even when all uplinks go down, for example a cluster configuration where you don't want to trigger a host failover. In your topology, you should enable smartlink.

 

Also, try setting your vSwitch failover detection to "link status". This may give you quicker failover, because you don't have to wait for several missed beacons before failover is triggered. "Link status" is the default, and it should work just fine.

 

I believe the reason the VC cookbook uses "beacon" is that, early on, the Smartlink feature didn't work consistently with some NIC firmware versions. Nowadays, with the latest firmware and drivers, Smartlink works well, and you should leave the NIC side on "link status" failover.

 

Take a look at the VC FlexFabric cookbook, which covers the latest VC module. In Scenario 5, you can see the vSwitch using "link status". Ignore the FCoE and FlexNIC parts, which don't apply to the VC 1/10 module; the basic Ethernet design and failover are the same.

http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02616817/c02616817.pdf

 




Mike O.
Regular Advisor

Re: Network failover not working with ESXi5, hp Virtual Connect

Thanks, that's what I was hoping to hear: that Smartlink wouldn't cause any problems. What concerned me was the way the cookbook worded it, "Smartlink should NOT be enabled", with "NOT" in all caps. I didn't see how it would hurt, but the fact that they emphasized "NOT" made me wonder.

 

For the VMWare detection, once we re-enabled Smartlink, we were going to go ahead and use "link status" in VMWare instead of beaconing.

 

Besides the issue with VC looping back the beacon packets, I can see where beacon probing could theoretically help detect upstream switch failures, but in our case the blade chassis is connected directly to our top-level "core" switches in our data center; there's no other "upstream" switch for beaconing to detect. If our core 6509 switch isn't talking to anything else, we have a whole lot more issues going on...

 

I have a copy of the FlexFabric cookbook, but I didn't really dig into it much since we're not using the Flex-10 modules at this time.

 

Thanks again.

 

Mike O.

Psychonaut
Respected Contributor

Re: Network failover not working with ESXi5, hp Virtual Connect

I've got 12 servers running with Smartlink and "link status" - works great.