01-18-2012 01:59 PM
Network failover not working with ESXi5, hp Virtual Connect
We have a c7000 enclosure with a mix of BL465G5 and BL465G7 blades. We're using the 1/10 Virtual Connect Ethernet interconnect modules (no Flex-10). Our external switches are Cisco 6509 units. The blades are currently running VMWare ESXi 5.
We've configured the systems using a shared uplink set and VLAN tagging (basically scenario 1:5 in the HP Virtual Connect Ethernet cookbook). The uplink set spans two interconnect modules, with each interconnect going to an LACP group on a different 6509 switch, so some ports are active and others are in standby. Except for VMWare updates and hardware firmware updates, this configuration has been essentially unchanged for a couple of years.
Recently, while doing some other testing, we disconnected three of the four interconnects from the network and the VMWare blade dropped off the network. One interconnect was still connected, so the blades should have stayed connected.
After a bunch of testing, configuration changes, etc., we've determined that the VMWare NICs don't appear to be failing over when they should. We have "beacon probing" enabled on the vSwitch, but it doesn't seem to detect that there's no communication to the outside.
The blades all have four NICs. Our Virtual Connect configuration has two "Uplink Sets" defined with the same VLANs in each. Two NICs go to each uplink set.
All four physical vmnics are attached to the vSwitch. vmnic0 and vmnic1 go to "uplink set 1", and vmnic2 and vmnic3 go to "uplink set 2".
The VMWare "NIC Teaming" failover order is set to vmnic0, vmnic1, vmnic2, vmnic3. For testing, we set "load balancing" to "Explicit failover order".
If "Uplink set 1" has a connection to the outside world, everything works OK. If we disconnect the connections on Uplink set 1, and only have Uplink set 2 working, VMWare doesn't detect that vmnic0 and vmnic1 can't communicate and we lose network connectivity.
If we change the vmnic failover order to have vmnic2 or vmnic3 first in the list, then it only works if Uplink Set 2 is connected.
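The symptom above can be pictured with a tiny model. This is a simplified sketch, not ESXi's actual code: it assumes "Explicit failover order" simply picks the first vmnic in the list that the host still considers healthy, and the `select_uplink` helper and the health dictionaries are hypothetical. The point it illustrates is that unless something marks vmnic0 and vmnic1 as failed, the first NIC in the list keeps carrying traffic even though uplink set 1 has no path outside.

```python
from typing import Dict, List, Optional


def select_uplink(failover_order: List[str],
                  nic_is_healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first vmnic in the explicit failover order that the host
    still considers healthy, or None if every NIC has been marked failed.

    Hypothetical model: "healthy" stands in for whatever the configured
    failure detection (link status or beacon probing) reports.
    """
    for nic in failover_order:
        if nic_is_healthy.get(nic, False):
            return nic
    return None


ORDER = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]

# Uplink set 1 has lost all its external ports, but nothing on the host
# marks vmnic0/vmnic1 as failed, so all four NICs still look healthy.
undetected = {nic: True for nic in ORDER}

# If vmnic0 and vmnic1 were reported as failed, traffic would move on.
detected = {"vmnic0": False, "vmnic1": False, "vmnic2": True, "vmnic3": True}

print(select_uplink(ORDER, undetected))  # vmnic0 (traffic is lost)
print(select_uplink(ORDER, detected))    # vmnic2
```

Swapping vmnic2 to the front of `ORDER` reproduces the mirror-image behaviour described for uplink set 2.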
We're going to open a ticket with VMWare, but I was hoping someone out there has had a similar configuration and might have some insight.
Mike O.
01-18-2012 02:55 PM
Hi Mike,
What's the firmware version of the VC 1/10 module?
Also, could you post screen captures of the following:
1) SUS1 config
2) SUS2 config
3) a server profile config (main screen and Multiple Networks screen)
4) Stacking link status
BTW, with an active/standby design, when the active links in SUS1 fail, server traffic should cross the stacking link and leave through the newly active uplinks. This happens without any NIC failover.
Please take a look at page 23 of this doc:
http://hongjunma.wordpress.com/2011/11/28/hp-virtual-connect-technical-overview-presentation/
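A rough way to picture the active/standby behaviour described above: the SUS fails over between its LACP groups inside Virtual Connect, while the server-facing link never changes state, so ESXi has nothing to react to. The sketch below is a simplified model under stated assumptions: the bay numbers come from SUS1 in this thread, bay 1 is assumed to be the currently active group, `active_uplink_bay` is a hypothetical helper, and the "keep the current active group while it is up" rule is an illustration only. VC's real active-uplink selection rules are not reproduced here.

```python
from typing import Dict, Optional


def active_uplink_bay(current_active: int,
                      groups_up: Dict[int, bool]) -> Optional[int]:
    """Simplified active/standby choice for one shared uplink set (SUS).

    groups_up maps an interconnect bay to True if that bay's LACP group
    still has at least one link up. The current active group is kept while
    it is up; otherwise the lowest-numbered standby group that is up takes
    over. Returns None when the SUS has lost all of its uplinks.

    The blade NIC's own link state is not part of this decision: a NIC in
    bay 1 reaches an active uplink in bay 2 over the stacking link.
    """
    if groups_up.get(current_active, False):
        return current_active
    for bay in sorted(groups_up):
        if groups_up[bay]:
            return bay
    return None


# SUS1: LACP group in bay 1 (assumed active) and bay 2 (standby).
print(active_uplink_bay(1, {1: True, 2: True}))    # 1: nothing changes
print(active_uplink_bay(1, {1: False, 2: True}))   # 2: standby takes over
print(active_uplink_bay(1, {1: False, 2: False}))  # None: whole SUS is down
```

Only the last case, where the SUS has nothing left, needs the host to react, and that is the case the rest of this thread is about.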
01-18-2012 09:22 PM
I don't have the screen shots available, but here's some info:
- We're running 3.18 of the VC firmware.
1) SUS1 has bay 1 ports 1, 2, 3 & 4 (LACP group to one switch) and bay 2 ports 1 & 2 (LACP group to a different switch). About 8 VLAN networks are defined in the SUS.
2) SUS2 has bay 5 ports 1, 2, 3 & 4 (LACP group to one switch) and bay 6 ports 1 & 2 (LACP group to a different switch). Same VLAN networks as defined for SUS1.
3) LOM1 and LOM2 go to SUS1; Mezz1 and Mezz2 go to SUS2.
VMWare team has all four vmnics. Load balancing is set to port ID, failover detection to "beacon".
4) We have Ethernet interconnects in bays 1, 2, 5, and 6. We have a CX4 cable linking 1 & 5 and another between 2 & 6. All vertical and horizontal stacking links are showing OK.
- Within the uplink set (where some ports are active and some standby), we do get the correct behavior when the active ones go down: the standby ones go live and traffic flows with almost no interruption. What we're missing is the case where all the ports in an uplink set (both active and passive) go down. VMWare doesn't pick up that vmnic0 and vmnic1 aren't getting outside, so it keeps using those vmnics instead of failing over to the other vmnics in the team, which go to a different uplink set.
01-19-2012 07:11 AM
01-19-2012 07:34 AM
Hi Mike,
Your last statement helped me understand your problem better. I wasn't clear that you were referring to the situation where you lose all uplinks for a SUS.
I think this is your problem when you use "beacon" with this topology of four stacked VC modules. Let's say you lose all of your uplinks for SUS1 in modules 1 and 2. Because of your stacking topology (which is set up correctly), the beacon heartbeat from the server is sent to VC1, which sees the stacking link to VC5, so it gets forwarded to VC5 and then to VC6 through the internal horizontal stacking link. From VC6, it uses the vertical stacking link again to flow back to VC2 and back to your vmnics. Remember, all modules and stacking links carry every vNet you define, even on a module where you don't have any uplink defined.
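The loop can be checked with a small reachability model. Everything here is a simplifying assumption for illustration: the stacking graph (internal links 1-2 and 5-6 plus the CX4 cables 1-5 and 2-6 from Mike's description), the vmnic-to-bay mapping, and the rule that a beacon reaches NICs on another SUS only by leaving through one SUS's uplinks and coming back in through the other's. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, Set, Tuple

# Assumed stacking graph: internal links 1-2 and 5-6, CX4 cables 1-5 and 2-6.
STACKING: Set[Tuple[int, int]] = {(1, 2), (5, 6), (1, 5), (2, 6)}

# Assumed mapping of each vmnic to the interconnect bay it lands on.
NIC_BAY: Dict[str, int] = {"vmnic0": 1, "vmnic1": 2, "vmnic2": 5, "vmnic3": 6}


def stacked_bays(start: int) -> Set[int]:
    """Breadth-first search over the stacking links starting at `start`."""
    neighbours: Dict[int, Set[int]] = {}
    for a, b in STACKING:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        bay = queue.popleft()
        for nxt in neighbours.get(bay, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def beacon_receivers(sender: str, nic_sus: Dict[str, str],
                     sus_has_uplink: Dict[str, bool]) -> Set[str]:
    """Which other vmnics hear a broadcast beacon sent by `sender`.

    A vNet is carried to every stacked module, so NICs on the sender's own
    SUS hear the beacon over the stacking links without it touching an
    uplink. NICs on a different SUS hear it only if it can leave through
    the sender's SUS and come back in through theirs.
    """
    own_sus = nic_sus[sender]
    reachable = stacked_bays(NIC_BAY[sender])
    heard = set()
    for nic, bay in NIC_BAY.items():
        if nic == sender:
            continue
        if nic_sus[nic] == own_sus:
            if bay in reachable:
                heard.add(nic)  # looped back internally over stacking links
        elif sus_has_uplink[own_sus] and sus_has_uplink[nic_sus[nic]]:
            heard.add(nic)  # out one SUS's uplinks, back in through the other's


    return heard


# Mike's layout: LOMs (vmnic0/1) on SUS1, Mezz (vmnic2/3) on SUS2.
TWO_SUS = {"vmnic0": "SUS1", "vmnic1": "SUS1", "vmnic2": "SUS2", "vmnic3": "SUS2"}
sus1_down = {"SUS1": False, "SUS2": True}

print(beacon_receivers("vmnic0", TWO_SUS, sus1_down))  # {'vmnic1'}
print(beacon_receivers("vmnic2", TWO_SUS, sus1_down))  # {'vmnic3'}
```

In this model each half of the team still hears exactly one peer after SUS1 loses every uplink, so beacon probing has no clear basis for singling out vmnic0 and vmnic1. If instead each NIC sat on its own single-bay SUS, vmnic0 would hear nobody while the other three still heard each other, which fits Mike's later observation that beaconing worked when each uplink set was isolated to one bay.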
Why are you using "beacon"? Why not use "link status" detection on the VMware side?
This assumes you DON'T have "smartlink" enabled for all vNets, which is how cookbook scenario 1:5 is configured.
Try enabling "smartlink" on all vNets and see whether it works with both beacon and link status detection. It should. "Smartlink" shuts down all downlink ports to the servers on a given vNet if that vNet loses ALL of its uplinks.
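The suggested fix can be sketched with the same kind of simplified model: once every uplink of a SUS is gone, SmartLink takes the matching server downlinks down, which is an event that "link status" detection can see. The NIC-to-SUS mapping, the "one boolean per uplink port" representation, and the helper names are hypothetical; SmartLink actually works per vNet, and here every vNet in a SUS is assumed to share that SUS's uplinks. The failover order is the explicit one from Mike's test.

```python
from typing import Dict, List, Optional

NIC_SUS: Dict[str, str] = {"vmnic0": "SUS1", "vmnic1": "SUS1",
                           "vmnic2": "SUS2", "vmnic3": "SUS2"}
FAILOVER_ORDER: List[str] = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]


def nic_link_states(sus_uplinks_up: Dict[str, List[bool]],
                    smartlink: bool) -> Dict[str, bool]:
    """Link state each server NIC sees, under a simplified SmartLink model.

    sus_uplinks_up maps a SUS to one boolean per uplink port. With SmartLink,
    VC drops the server downlinks for a network only when ALL of that
    network's uplinks are down; without it, the downlinks always stay up.
    """
    states = {}
    for nic, sus in NIC_SUS.items():
        all_uplinks_down = not any(sus_uplinks_up[sus])
        states[nic] = not (smartlink and all_uplinks_down)
    return states


def link_status_pick(states: Dict[str, bool]) -> Optional[str]:
    """'Link status' detection with an explicit order: first NIC whose link is up."""
    return next((nic for nic in FAILOVER_ORDER if states[nic]), None)


# SUS1 (bays 1 and 2, six ports) has lost everything; SUS2 still has bay 5.
uplinks = {"SUS1": [False] * 6, "SUS2": [True, True, True, True, False, False]}

print(link_status_pick(nic_link_states(uplinks, smartlink=False)))  # vmnic0
print(link_status_pick(nic_link_states(uplinks, smartlink=True)))   # vmnic2
```

Without SmartLink the host keeps sending on vmnic0 into a SUS with no way out; with SmartLink the downlink drop gives link-status detection something concrete to act on.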
01-19-2012 08:43 AM
Actually, I was just about to post some more info. This morning, we tried various combinations of uplink sets, smartlink, and/or beacon probing.
What we found was pretty much what you said: beacon probing wasn't working when the uplink sets spanned interconnects, and we figured out that it was because of the stacking links. It did work when each uplink set was isolated to a single interconnect bay.
We also tried enabling Smartlink on the networks in the uplink set, with the uplink set spanning multiple interconnects (with the active/standby ports). This worked perfectly and did exactly what we wanted. If the active ports went down, the standby ones came up and everything worked OK. We would lose one "ping", but VMWare didn't mark any NICs as down. When we removed all the ports from the uplink set, VMWare saw both NICs go down and used its teaming to send the traffic over the other NICs (attached to the other uplink set). The "outage" is a little longer (two or three missed PING responses), but certainly acceptable.
So with Smartlink enabled, we have the full redundancy; as long as we have at least one connection to any of the interconnect modules, we can get network traffic to the VMWare environment.
What I'm wondering about now is why in the VC Cookbook, under scenario 1:5, it specifically says that "Smartlink should NOT be enabled". I understand that in a "horizontal" failover with active/standby ports, Smartlink wouldn't be needed, but is there a problem with having it enabled?
Since having Smartlink enabled seems to solve our issue and gives us the most redundancy, I'd like to leave it enabled, but I don't want to cause any other issues.
Mike O.
By the way, we had been using "beacon" instead of "link status" because of the cookbook; it shows beacon in the ESX configuration section of scenario 1:5. That also seemed logical, since with Smartlink disabled (per the cookbook), it seemed like we would never see a link failure.
01-19-2012 08:47 AM
@Psychonaut wrote:
Do you have Smart Link enabled on the SUS's?
We did not, per the VC Cookbook scenario 1:5. However, as part of our testing today we did enable it and the failovers work exactly as we want them to (see my other response). I'm still not sure why the cookbook specifically says "Smartlink should NOT be enabled". I can see that it wouldn't help in a failover with the active/standby ports, but will having it enabled cause any problems?
Mike O.
01-19-2012 09:53 AM
Hi Mike,
Please keep "Smartlink" on; it won't do any harm. I'd say most VC deployments should have Smartlink enabled to make sure traffic isn't blackholed.
The only case where you don't want "smartlink" is when you have internal communication between blades and want the server NICs to stay up even when all uplinks go down, for example a cluster configuration where you don't want to trigger a host failover. In your topology, you should enable Smartlink.
Also, try setting your vSwitch failover detection to "link status". This may give you quicker failover, because you don't have to wait for several missed beacon heartbeats before failover is triggered. "Link status" is the default and should work fine.
I believe the VC cookbook uses "beacon" because, in earlier days, the Smartlink feature didn't work consistently with some NIC firmware versions. With the latest firmware and drivers, Smartlink works well, so you should leave the NIC side on "link status" failover.
Take a look at the VC FlexFabric cookbook, which covers the latest VC module. In Scenario 5 you can see that the vSwitch uses "link status". Ignore the FCoE and FlexNIC parts, which don't apply to the VC 1/10 module; the basic Ethernet design and failover are the same.
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02616817/c02616817.pdf
01-19-2012 11:08 AM
Thanks, that's what I was hoping to hear: that Smartlink wouldn't cause any problems. What concerned me was the way the cookbook worded it, "Smartlink should NOT be enabled", with "NOT" in all caps. I didn't see how it would hurt, but the fact that they emphasized "NOT" made me wonder.
For the VMWare detection, once we re-enabled Smartlink, we were going to go ahead and use "link status" in VMWare instead of beaconing.
Besides the issue with VC looping back the beacon packets, I can see how beacon probing could theoretically help detect upstream switch failures, but in our case the blade chassis is connected directly to the top-level "core" switches in our data center; there's no other "upstream" switch for the beaconing to detect. If our core 6509 switch isn't talking to anything else, we have a whole lot more issues going on...
I have a copy of the FlexFabric cookbook, but I didn't really dig into it much since we're not using the Flex-10 modules at this time.
Thanks again.
Mike O.
01-19-2012 11:16 AM