
Virtual Connect LACP issue

 
ringwyrm
Occasional Visitor

Virtual Connect LACP issue

All:

We are connecting a Juniper MX switch to two different chassis. Each chassis has a two-port LACP bundle. LACP is up/active on the switch and in Virtual Connect (both links are active and have the same LAG ID). The aggregated-ethernet interfaces are L2 and in the same VLAN on the switch.

Both servers can ping the default gateway. The issue is that they cannot ping each other. We put an in-line tap on all four Ethernet connections to see what is happening.

What we see is this: the ARP request goes out from server1 on link1 to server2, and the ARP reply comes back, but the switch sends it down link2 of the ae bundle to server1. The ARP reply never makes it to the server. (The ARP reply is getting dropped/rejected/mangled between the VC module and the server.)

It's as if the ports are in failover mode, but the status in VC Manager clearly does not show failover mode. Also, the LACP debugs on the switch show that the LACP TLV status from the chassis has both links in the bundle as active/aggregated.

Any thoughts?
7 REPLIES
ringwyrm
Occasional Visitor

Re: Virtual Connect LACP issue

I am able to reproduce this failure on any switch vendor platform by changing the load-balancing scheme to anything other than source-mac. The HP chassis is only accepting frames on the "primary" link in the bundle.
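
For anyone following along, here is a rough sketch of why the load-balancing scheme matters. It is a toy model, not any vendor's real hash: the scheme names, the use of CRC32, and the MAC addresses are all assumptions made for illustration. The point it shows is that the switch picks a bundle member from selected header fields, so changing which fields feed the hash changes which link a given frame lands on.

```python
import zlib

def pick_member(scheme: str, src_mac: str, dst_mac: str, n_links: int = 2) -> int:
    """Return the index of the bundle member a frame would be sent on.

    Toy model only: real switches use vendor-specific hash functions.
    """
    if scheme == "src-mac":
        key = src_mac
    elif scheme == "dst-mac":
        key = dst_mac
    elif scheme == "src-dst-mac":
        key = src_mac + dst_mac
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # CRC32 stands in for the real (unknown) hardware hash.
    return zlib.crc32(key.lower().encode()) % n_links

# Hypothetical addresses from the documentation MAC range.
SERVER1 = "00:00:5e:00:53:01"
SERVER2 = "00:00:5e:00:53:02"

for scheme in ("src-mac", "dst-mac", "src-dst-mac"):
    link = pick_member(scheme, SERVER2, SERVER1)
    print(f"{scheme}: reply from server2 to server1 uses member {link}")
```

If the chassis only accepts frames on one member, any scheme that can select the other member for some flows will black-hole those flows.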
HEM_2
Honored Contributor

Re: Virtual Connect LACP issue

LACP on Virtual Connect works. You're saying that you have the same problem with VC on any vendor's switch. It is quite possible that you either have a hardware issue (bad cable, bad port, etc.) or are running into some type of bug.

I would recommend trying to only have one of your LACP links active per chassis and test connectivity, then disconnect those links and try the next LACP link to systematically test each layer 2 path independently.

I have seen a bad cable or port still form an LACP channel but cause issues like you have described.

Also, are you using Virtual MACs? If so, make sure that each VC domain is utilizing a unique range of MAC Addresses.
ringwyrm
Occasional Visitor

Re: Virtual Connect LACP issue

LACP does not work. That is the issue.

Virtual Connect shows both links in the bundle as Linked/Active. We have a sniffer trace in place showing we are indeed sending packets down the second link in the bundle to the server.

The interface stats for the port on that eNet module show ifInDiscards incrementing continuously as we run a continuous ping. When we stop the ping, ifInDiscards stops incrementing.

Per HP's documentation:

######
The number of inbound packets that were chosen to be discarded even
though no errors had been detected to prevent their being delivered to
a higher-layer protocol. One possible reason for discarding such a
packet could be to free up buffer space.
#######

Since this is a lab environment where the only traffic we are testing with is ping (and ARP), we are not running out of buffer space. What other reasons could exist for these discards, since it's not a frame CRC error?
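
To make the correlation between the ping and the discards easy to check, here is a minimal sketch that turns successive ifInDiscards readings into per-interval deltas. ifInDiscards is a Counter32 in IF-MIB, so the sketch handles a single wrap at 2^32. The sample values are made up for illustration; how the readings are collected (SNMP poll, CLI scrape) is left out.

```python
COUNTER32_MODULUS = 2 ** 32  # ifInDiscards is a 32-bit wrapping counter

def counter_deltas(samples):
    """Return the increase between consecutive counter readings.

    Assumes at most one wrap between two readings.
    """
    deltas = []
    for prev, curr in zip(samples, samples[1:]):
        deltas.append((curr - prev) % COUNTER32_MODULUS)
    return deltas

# Hypothetical readings: three polls while pinging, then two polls idle.
readings = [1000, 1005, 1010, 1010, 1010]
print(counter_deltas(readings))
```

Non-zero deltas only while the ping runs, and zero deltas when it stops, would match what we are seeing on the second link.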

Anyone?
The Brit
Honored Contributor

Re: Virtual Connect LACP issue

Just to be clear,

1. You have two enclosures ? (type?)
2. Each enclosure has 1(?) VC Module?
3. Each VC Module has 1(?)x2-port uplink set?
4. Both Uplink Sets connect to same Switch.
5. Both Servers (1 in each enclosure) have assigned profiles which designate the appropriate (same) VLAN on the appropriate NIC.
6. Both Uplink Sets are passing the required VLAN out to the switch?

Questions:

1. Can the blade servers ping/connect to other systems on the network (i.e. outside of VC)?

2. Can external servers ping the NIC's on the Servers in the two enclosures?

I am not familiar with the "LAG ID". What do you mean by that?

Finally, I am a little confused by your description of what you see. You say that Server1 sends the ARP request to Server2, then "ARP reply comes back". Where from? (The implication is that it is coming back from Server2.) You then say that "the switch" sends it back to Server1. However, you then say that "ARP reply never makes it to the server". Are you saying that the initial ARP request never makes it to Server2, or that the ARP reply never gets back to Server1?

The description is also confused by the references to Link1 and Link2. If you are referring to the individual ports on UpLink1 (from enclosure 1), then the statement doesn't really mean anything, since the (outgoing) request and the (incoming) reply could traverse either port.
However, if you are using Link1 and Link2 to refer to the uplinks in each enclosure, then the interpretation would be much different.

Finally, again, could you include the VCM firmware level in your response?

Thanks,

Dave.
ringwyrm
Occasional Visitor

Re: Virtual Connect LACP issue

Okay, let me clarify...

There are indeed two chassis. Each chassis has one blade server in it and one enet module. Each enet module is attached to the switch with two uplink ports. So on server 1, we'll call those links S1L1 and S1L2. On server 2, we'll call those links S2L1 and S2L2.

There is no VLAN tagging taking place. This is all in the native VLAN.

So, for the sake of discussion, "Server1" sends out an ARP request, which we see come into the switch on S1L1. The switch forwards this request to "Server2" on S2L1. The ARP response comes back on S2L2, and the switch forwards the response to "Server1" down S1L2. The eNet module discards it.

The answer to your question is that sometimes traffic from other networks can ping the servers and sometimes it can't. It depends on whether the switch forwards the traffic down S1L2. In the course of load-balancing, the switch may choose either link for a given flow, and if it chooses S1L2 then communication fails because the enet module discards the frames. It doesn't matter what the source of the traffic is (the other server or devices on another network).
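
A small sketch of the "sometimes it works, sometimes it doesn't" behavior, under stated assumptions: the switch hashes on source plus destination MAC (CRC32 here is a stand-in, not the real hash), the module discards everything arriving on S1L2, and the MAC addresses are hypothetical. It reports, per source, which link the switch would pick toward Server1 and whether that source would get through.

```python
import zlib

LINKS = ("S1L1", "S1L2")
DISCARDING = {"S1L2"}  # assumption: matches what the tap and ifInDiscards show

def link_for(src_mac: str, dst_mac: str) -> str:
    """Toy stand-in for the switch's per-flow member selection."""
    digest = zlib.crc32((src_mac + dst_mac).lower().encode())
    return LINKS[digest % len(LINKS)]

def reachability(sources, dst_mac):
    """Map each source MAC to (chosen link, delivered?)."""
    result = {}
    for src in sources:
        link = link_for(src, dst_mac)
        result[src] = (link, link not in DISCARDING)
    return result

SERVER1 = "00:00:5e:00:53:01"  # hypothetical MAC for Server1
sources = [f"00:00:5e:00:53:{i:02x}" for i in range(0x10, 0x18)]

for src, (link, ok) in reachability(sources, SERVER1).items():
    print(src, link, "delivered" if ok else "discarded")
```

Which sources fail depends only on where the hash puts them, which is why the failure looks random from the outside.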




HEM_2
Honored Contributor

Re: Virtual Connect LACP issue

If you temporarily shut down S1L1 and test connectivity, what happens?

If you bring S1L1 back up and temporarily shut down S1L2 and test connectivity, what happens?

It does sound like one of the links is acting like it's in Standby and discarding all incoming traffic. Maybe LACP communication between VC and Juniper isn't quite working.

I'm not that familiar with Juniper switches. Do you know the LACP Rate they are running at? Is it Fast or Slow or can you configure it? If configurable, try setting the LACP Rate to fast.
ringwyrm
Occasional Visitor

Re: Virtual Connect LACP issue

1. Traffic continues to get discarded inbound on the enet module on S1L2. I don't remember if we see the ARP requests coming from "server1" failing over though...

2. It works as expected. ARP completes and the pings are successful.

3. It is configurable. It is configured for fast and passive at this time. We have tried all combinations of settings, including trying to force the link into standby mode on the Juniper side, but this is not the solution we want. We want to load-balance across the two links. That being said, we can reproduce this error on a Cisco switch by changing the load-balancing scheme so that the switch sends packets down S1L2.

We have a case open with HP now.