
HP Openflow incompatibility with LACP trunks with DOT1Q when untagged VLAN exist

 
LucianoCuchi
New Member


Hi everybody,

I am seeing unexpected behaviour on an HP 2920-24G (J9726A) switch when OpenFlow is enabled and there is an uplink aggregated with LACP and carrying DOT1Q: the LACP link goes DOWN.

Scenario
The unexpected behaviour has been reproduced on an HP 2920-24G (J9726A) with WB.16.05.0011 and with WB.16.04.0011. (I think it is also happening on an HP 2930 (JL259A) switch with WC.16.05.0007 connected to a Dell running Cumulus, although I don't have the debug information for that one.)

The scenario is:
    - HP 2920 connected to a Cisco 2960 (12.2(55r)SE) with 2 uplinks aggregated in a LACP trunk with DOT1Q.
    - The Cisco has an untagged native VLAN, and the HP 2920 has the same VLAN configured as untagged on the trunk.
    - The tests have been done using VLAN 1 and VLAN 3000 as the "native" VLAN. The VLAN ID used doesn't matter.
    - The HP 2920 connects to an OpenFlow controller over VLAN 40, which is allowed through the uplink trunk.
    - Spanning Tree is disabled on the HP.

Everything works fine until OpenFlow is enabled with the native VLAN of the DOT1Q trunk included.

We have reproduced the issue with OpenFlow configured either in "Aggregation mode" or in "Virtualization mode" with the native VLAN as a member of the instance. We prefer "Aggregation mode", so I will continue with that scenario.

So, when OpenFlow is enabled in "Aggregation mode", the LACP links on the HP start flapping, being blocked and unblocked continuously:

I 09/28/18 09:31:07 03852 OPENFLOW: ST1-CMDR: Instance aggregate has been
            enabled in Active mode with Flow Location Hardware and Software.
I 09/28/18 09:31:07 03850 OPENFLOW: ST1-CMDR: OpenFlow is globally enabled.
I 09/28/18 09:31:07 03869 OPENFLOW: ST1-CMDR: Instance aggregate: Exiting fail
            secure mode.
I 09/28/18 09:33:18 00435 ports: ST1-CMDR: port 1/22 is Blocked by LACP
I 09/28/18 09:33:18 00076 ports: ST1-CMDR: port 1/22 in Trk1 is now on-line
I 09/28/18 09:33:24 00435 ports: ST1-CMDR: port 1/21 is Blocked by LACP
I 09/28/18 09:33:24 00076 ports: ST1-CMDR: port 1/21 in Trk1 is now on-line
I 09/28/18 09:34:51 00435 ports: ST1-CMDR: port 1/22 is Blocked by LACP
I 09/28/18 09:34:51 00076 ports: ST1-CMDR: port 1/22 in Trk1 is now on-line
I 09/28/18 09:34:57 00435 ports: ST1-CMDR: port 1/21 is Blocked by LACP
I 09/28/18 09:34:57 00076 ports: ST1-CMDR: port 1/21 in Trk1 is now on-line
I 09/28/18 09:36:24 00435 ports: ST1-CMDR: port 1/22 is Blocked by LACP
I 09/28/18 09:36:24 00076 ports: ST1-CMDR: port 1/22 in Trk1 is now on-line
I 09/28/18 09:36:30 00435 ports: ST1-CMDR: port 1/21 is Blocked by LACP
I 09/28/18 09:36:30 00076 ports: ST1-CMDR: port 1/21 in Trk1 is now on-line
I 09/28/18 09:37:57 00435 ports: ST1-CMDR: port 1/22 is Blocked by LACP
I 09/28/18 09:37:57 00076 ports: ST1-CMDR: port 1/22 in Trk1 is now on-line

The Cisco doesn't show any issue with LACP. What we have seen is that most of the LACP PDUs that the Cisco sends to the HP are lost. Why are they lost? I don't know, but I have checked that the Cisco sends and receives the LACPDUs correctly.

HP-Stack-2920# sh lacp counters
LACP Port Counters.
              LACP      LACP      Marker   Marker   Marker   Marker           
  Port Trunk  PDUs Tx   PDUs Rx   Req. Tx  Req. Rx  Resp. Tx Resp. Rx Error   
  ---- ------ --------- --------- -------- -------- -------- -------- --------
  1/21 Trk1   9         1         0        0        0        0        0       
  1/22 Trk1   9         1         0        0        0        0        0       
 
CISCO_2960#sh lacp counters
             LACPDUs         Marker      Marker Response    LACPDUs
Port       Sent   Recv     Sent   Recv     Sent   Recv      Pkts Err
---------------------------------------------------------------------
Channel group: 1
Gi1/0/47    9      9        0      0        0      0         0     
Gi1/0/48    9      9        0      0        0      0         0    
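
The loss can be read directly from the two counter tables by comparing what one side sent with what the other side received. The Python sketch below does that arithmetic. It assumes both outputs were captured at roughly the same moment and that 1/21 is cabled to Gi1/0/47 and 1/22 to Gi1/0/48; the pairing is not stated above, but the numbers are the same on both links, so it does not change the result.

```python
# Counters copied from the two "show lacp counters" outputs above.
hp = {"1/21": {"tx": 9, "rx": 1}, "1/22": {"tx": 9, "rx": 1}}
cisco = {"Gi1/0/47": {"tx": 9, "rx": 9}, "Gi1/0/48": {"tx": 9, "rx": 9}}

# ASSUMPTION: port pairing, not given in the post.
links = [("1/21", "Gi1/0/47"), ("1/22", "Gi1/0/48")]


def loss(sent, received):
    """Return (lost PDUs, lost percentage) for one direction."""
    lost = sent - received
    percent = 100.0 * lost / sent if sent else 0.0
    return lost, percent


results = {}
for hp_port, cisco_port in links:
    # Direction Cisco -> HP: Cisco Tx compared with HP Rx.
    results[(cisco_port, hp_port)] = loss(cisco[cisco_port]["tx"], hp[hp_port]["rx"])
    # Direction HP -> Cisco: HP Tx compared with Cisco Rx.
    results[(hp_port, cisco_port)] = loss(hp[hp_port]["tx"], cisco[cisco_port]["rx"])

for (src, dst), (lost, percent) in results.items():
    print(f"{src} -> {dst}: {lost} lost ({percent:.1f}%)")
```

With these counters, 8 of the 9 LACPDUs sent by the Cisco (about 89%) never reach the LACP process on the HP, while nothing is lost in the HP to Cisco direction.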

Because of this block/unblock process, both links eventually end up blocked by LACP.

At that moment the Cisco brings down the Port-Channel interface, so the HP loses connectivity.

It's funny, because the HP has lost connectivity but still shows the interfaces as UP and the LACP as UP and OK, although it doesn't have the peer information.

Conclusion:
OpenFlow affects the LACP process when a LACP trunk carries DOT1Q with any untagged VLAN that is managed by OpenFlow: most of the PDUs are dropped, and the trunk eventually goes down.

Workaround1:
Configure OpenFlow in "Virtualization mode" and don't include the untagged VLAN of the trunk in any instance.

Workaround2:
Configure ALL the VLANs in the trunk as TAGGED. This avoids any issue with the OpenFlow process.

Bonus Info:
While collecting info for this case, I realized that exactly the same thing happens with STP and LLDP.
So I have verified that OpenFlow affects all of these L2 protocols, dropping many of their PDUs (but not all, which is strange) when there is an untagged VLAN in the trunk.
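
One thing LACP, STP and LLDP have in common is that their PDUs are sent to reserved link-local multicast addresses in the 01-80-C2-00-00-0X block and normally travel untagged, so on the switch they would arrive on the untagged VLAN. Whether that is why OpenFlow interferes with them is only a guess, not something confirmed here. The Python sketch below just lists the standard destination addresses and checks whether a given MAC falls in that block; the helper names are made up for illustration.

```python
# Standard destination MACs for the three protocols mentioned above.
# STP (IEEE 802.1D) uses LLC rather than an EtherType, hence None.
L2_PROTOCOLS = {
    "01:80:c2:00:00:00": ("STP", None),
    "01:80:c2:00:00:02": ("LACP", 0x8809),  # Slow Protocols EtherType
    "01:80:c2:00:00:0e": ("LLDP", 0x88CC),
}


def normalize(mac):
    """Lower-case the MAC and use ':' as separator."""
    return mac.lower().replace("-", ":")


def is_reserved_link_local(mac):
    """True if the MAC is in the 01-80-C2-00-00-00..0F reserved block."""
    mac = normalize(mac)
    return len(mac) == 17 and mac.startswith("01:80:c2:00:00:0")


def classify(mac):
    """Return the protocol name for a known destination MAC, else None."""
    entry = L2_PROTOCOLS.get(normalize(mac))
    return entry[0] if entry else None


for mac in ("01-80-C2-00-00-02", "01-80-C2-00-00-00", "01-80-C2-00-00-0E"):
    print(mac, classify(mac), is_reserved_link_local(mac))
```

If the drops really follow this address block, a capture on the uplink filtered on these destination MACs should show which PDUs reach the port but not the corresponding protocol process.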

Questions:
Does anybody know the cause of this behaviour?
Why does the switch drop some of the PDUs but not all?
OpenFlow shouldn't affect these protocols regardless of the DOT1Q configuration, should it?

Thank you very much for your patience in reading all this information. Please don't hesitate to ask for more details.

Regards

Luciano

1 REPLY 1
ShaunWackerly
HPE Pro

Re: HP Openflow incompatibility with LACP trunks with DOT1Q when untagged VLAN exist

Hi Luciano,

Would you be able to provide the following information for us? You've already provided a great deal, but this would help us in narrowing in on why you see this behavior:

  • "show openflow instance X flows" output 
  • Which controller you are using when this happens
  • Complete switch configuration

Also, could you confirm/deny that the uplink port is carrying both control-plane and data-plane traffic?

Shaun

I am an HPE Employee