[SOLVED] LACP between 2520G and DL380E Gen8

 
andrea_italy
Occasional Contributor


Hi all,

I'm running some tests connecting a 2520G-8 switch to a DL380E Gen8 server (Ubuntu Server 18.04) with LACP.

 

The switch configuration is as follows (I wanted to create a static LACP trunk):

 

Running configuration:

; J9298A Configuration Editor; Created on release #J.15.09.0028
; Ver #06:04.08.00.01.14.05:1a
hostname "swZfs"
trunk 2,4 trk2 lacp
power-over-ethernet pre-std-detect
qos dscp-map 000000 priority 0
qos dscp-map 001000 priority 1
qos dscp-map 010000 priority 2
qos dscp-map 011000 priority 3
qos dscp-map 100000 priority 4
qos dscp-map 101000 priority 5
qos dscp-map 110000 priority 6
qos dscp-map 111000 priority 7
timesync sntp
sntp unicast
sntp server priority 1 193.204.114.232
sntp server priority 2 193.204.114.105
sntp server priority 3 217.147.223.78
no telnet-server
time daylight-time-rule western-europe
time timezone 60
no web-management
ip default-gateway 192.168.99.1
ip dns server-address priority 1 192.168.99.1
snmp-server community "public" unrestricted
vlan 1
   name "DEFAULT_VLAN"
   no untagged 5,7,9-10,Trk2
   untagged 1,3,6,8
   no ip address
   exit
vlan 10
   name "Lan_Default"
   untagged 5,Trk2
   tagged 9-10
   no ip address
   exit
vlan 99
   name "Management"
   untagged 7
   tagged 9-10
   ip address 192.168.99.7 255.255.255.0
   exit
spanning-tree Trk2 priority 4
no dhcp config-file-update
password manager

On server side, the bond between interfaces is defined as follows:

 

root@srvhp2:~# ip -c a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 92:08:69:xxxxxxx brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 92:08:69:xxxxxxx brd ff:ff:ff:ff:ff:ff
4: eno3: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN group default qlen 1000
    link/ether 92:08:69:xxxxxxx brd ff:ff:ff:ff:ff:ff
5: eno4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 92:08:69:xxxxxxx brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 92:08:69:xxxxx brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.11/24 brd 192.168.10.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::9008:69ff:xxxxxx/64 scope link 
       valid_lft forever preferred_lft forever

and the bond seems to be correct:

 

root@srvhp2:~# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 92:08:69:…….
Active Aggregator Info:
	Aggregator ID: 3
	Number of ports: 2
	Actor Key: 9
	Partner Key: 27
	Partner Mac Address: c8:cb:b8:……..

Slave Interface: eno4
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 2c:59:e5:…….
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 92:08:69:…...
    port key: 9
    port priority: 255
    port number: 1
    port state: 71
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

Slave Interface: eno3
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 2c:59:e5:…….
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 92:08:69:…..
    port key: 0
    port priority: 255
    port number: 2
    port state: 71
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

Slave Interface: eno2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 2c:59:e5:…...
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 92:08:69:…….
    port key: 9
    port priority: 255
    port number: 3
    port state: 63
details partner lacp pdu:
    system priority: 60800
    system mac address: c8:cb:b8:……..
    oper key: 27
    port priority: 0
    port number: 4
    port state: 61

Slave Interface: eno1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 2c:59:e5:…….
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 92:08:69:……..
    port key: 9
    port priority: 255
    port number: 4
    port state: 63
details partner lacp pdu:
    system priority: 60800
    system mac address: c8:cb:b8:…..
    oper key: 27
    port priority: 0
    port number: 2
    port state: 61

 

So far, so good: the server connection works, in the sense that I can connect from a PC via SSH.

 

At server startup, dmesg returns:

 

 

root@srvhp2:~# dmesg | tail -10
[   20.506706] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[   20.625604] bond0: Enslaving eno4 as a backup interface with an up link
[   20.739875] bond0: Enslaving eno3 as a backup interface with an up link
[   20.853942] bond0: Enslaving eno2 as a backup interface with an up link
[   20.968133] bond0: Enslaving eno1 as a backup interface with an up link
[   23.625888] igb 0000:02:00.3 eno4: igb: eno4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   23.728865] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[   24.056784] new mount options do not match the existing superblock, will be ignored
[   24.260464] igb 0000:02:00.1 eno2: igb: eno2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   24.612476] igb 0000:02:00.0 eno1: igb: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX

 

and on the switch the LACP seems to work:

 

swZfs(config)# sh lacp

                                   LACP

          LACP      Trunk     Port                LACP      Admin   Oper
   Port   Enabled   Group     Status    Partner   Status    Key     Key
   ----   -------   -------   -------   -------   -------   ------  ------
   2      Active    Trk2      Up        Yes       Success   0        27    
   4      Active    Trk2      Up        Yes       Success   0        27    

Now I describe the part that does not work.

 

If I remove one of the two trunk cables, the connection still works: the SSH session between the PC and the server remains active. With dmesg I see:

 

[  865.685030] igb 0000:02:00.0 eno1: igb: eno1 NIC Link is Down
[  865.685594] igb 0000:02:00.0 eno1: speed changed to 0 for port eno1

and if I reattach it:

[  933.075833] igb 0000:02:00.0 eno1: igb: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX

But if I disconnect the second cable, the connection fails and there is no way to restore it, not even after waiting a long time.

Why? Shouldn't LACP notice immediately when a link is restored (cable reconnected)?

 

Obviously LACP is not working this way, but I cannot understand what I did wrong.

Can you help me?

Thanks in advance.

 

Hello,

Andrea

P.S. Sorry for the length.

 

5 REPLIES
parnassus
Honored Contributor

Re: LACP between 2520G and DL380E Gen8

Hello Andrea,

there is something strange in your switch-server connection, specifically in the bond0 status (so server side, the bond0 setup): if, on the switch side, only two interfaces, interface 2 and interface 4 (trunk 2,4 trk2 lacp), are aggregated members of interface trk2 (note: the logical interface trk2 is an untagged member of VLAN 10), and this one is correctly configured with IEEE 802.3ad LACP... why does your Linux server's bond0 show four enslaved member ports, named eno1, 2, 3 and 4?

If I'm not mistaken, bond0 should have, at most, just two member ports; that's enough, not four as the dmesg and cat outputs show.

Reconfigure your bond0 from scratch with only the necessary ports (eno1 and eno2, for example) and re-test.
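As a sketch of that from-scratch reconfiguration (the file name, interface names and address are assumptions, not taken from the thread), a minimal two-port 802.3ad bond in netplan on Ubuntu 18.04 could look like this:

```shell
# Hypothetical sketch: generate a minimal two-port 802.3ad bond definition
# for netplan on Ubuntu 18.04. File name, interface names and addresses
# are assumptions; adapt them before copying to /etc/netplan/.
cat > /tmp/01-bond0.yaml <<'EOF'
network:
  version: 2
  renderer: networkd
  ethernets:
    eno1: {}
    eno2: {}
  bonds:
    bond0:
      interfaces: [eno1, eno2]
      addresses: [192.168.10.11/24]
      parameters:
        mode: 802.3ad
        lacp-rate: fast
        mii-monitor-interval: 100
        transmit-hash-policy: layer2
EOF
# Review the file, move it to /etc/netplan/, then run: netplan apply
```

Note that the /proc/net/bonding output earlier in the thread shows an MII polling interval of 0; a non-zero mii-monitor-interval lets the driver actually detect link failures.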

 

 

andrea_italy
Occasional Contributor

Re: LACP between 2520G and DL380E Gen8

Hello Parnassus,
first of all: thanks for your interest.

I thought I could define a two-port LACP trunk on the switch (against the four ports of the server) because I understood that LACP was able to choose among the active connections, possibly even 2 out of 4.

Evidently I had misunderstood, so I changed the configuration to run the test you suggested.

I changed the trunk configuration on the switch because it is easier than changing the bond on the server: now the trunk is defined with four ports on both the switch side and the server side:

 

Running configuration:

; J9298A Configuration Editor; Created on release #J.15.09.0028
; Ver #06:04.08.00.01.14.05:1a
hostname "swZfs"
trunk 2,4,6,8 trk2 lacp
power-over-ethernet pre-std-detect
qos dscp-map 000000 priority 0
qos dscp-map 001000 priority 1
qos dscp-map 010000 priority 2
qos dscp-map 011000 priority 3
qos dscp-map 100000 priority 4
qos dscp-map 101000 priority 5
qos dscp-map 110000 priority 6
qos dscp-map 111000 priority 7
timesync sntp
sntp unicast
sntp server priority 1 193.204.114.232
sntp server priority 2 193.204.114.105
sntp server priority 3 217.147.223.78
no telnet-server
time daylight-time-rule western-europe
time timezone 60
no web-management
ip default-gateway 192.168.99.1
ip dns server-address priority 1 192.168.99.1
snmp-server community "public" unrestricted
vlan 1
   name "DEFAULT_VLAN"
   no untagged 5,7,9-10,Trk2
   untagged 1,3
   no ip address
   exit
vlan 10
   name "Lan_Default"
   untagged 5,Trk2
   tagged 9-10
   no ip address
   exit
vlan 99
   name "Management"
   untagged 7
   tagged 9-10
   ip address 192.168.99.7 255.255.255.0
   exit
spanning-tree Trk2 priority 4
no dhcp config-file-update
password manager

All 4 ports are connected to the server.

 

I ran the test but nothing changed:

- at startup the connection is up

- disconnecting one cable: the server stays connected

- unplugging a second cable: everything still works

- removing the third: the connection is interrupted

- reattaching all three cables: the connection is not restored.

 

In this condition, the dmesg command on the server returns:

 

[xxxxxxx] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond

 

 

which is repeated every few seconds. It appears that the switch is not responding, even though on the switch side, AFTER having reconnected the cables, LACP looks normal:

 

swZfs(config)# sh lacp

                                   LACP

          LACP      Trunk     Port                LACP      Admin   Oper
   Port   Enabled   Group     Status    Partner   Status    Key     Key
   ----   -------   -------   -------   -------   -------   ------  ------
   2      Active    Trk2      Up        No        Success   0        27    
   4      Active    Trk2      Up        No        Success   0        27    
   6      Active    Trk2      Up        No        Success   0        27    
   8      Active    Trk2      Up        No        Success   0        27    

 

Basically, I am out of ideas...

 

Thanks again for your help,

Andrea

 

 

parnassus
Honored Contributor
Solution

Re: LACP between 2520G and DL380E Gen8

IMHO there is still something strange in the bond0 configuration:

The Aggregator IDs of all involved bond0 member interfaces should all show the same value, for example ID=3 (instead I read ID=1, 2, 3 and 3 respectively for eno4, eno3, eno2 and eno1); so, if I'm not mistaken, I should read 3, 3, 3 and 3 (or 1, 1, 1 and 1) respectively for the eno4, eno3, eno2 and eno1 interfaces. My take: the Aggregator IDs must be identical across all member interfaces of the same bond logical interface, and this explains why eno4 and eno3, having different Aggregator IDs, behave that way with respect to eno2 and eno1.
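A quick way to verify this on the server (a sketch; it parses the standard /proc/net/bonding output format shown earlier in the thread) is:

```shell
# Sketch: check that every slave in an 802.3ad bond reports the same
# Aggregator ID as the active aggregator. Usage:
#   check_aggregators /proc/net/bonding/bond0
check_aggregators() {
  file=$1
  # The line after "Active Aggregator Info:" carries "Aggregator ID: <n>"
  active=$(awk '/Active Aggregator Info/ {getline; print $NF}' "$file")
  # Take the first "Aggregator ID" line after each "Slave Interface" header
  slaves=$(awk '/^Slave Interface/ {s=1} s && /Aggregator ID/ {print $NF; s=0}' "$file")
  for id in $slaves; do
    if [ "$id" != "$active" ]; then
      echo "mismatch: slave aggregator $id vs active $active"
      return 1
    fi
  done
  echo "all slaves match active aggregator $active"
}
```

With the output quoted in the first post (active ID 3, slaves reporting 1, 2, 3 and 3), this check would flag a mismatch.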

Then there are two more things you should check: first, try changing the bond aggregator selection policy (the ad_select option) from the current stable value (the default) to bandwidth, even though the default should be fine too; then verify that all four server NIC ports show the correct link status and link speed.
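For reference, ad_select is a parameter of the Linux bonding driver (valid values: stable, bandwidth, count) and is normally fixed when the bond is created; a hedged sketch of persisting it via a modprobe options file (the file name is an arbitrary choice, not from the thread):

```shell
# Hedged sketch: persist the bonding driver's ad_select policy via a
# modprobe options file. Valid values are stable (default), bandwidth
# and count; the file name here is an assumption.
cat > /tmp/bonding-options.conf <<'EOF'
options bonding ad_select=bandwidth
EOF
# After review, place the file under /etc/modprobe.d/ and reload the
# bonding module (or reboot) for the option to take effect.
```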

Check also with:

show lacp counters

If in doubt, reconfiguring bond0 from scratch should be simple.

Do all involved ports support LACP? Exactly what NIC model(s) are you using (are all four ports members of the same NIC)?

What sequence is logged in /var/log/messages for your bond0 when you perform the various link down/up tests?

Davide.

andrea_italy
Occasional Contributor

Re: LACP between 2520G and DL380E Gen8

Hi Davide,
you're right: the Aggregator ID is strange.
I did not want to fiddle too much with the Ubuntu configuration, because I read that in the transition from 16.04 to 18.04 netplan replaced the previous ifupdown, and this has created several problems.
But the thing you pointed out is really suspicious, so I will redo the Ubuntu configuration.
I cannot do it now because it takes a bit of time, and tomorrow I will be away on business (out of town) for three days: I will do it over the weekend and report everything back here.

Thanks for your help.
Hello,

Andrea

andrea_italy
Occasional Contributor

Re: LACP between 2520G and DL380E Gen8

Hi Davide,
I finally solved it!
I found an interesting starting point in this thread:

https://www.reddit.com/r/networking/comments/9fez36/does_lacp_between_cisco_and_linux_ubuntu_18041/

and in particular in this one, referenced by the previous thread:

https://askubuntu.com/questions/1033847/configure-bonded-802-3ad-network-using-netplan-on-ubuntu-18-04

As indicated in that post, I changed the configuration of the file

/etc/cloud/cloud.cfg.d/50-curtin-networking.cfg

which I had never touched before.

With this change I now actually have a single aggregator ID:

root@srvhp2:~# cat /proc/net/bonding/bond0 | grep Aggre
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
	Aggregator ID: 1
Aggregator ID: 1
Aggregator ID: 1
Aggregator ID: 1
Aggregator ID: 1
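For readers hitting the same problem: the exact edit is not shown here, but per the linked askubuntu answer a common approach on cloud-init managed installs is to stop cloud-init from rendering the network configuration and declare the 802.3ad bond in netplan directly; a hedged sketch:

```shell
# Hedged sketch (the exact change made to 50-curtin-networking.cfg is not
# shown in the thread): disable cloud-init's network rendering so it no
# longer overrides a hand-written netplan bond definition.
cat > /tmp/99-disable-network-config.cfg <<'EOF'
network: {config: disabled}
EOF
# After review, place the file under /etc/cloud/cloud.cfg.d/, define the
# bond with mode 802.3ad under /etc/netplan/, and run: netplan apply
```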

However, the right hint was exactly the one you gave me about the Aggregator ID, which had to be the same for all interfaces, so thanks again for your support!

Hello,
Andrea