Operating System - Linux

Auto negotiation on 1000Mbps failure.

 
Morgan Johansson
New Member


I have a weird problem with Linux auto negotiating the link-speed to 1000 Mbps.

This is my setup:

Cisco 4948
HP DL585G5 (with 2 quad network cards)
RHEL5 (2.6.18-92.1.18.el5)
network bonding (bonding ports between the two quad NICs)

The problem is that when we set the Cisco switch ports to full auto and then restart the switch, Linux first detects the ports at 1000 Mbps, brings them up and down a couple of times, and then finally brings them up at 100 Mbps only.

Setting the Cisco side "hard" to 1000 Mbps is a workaround (see the ethtool output below), but I have heard that it is unwise to configure Gigabit networking this way and that it should always be set to auto. Does anyone know why?
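For reference, pinning the Linux side to match would look roughly like this (a sketch only; eth6 is one of our interfaces, and note that some gigabit copper drivers refuse a hard 1000 Mb/s setting, because 1000BASE-T relies on negotiation for master/slave selection):

```shell
# Sketch: hard-set speed/duplex with autonegotiation disabled.
# Both ends of the link must then be hard-set the same way.
ethtool -s eth6 autoneg off speed 1000 duplex full

# To make it persistent on RHEL5, the same options go in
# /etc/sysconfig/network-scripts/ifcfg-eth6:
#   ETHTOOL_OPTS="autoneg off speed 1000 duplex full"
```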

ethtool output with the Cisco port set not to autonegotiate:
_________output________
ethtool eth6
Settings for eth6:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbag
Wake-on: d
Current message level: 0x00000001 (1)
Link detected: yes
_________end of_output________


Cisco in auto-mode:
This is from the messages log (I've trimmed it so only one interface is shown, but it is the same for all the rest):

_________output________
Feb 11 09:54:10 burs1p-te02 kernel: 0000:4a:02.0: eth6: Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 11 09:54:10 burs1p-te02 kernel: bonding: bond0: link status definitely up for interface eth6.
Feb 11 09:54:19 burs1p-te02 kernel: 0000:4a:02.0: eth6: Link is Down
Feb 11 09:54:19 burs1p-te02 kernel: bonding: bond0: link status definitely down for interface eth6, disabling it
Feb 11 09:54:29 burs1p-te02 kernel: 0000:4a:02.0: eth6: Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
Feb 11 09:54:29 burs1p-te02 kernel: 0000:4a:02.0: eth6: 10/100 speed: disabling TSO
Feb 11 09:54:29 burs1p-te02 kernel: bonding: bond0: link status definitely up for interface eth6.
_________end_of_output________

Any advice would be appreciated!

/ Morgan
8 REPLIES
T G Manikandan
Honored Contributor

Re: Auto negotiation on 1000Mbps failure.

Hi Morgan,

Autonegotiation is mandatory for 1G.

I would suggest that you check the following:

* Changing the cables or moving to a different port.
* Note that HP-UX is not capable of having two interfaces up on the same network IP.

Can you provide the output of lspci?
Morgan Johansson
New Member

Re: Auto negotiation on 1000Mbps failure.

Can you explain why autonegotiation is mandatory, please?

We saw this behavior on all 10 machines at the same time when we rebooted the Cisco switch, so I don't think there is a cable or port problem.

HP-UX, the same IP on two interfaces... what?
We are using RHEL5 with network bonding.

Here is the output from lspci:

_________output________
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a4)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev b1)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a4)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
00:1a.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:1a.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:1a.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:1a.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:1a.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
00:1b.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:1b.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:1b.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:1b.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:1b.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
01:03.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
01:04.0 System peripheral: Compaq Computer Corporation Integrated Lights Out Controller (rev 03)
01:04.2 System peripheral: Compaq Computer Corporation Integrated Lights Out Processor (rev 03)
01:04.4 USB Controller: Hewlett-Packard Company Proliant iLO2 virtual USB controller
01:04.6 IPMI SMIC interface: Hewlett-Packard Company Proliant iLO2 virtual UART
02:00.0 RAID bus controller: Hewlett-Packard Company Smart Array Controller (rev 04)
08:00.0 RAID bus controller: Hewlett-Packard Company Smart Array Controller (rev 04)
40:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a4)
40:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev b1)
40:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
40:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
40:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
40:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
40:10.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8132 PCI-X Bridge (rev 12)
40:10.1 PIC: Advanced Micro Devices [AMD] AMD-8132 PCI-X IOAPIC (rev 12)
40:11.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8132 PCI-X Bridge (rev 12)
40:11.1 PIC: Advanced Micro Devices [AMD] AMD-8132 PCI-X IOAPIC (rev 12)
41:01.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)
41:02.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)
49:00.0 PCI bridge: Integrated Device Technology, Inc. Unknown device 8018 (rev 0e)
4a:02.0 PCI bridge: Integrated Device Technology, Inc. Unknown device 8018 (rev 0e)
4a:04.0 PCI bridge: Integrated Device Technology, Inc. Unknown device 8018 (rev 0e)
4b:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
4b:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
4c:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
4c:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
50:00.0 PCI bridge: Integrated Device Technology, Inc. Unknown device 8018 (rev 0e)
51:02.0 PCI bridge: Integrated Device Technology, Inc. Unknown device 8018 (rev 0e)
51:04.0 PCI bridge: Integrated Device Technology, Inc. Unknown device 8018 (rev 0e)
52:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
53:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
53:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
_________end_of_output________


BR
Morgan
T G Manikandan
Honored Contributor

Re: Auto negotiation on 1000Mbps failure.

rick jones
Honored Contributor

Re: Auto negotiation on 1000Mbps failure.

The IEEE specs for Gigabit require a compliant Gigabit device to support autoneg. This differs from 100BT, where support for autoneg was an add-on/option.

As such, autoneg should "always" "work" with a Gigabit device. If it does not, it implies the Gigabit device is in some way deficient.

There is probably some confusion about the way things can be hardcoded these days. Strict hardcoding (disabling negotiation) will lead to situations such as in the attachment. Some devices offer ways to keep autoneg enabled but limit what they will negotiate. While that isn't as heinous as hard coding, it is probably still to be avoided, as it simply papers over a symptom rather than finding and fixing the root cause.
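With Linux's ethtool, for example, that "limit what they will negotiate" approach looks something like this (a sketch; the interface name is illustrative):

```shell
# Keep autonegotiation enabled, but advertise only 1000baseT/Full.
# 0x020 is the ADVERTISED_1000baseT_Full bit (1 << 5) from linux/ethtool.h.
ethtool -s eth6 advertise 0x020
```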

Root cause could be anything from:

*) bad driver rev for the NIC
*) bad firmware rev for the switch
*) marginal cables
*) bad firmware rev for the NIC
*) other
there is no rest for the wicked yet the virtuous have no pillows
Morgan Johansson
New Member

Re: Auto negotiation on 1000Mbps failure.

Thanks for your reply.
We will try to update the e1000 drivers and test again. I will update and close this thread if it works.

BR
Morgan
Andrew Cowan
Honored Contributor

Re: Auto negotiation on 1000Mbps failure.

Morgan,

I notice you mention "bonding", and my advice would be to drop that for now and try to bring up a single interface in Gigabit mode. Once that is stable, repeat the exercise with the others.

The other thing you could check is your cabling: use CAT-5e or CAT-6 cables, as older plain CAT-5 runs can be marginal at Gigabit speeds.
BUPA IS
Respected Contributor

Re: Auto negotiation on 1000Mbps failure.

Hello Morgan ,
Here is my explanation of why autonegotiation is required.
Autonegotiation is always required for 1000BASE-T according to the IEEE 802.3 standard, and it will be for newer speeds such as 10 Gigabit as well.

When a NIC is first enabled, a series of link pulses is exchanged between the ends of the link. These carry timing and data bits and are used to determine the quality of the link, the round-trip delay, and the capabilities of the link partners.
They are also used to equalise the line signal levels so the transmitters and receivers can cope with the length (and quality) of cable they have been given on this occasion. Once the quality of the link and the link partner's capabilities are determined, the speed is set for the session. A master/slave relationship is also set up for ongoing link management and recovery. If autonegotiation cannot complete (because auto is set off at one end), the master/slave relationship and equalisation settings often do not get set properly, if at all, leading to high error rates, random speed shifts, late collisions on switched links, and generally poor performance.
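To see how a given negotiation actually resolved, you can inspect the driver's view (a sketch; which fields appear is driver-dependent):

```shell
# Shows what this end advertised and the negotiated speed/duplex; some
# drivers also report what the link partner advertised:
ethtool eth6

# Basic MII-register view, including link partner ability. mii-tool
# predates gigabit, so treat its 1000BASE-T reporting with caution:
mii-tool -v eth6
```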

Now, for your problem: I agree with Andrew, try without bonding to see if a single link will remain stable. The speed fall-back seems to imply a poor-quality link, but I would work through all of the suggestions listed by Rick as well.

Mike
Help is out there always!!!!!
Morgan Johansson
New Member

Re: Auto negotiation on 1000Mbps failure.

BUPA IS, thanks for your reply.

At the moment we have these settings:

on the RHEL5 machine:
/etc/sysconfig/network-scripts/ifcfg-eth4
DEVICE=eth4
BOOTPROTO=none
ONBOOT=yes
MASTER=bond2
SLAVE=yes
USERCTL=no
ETHTOOL_OPTS="autoneg on speed 1000 duplex full"

/etc/sysconfig/network-scripts/ifcfg-eth8
DEVICE=eth8
BOOTPROTO=none
ONBOOT=yes
MASTER=bond2
SLAVE=yes
USERCTL=no
ETHTOOL_OPTS="autoneg on speed 1000 duplex full"

/etc/sysconfig/network-scripts/ifcfg-bond2
DEVICE=bond2
BONDING_OPTS="mode=1 miimon=100"
BOOTPROTO=none
ONBOOT=yes
NETMASK=255.255.255.0
IPADDR=10.1.101.130
USERCTL=no

And the Cisco 4948 is set to full auto for these two ports.

When we restart the switch and it comes back up, the log looks like this (only showing eth8 here):

Feb 18 11:35:11 burs2p-te02 kernel: 0000:4c:00.0: eth8: Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 18 11:35:11 burs2p-te02 kernel: bonding: bond2: link status definitely up for interface eth8.
Feb 18 11:35:31 burs2p-te02 kernel: 0000:4c:00.0: eth8: Link is Down
Feb 18 11:35:31 burs2p-te02 kernel: bonding: bond2: link status definitely down for interface eth8, disabling it
Feb 18 11:35:48 burs2p-te02 kernel: 0000:4c:00.0: eth8: Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 18 11:35:48 burs2p-te02 kernel: bonding: bond2: link status definitely up for interface eth8.
Feb 18 11:35:56 burs2p-te02 kernel: 0000:4c:00.0: eth8: Link is Down
Feb 18 11:35:56 burs2p-te02 kernel: bonding: bond2: link status definitely down for interface eth8, disabling it
Feb 18 11:36:12 burs2p-te02 kernel: 0000:4c:00.0: eth8: Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 18 11:36:12 burs2p-te02 kernel: bonding: bond2: link status definitely up for interface eth8.

As you can see, it brings the link up and down twice before it finally settles in the up state. With these settings we do reach 1000 Mbps, which is good, but it takes over a minute before the link is up again, which is not ideal.

Our next test, as you suggest, will be to disable bonding and see if the interfaces behave the same way on their own.

Are there any other settings (on the switch or on the machine) that could speed up the autonegotiation and keep the link from bouncing up and down?
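Two settings we may experiment with next (untested here, so treat this as a sketch): the bonding driver's updelay, which keeps a slave out of service until its link has been stable for a while, and PortFast on the Cisco side, which skips the STP listening/learning delay after link-up on host-facing ports:

```shell
# Untested sketch for this setup:
#
# 1) In /etc/sysconfig/network-scripts/ifcfg-bond2, hold a recovering
#    slave out of the bond until its link has been up for 30 s
#    (updelay is in ms and should be a multiple of miimon):
#      BONDING_OPTS="mode=1 miimon=100 updelay=30000"
#
# 2) On the Cisco 4948, host-facing access ports can skip the STP
#    listening/learning states after link-up:
#      interface GigabitEthernet1/1
#       spanning-tree portfast
```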

Thanks
Morgan