Bonding Failover Problem

Tim Goodman
Occasional Advisor

Bonding Failover Problem

I have DL380 G4s with NC7771 NICs running Red Hat Enterprise Linux ES 3 Update 6. The NICs are bonded in mode 1 (active-backup) and plugged into separate switches. I have tried both the tg3 and bcm5700 NIC drivers. The problem is that when I unplug the active NIC, traffic does not move to the other NIC for about 60-90 seconds. The dmesg output shows the link failing immediately and the other NIC being activated. Any ideas why this is not working correctly?

modules.conf -
#alias eth0 tg3
alias eth0 bcm5700
#alias eth1 tg3
alias eth1 bcm5700
#alias eth2 bcm5700
alias scsi_hostadapter cciss
alias usb-controller usb-uhci
alias usb-controller1 ehci-hcd
alias bond0 bonding
options bond0 mode=1 miimon=100

ifcfg-bond0 -
DEVICE=bond0
BOOTPROTO=none
IPADDR=10.70.80.119
NETMASK=255.255.240.0
ONBOOT=yes
TYPE=Ethernet
USERCTL=no

ifcfg-eth0 -
# eth0
DEVICE=eth0
#IPADDR=10.70.80.119
#NETMASK=255.255.240.0
BOOTPROTO=none
USERCTL=no
MASTER=bond0
SLAVE=yes
ONBOOT=yes
#ETHTOOL_OPTS="speed 100 duplex full autoneg off"
TYPE=Ethernet

ifcfg-eth1 -
DEVICE=eth1
BOOTPROTO=none
USERCTL=no
MASTER=bond0
SLAVE=yes
ONBOOT=yes
#ETHTOOL_OPTS="speed 100 duplex full autoneg off"
TYPE=Ethernet

dmesg output
bcm5700: eth0 NIC Link is Down
bond0: link status definitely down for interface eth0, disabling it and making interface eth1 the active one.
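
One way to put a number on the 60-90 second gap described above is to ping the bond address from another machine while pulling the cable. The following is only a rough sketch: the function names probe and longest_outage are made up for illustration, and it assumes a Linux ping that accepts -c and -w.

```shell
# Ping the target once per second and print "<epoch seconds> ok" or
# "<epoch seconds> FAIL". Stop it with Ctrl-C.
probe() {
    target="$1"
    while :; do
        if ping -c 1 -w 1 "$target" >/dev/null 2>&1; then
            echo "$(date +%s) ok"
        else
            echo "$(date +%s) FAIL"
        fi
        sleep 1
    done
}

# Read probe output on stdin and print the longest outage in seconds,
# measured from the first FAIL to the next ok (or to the last FAIL if
# the log ends while pings are still failing).
longest_outage() {
    awk '
        $2 == "FAIL" { if (!start) start = $1; last = $1; next }
        { if (start) { d = $1 - start; if (d > max) max = d; start = 0 } }
        END {
            if (start) { d = last - start; if (d > max) max = d }
            print max + 0
        }
    '
}
```

For example, run probe 10.70.80.119 > failover.log on a second host, unplug the active NIC, stop the probe once traffic returns, then run longest_outage < failover.log.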
4 REPLIES
Ivan Ferreira
Honored Contributor

Re: Bonding Failover Problem

Maybe a switch issue (spanning tree or something similar). It looks like the network takes too long to learn the new location of the MAC address.

Check the status in /proc/net/bonding/bond0.

Also check the port status for autonegotiation problems.
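
A minimal sketch of the status check, assuming the status file uses the "Currently Active Slave:", "Slave Interface:" and "MII Status:" labels found in common versions of the bonding driver (older drivers may print a different layout). The function name bond_status is hypothetical.

```shell
# Print the active slave and the MII status of each slave from a bonding
# status file. Defaults to /proc/net/bonding/bond0; a different path can
# be passed as the first argument.
bond_status() {
    status_file="${1:-/proc/net/bonding/bond0}"
    awk -F': ' '
        /^Currently Active Slave/ { print "active=" $2 }
        /^Slave Interface/        { slave = $2 }
        /^MII Status/ && slave    { print slave "=" $2; slave = "" }
    ' "$status_file"
}
```

Calling bond_status before and after unplugging the cable shows whether the driver itself switched slaves, which helps separate a bonding problem from a switch problem.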
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Tim Goodman
Occasional Advisor

Re: Bonding Failover Problem

It's not a switch problem because I have some DL380 G5s and DL360 G3s that fail over correctly. The servers are negotiating correctly.
Ivan Ferreira
Honored Contributor

Re: Bonding Failover Problem

Check your kernel configuration for ARP values, for example, arp_filter.

Then I would try the fail_over_mac option.
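
A small sketch for the arp_filter check, assuming the per-interface settings are exposed under /proc/sys/net/ipv4/conf as on a standard Linux kernel. The function name show_arp_filter is made up for illustration.

```shell
# Print "<interface> <value>" for every arp_filter setting under the
# given sysctl directory (default: /proc/sys/net/ipv4/conf).
show_arp_filter() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    for f in "$conf_dir"/*/arp_filter; do
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        echo "$iface $(cat "$f")"
    done
}
```

Running show_arp_filter with no argument lists the value for all, default, and each interface, so a stray non-zero setting on one interface is easy to spot.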
Tim Goodman
Occasional Advisor

Re: Bonding Failover Problem

arp_filter is 0

I can't use fail_over_mac because it was added in v3.2 of the bonding driver and I'm running v2.6.