06-22-2007 07:15 AM
default route vs resolv.conf vs etherchannel bond
I have noticed a behavior with the kernel routing tables that I cannot comprehend. This is happening with multiple hosts.
'puters are DL380 G5
Switches are Cisco
O/S is RH4 ES Update 5
NICs are the built-in BCM5708
Bonding module mode is '4' (IEEE LACP)
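For reference, a mode-4 bond on RHEL 4 is typically wired up along these lines (a sketch only; the addresses and option values here are illustrative, not my actual config):

```shell
# /etc/modprobe.conf -- load the bonding driver in 802.3ad mode
# mode=4 is IEEE 802.3ad dynamic link aggregation (LACP)
alias bond0 bonding
options bonding mode=4 miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0 (illustrative addressing)
#   DEVICE=bond0
#   BOOTPROTO=static
#   IPADDR=172.25.71.10
#   NETMASK=255.255.255.0
#   ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0 -- enslave the NIC to bond0
#   DEVICE=eth0
#   MASTER=bond0
#   SLAVE=yes
#   ONBOOT=yes
```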
On the etherchannel side everything looks fine, there are no errors or warnings in dmesg or /var/log/messages when the bond goes up or down.
Weirdness happens when bond0 is initializing. The interface comes up fine, but the default route takes 20-30 seconds to register with the kernel.
Issuing the "route" command to display the current routing tables just after bringing bond0 up outputs the network route and the bogus 169.254 route that is built into ifup, then hangs there for ~20 seconds and exits when the default route comes up.
Issuing "route -n" displays all routes instantly, calling the default route 0.0.0.0 instead of "default". That got me thinking about name resolution.
If bond0 is down and I comment the DNS servers out of /etc/resolv.conf and then ifup bond0 the default route comes up instantly. If I then return to /etc/resolv.conf and uncomment the DNS servers "route" output hangs for 20 seconds again.
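The difference between "route" and "route -n" described above comes down to name resolution: plain "route" tries to reverse-resolve every address in the table, so an unreachable DNS server stalls the output until the resolver times out, while "-n" skips resolution entirely. A quick way to see the resolver delay in isolation:

```shell
# 'route' reverse-resolves each address; with no reachable DNS server
# it blocks until the resolver gives up on every lookup
time route

# '-n' prints numeric addresses only, so it returns immediately
time route -n
```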
I do have to mention that the default route does its job once it comes up after its 20-second lapse...
Weirdness happens with dhcp or static IP configs.
If I skip the bonding module and ifup eth0 on one of the LACP'd ports, the behavior is the same. Using only eth0 in a non-LACP'd port sees the route come up instantly (as it should) regardless of the /etc/resolv.conf state.
Issuing route add default gw 172.25.71.0 is the same as having it defined in ifcfg-bond0 or /etc/sysconfig/network. You get the 20 second hang.
Anyone know what the deuce is going on with etherchannel bonds vs the default route vs name resolution ? Driving me nuts :D
06-22-2007 07:49 AM
Re: default route vs resolv.conf vs etherchannel bond
A 20- to 30-second delay in these cases is a reverse DNS failure, that is, an address-to-name look-up. If you do some exploration with nslookup (or, I suppose, "dig"), you should be able to put in any of the IP addresses being used for these interfaces and get a prompt (and accurate) response (the corresponding name). If not, then the problem could be as simple as missing data in the DNS server's database, or something more complicated. But it sure sounds like something in the DNS. (And with an address in 172.16.0.0/12, the culprit must be somewhere in your organization, right?)
06-22-2007 08:10 AM
Re: default route vs resolv.conf vs etherchannel bond
Thanks for your reply. All reverse lookups are successful and instantaneous once the default route is up. On an "unbonded" switch port everything works right away.
Anyway, the DNS servers cannot be reached unless the default gateway is operational, since they are not on the same subnet.
Good try though; reverse lookups have caused their fair share of hiccups in the past.
06-22-2007 08:25 AM
Re: default route vs resolv.conf vs etherchannel bond
> instantaneous once the default route is up.

Then the answer may be to add the things to the local /etc/hosts, so all will be known before the DNS server can be reached. And make sure in /etc/nsswitch.conf (or whatever you have like it) that you're looking at the "files" first (or at least before the "dns"). Remember, /etc/hosts is there to provide the info you need before you can get it from the _right_ place.

> Reverse lookups have caused their fair share of hiccups in the past.

I'm still betting on them here.
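Concretely, the suggestion would look something like this (the hostnames and addresses are made up for illustration):

```shell
# /etc/nsswitch.conf -- consult local files before asking DNS
#   hosts: files dns

# /etc/hosts -- pre-seed every address involved, so reverse lookups
# succeed even before the gateway (and thus the DNS servers) is reachable
#   172.25.71.1    gw.example.com     gw      # hypothetical gateway entry
#   172.25.71.10   host1.example.com  host1   # hypothetical host entry
```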
06-22-2007 08:45 AM
Re: default route vs resolv.conf vs etherchannel bond
I have made progress, and I think we can stop pointing fingers at name resolution. Get this: upon bringing up bond0 I immediately started a ping on the gateway, and the output reports "host unreachable" for 30 seconds flat and then proceeds to ping successfully.
No wonder the default gateway took 30 seconds to come up. This looks suspiciously like the LACP refresh rate of the bonding module, which supports 30-second and 1-second intervals. I have tried setting it to 1 second, but this did not change the outcome. Maybe the Cisco switch cannot handle the 1-second refresh rate.
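The refresh rate mentioned above corresponds to the bonding driver's lacp_rate option; the test was along these lines (sketch only):

```shell
# /etc/modprobe.conf -- lacp_rate controls how often the link partner is
# asked to transmit LACPDUs: slow = every 30 seconds (default), fast = every 1
options bonding mode=4 miimon=100 lacp_rate=fast

# after reloading the module, the negotiated aggregation state is visible in:
cat /proc/net/bonding/bond0
```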
Question to all Linux admins using the bonding module with cisco switches under LACP mode : ever noticed a 30 second delay after link-up for TCP/IP stuff to come up ?
06-22-2007 11:23 AM
Re: default route vs resolv.conf vs etherchannel bond
---
The 802.3ad mode requires that the switch have the appropriate ports configured as an 802.3ad aggregation. The precise method used to configure this varies from switch to switch, but, for example, a Cisco 3550 series switch requires that the appropriate ports first be grouped together in a single etherchannel instance, then that etherchannel is set to mode "lacp" to enable 802.3ad (instead of standard EtherChannel).
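The bonding.txt passage quoted above corresponds roughly to switch-side IOS commands like these (a sketch, not a verified config; interface names and the channel-group number are illustrative):

```shell
# Cisco IOS, entered on the switch -- bundle two ports into an LACP channel
#   configure terminal
#    interface range GigabitEthernet0/1 - 2
#     channel-protocol lacp
#     channel-group 1 mode active    # 'active' initiates LACP negotiation
#    exit
#    interface Port-channel1          # created automatically by channel-group
#     switchport mode access
#   end
```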
---
I take it you've done this?
(Note: I haven't tried this configuration myself)
06-23-2007 03:27 PM
Re: default route vs resolv.conf vs etherchannel bond
Try adding the following to /etc/resolv.conf:

options timeout:1
options attempts:1

Also try changing the /etc/hosts file loopback entries
from
#127.0.0.1 localhost.localdomain localhost
to
127.0.0.1 localhost loopback
06-25-2007 02:26 AM
Re: default route vs resolv.conf vs etherchannel bond
Stuart: yes, the switch and bonding driver have been set up according to the bonding.txt instructions. I have also tried most of the switch settings, just for kicks.
Santhosh: I had already tried the timeout and retry options in resolv.conf. I'll try playing with the hosts file, but since I cannot ping the IP I'm trying to route to (for 30 seconds), I'm still thinking this is a bonding issue.
It's a long holiday weekend here; getting back to this tomorrow.
06-25-2007 04:02 AM
Re: default route vs resolv.conf vs etherchannel bond
First, check the bond's state with "cat /proc/net/bonding/bond0". I once had the problem that the module options given when loading the bonding module were not passed correctly to the interface.
Second suggestion: Set your nameserver to 127.0.0.1 and start tcpdump on your localhost interface. You'll see then if the activation of your bonding interface asks for any names.
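The loopback-capture idea above might be carried out along these lines (illustrative commands; back up resolv.conf before touching it):

```shell
# Point the resolver at localhost so every DNS query the system makes
# shows up on the loopback interface (restore the original file afterwards)
cp /etc/resolv.conf /etc/resolv.conf.bak
echo "nameserver 127.0.0.1" > /etc/resolv.conf

# In another terminal, capture name-service traffic on loopback while
# bringing bond0 up; port 53 limits the capture to DNS queries
tcpdump -i lo -n port 53
```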
What do your Cisco logs say at activation time? Maybe there are settings blocking things for these 20 seconds.
Sorry, this is no solution for your problem. It could be a kernel problem, so check the bond0 file to see exactly when the bonding interface is activated, and watch your Cisco logs.
06-25-2007 05:14 AM
Re: default route vs resolv.conf vs etherchannel bond
Delays in the switch being willing to update its forwarding tables?
07-11-2008 11:24 AM
Re: default route vs resolv.conf vs etherchannel bond
The culprit has been identified after some crafty protocol sniffing by a fellow admin: it is called STP.
The Spanning Tree Protocol is used by switches to prevent nastiness like logical loops from being established on the network.
When implementing port channels, at least the LACP kind, one must disable STP on the individual channel ports and enable it on the channel itself.
Everything now registers within 5 seconds and all is well.
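This also explains the number observed earlier: classic STP holds a newly-up port in listening (15 s) and learning (15 s) states before forwarding, which is exactly the ~30-second delay. On Cisco IOS the fix might look something like this (a sketch under assumed names; the port-channel and interface numbers are illustrative):

```shell
# Cisco IOS sketch -- let spanning tree operate on the logical channel,
# and skip the 30 s listening/learning delay on the edge ports
#   interface range GigabitEthernet0/1 - 2
#    spanning-tree portfast           # edge port: go straight to forwarding
#   interface Port-channel1
#    # STP runs here on the bundled channel as a single logical link
```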
Credit goes to fellow admin Rémy Couture.
I'm leaving the thread open until next week to maximize visibility and catch any comments, will close afterwards.
Thanks for reading !