Operating System - Linux

Re: Can't ping the gateway...

 
SOLVED
Qcheck
Super Advisor

Can't ping the gateway...

I can't ping the gateway, although I can ping other servers. This is a Red Hat Linux 5.1 node. I added a gateway entry to the /etc/sysconfig/network file, but I still can't ping the gateway. The server is really slow because of this.

What could be wrong? I have the same settings on the other two nodes and they are fine.

Thanks in advance!
Matti_Kurkela
Honored Contributor
Solution

Re: Can't ping the gateway...

Adding the gateway entry to /etc/sysconfig/network has no immediate effect by itself: that file is only read when the system is booting, or when you're running ifdown/ifup commands.

If you want to add a default gateway route without bringing down any network interfaces and want it to take effect immediately, you should use the "route" command (or "ip route" if you need some special advanced routing functionality).

Run "route -n" on the problematic node and the other two nodes, and compare the results. If you see differences, read "man route" for the correct syntax for fixing them.
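
As a rough illustration of the comparison suggested above, the following shell sketch pulls the default gateway out of "route -n" output so that it can be compared across nodes. The function name default_gw is made up for this example, and the gateway address in the comment is the one that appears later in this thread.

```shell
# Hypothetical helper: print the default gateway(s) listed in
# `route -n` output that is read from standard input.
default_gw() {
    # The default route has destination 0.0.0.0 and the G (gateway) flag.
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2 }'
}

# Usage on a live node:
#   route -n | default_gw
# If nothing is printed, no default route is configured. One can be added
# immediately, without touching the interfaces (example address):
#   route add default gw 10.157.63.1
```

Running the helper on each node and comparing the printed addresses is a quick first check before reading the full tables side by side.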

If your routing table is OK, there might be various other possible causes:

* hardware failure
- broken network card
- broken cable
- broken network switch

* firewall/iptables configuration (run "iptables -L -v -n" to check locally; talk with the gateway admin to have the gateway side checked)

- your node does not allow outgoing ping requests and discards them silently (a DROP rule in iptables)

- your node rejects incoming ping replies

- your node's iptables filter does not allow communication with the gateway (all traffic to the gateway DROPped)

- the gateway has a firewall feature that is currently configured to discard any traffic from your node

- if you use VLANs at your site, the switch port where your node is plugged in may be configured to a wrong VLAN

- the gateway is in fact functioning normally, but has been configured to not answer to pings unless they come from "trusted sources"... and your node is not on the list.
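
For the local iptables check in the list above, a small filter can make blocking rules easier to spot. This is only a sketch: the function name blocking_rules is hypothetical, and it assumes the usual column layout of "iptables -L -v -n", where the rule target is the third column.

```shell
# Hypothetical helper: from `iptables -L -v -n` output on standard input,
# print chains whose default policy is DROP and rules whose target is
# DROP or REJECT.
blocking_rules() {
    awk '/^Chain .*\(policy DROP/ { print; next }
         $3 == "DROP" || $3 == "REJECT" { print }'
}

# Usage on a live node (as root):
#   iptables -L -v -n | blocking_rules
```

An empty result only rules out the local packet filter; the gateway side and the switch still have to be checked by their administrators.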

MK
Qcheck
Super Advisor

Re: Can't ping the gateway...

MK,

Thank you for the response. route -n shows the gateway is up, but I can't ping it. I also suspect that something is wrong elsewhere, probably on the switch side. Since the servers are at our data center, it is very hard to diagnose. They are across the city at the client's site, and we manage them remotely.

So is there any other way, from the OS side, to determine whether the hardware or the switch settings are incorrect?

Thank you for your time.
Steven Schweda
Honored Contributor

Re: Can't ping the gateway...

> I can't ping the gateway,

I assume that that really means that you
don't get a "ping" response from that system.

What is "the gateway"? Does it respond to
"ping" requests from other systems?

> however I can ping other servers.

What are the network IP addresses and
netmasks involved here? Routes?

> What could be wrong?

Almost anything? Based on practically no
information ("I can't"), how exact an answer
were you expecting?

> I have the same settings on other two nodes
> and they are fine.

The "same settings" of _what_? _All_ the
network parameters? _Some_ of them? What?


> [...] route -n shows the gateway is up.

It shows me nothing, because my psychic
powers are too weak to show me the results of
your "route -n" command. Perhaps you could
help.

> [...] it is very hard to diagnose.

Imagine how hard it must be for anyone with
no evidence to work with other than your
vague reports.

As usual, showing actual commands with their
actual output can be more helpful than vague
descriptions and interpretations.
ManojK_1
Valued Contributor

Re: Can't ping the gateway...

Hi,

>> I can't ping the gateway,

Are you able to ping the gateway from other servers that are in the same segment as the problematic server and use the same gateway?

>>This server is really slow because of this

How can you say that the server is slow because it cannot ping the gateway?
Which application is running on this server, and what is slow?

In our environment, ping to the gateway is disabled for security reasons, and I think that is best practice from a security and audit standpoint.

We have never faced any performance issues with RHEL 5 because the gateway could not be pinged.

Manoj K
Thanks and Regards,
Manoj K
Qcheck
Super Advisor

Re: Can't ping the gateway...

Thank you for the response.

Yes, I am able to ping the gateway from the other two nodes. The cluster has 4 nodes; two of them can ping the default gateway and the other two can't. All the configurations (network scripts) are the same on all 4 nodes. It was all working until Friday.

I can tell the server is slow because, when I ssh from a working node, the management port (bond2=eth1+eth3) works normally, meaning I get the login prompt very quickly as expected, whereas the data port (bond1=eth0+eth2) is very slow. But neither one works from the PuTTY session.

I noticed these two things:
1) I can't ssh, or it takes forever to get the login prompt, from a PuTTY session. If I ssh from the working node, it takes a long time to get the login prompt.

2) Can't ping the gateway from two nodes.

Oracle ASM cluster is running on all 4 nodes.

Thanks in advance!
Patrick Wallek
Honored Contributor

Re: Can't ping the gateway...

How are you trying to ping the gateway? Are you using the IP address or the hostname? If you are using the hostname, is the /etc/resolv.conf file set up the same way on all servers? What about /etc/nsswitch.conf?
Qcheck
Super Advisor

Re: Can't ping the gateway...

Patrick, thank you for the response. I am trying to ping the gateway by its IP address, not the hostname.
ManojK_1
Valued Contributor

Re: Can't ping the gateway...

Hi,

From your explanation, what I understood is this: the node has a public IP (bond1) and a private IP (bond2). bond1 is used for external communication and bond2 for internal cluster communication.
Through the private segment (bond2 IP) ssh is very fast, and through the public segment (bond1 IP) the ssh login is slow.

Can you please paste the output of "ip addr" and "netstat -rn" from both the problematic server and a good server? Also paste /etc/resolv.conf.

I didn't understand "But neither one is working from the putty session" in your explanation. What does it mean?

When were these cluster nodes last rebooted?

What about communication (ssh login) between these cluster nodes through bond1 and bond2?

Manoj K
Qcheck
Super Advisor

Re: Can't ping the gateway...

Manoj,

Thank you for the response. Here is the information:

I can't log in to either bond1 (10.157.63.101) or bond2 (10.157.120.196) from the PuTTY session. I saved PuTTY sessions for both IPs as mtstalpd-rac4-data (bond1) and mtstalpd-rac4-mgt (bond2).
However, when I am logged into the rac3 node, I am able to ssh to 10.157.120.196 but not to 10.157.63.101.

The cluster nodes were rebooted on Friday.

************Not WORKING NODE ****************
[root@mtstalpd-rac4 standby]# ip addr
1: lo: mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast master bond1 qlen 1000
link/ether 00:1e:68:78:aa:50 brd ff:ff:ff:ff:ff:ff
inet6 fe80::21e:68ff:fe78:aa50/64 scope link
valid_lft forever preferred_lft forever
3: eth1: mtu 1500 qdisc pfifo_fast master bond2 qlen 1000
link/ether 00:1e:68:78:aa:51 brd ff:ff:ff:ff:ff:ff
inet6 fe80::21e:68ff:fe78:aa51/64 scope link
valid_lft forever preferred_lft forever
4: eth2: mtu 1500 qdisc pfifo_fast master bond1 qlen 1000
link/ether 00:1e:68:78:aa:50 brd ff:ff:ff:ff:ff:ff
inet6 fe80::21e:68ff:fe78:aa50/64 scope link
valid_lft forever preferred_lft forever
5: eth3: mtu 1500 qdisc pfifo_fast master bond2 qlen 1000
link/ether 00:1e:68:78:aa:51 brd ff:ff:ff:ff:ff:ff
inet6 fe80::21e:68ff:fe78:aa51/64 scope link
valid_lft forever preferred_lft forever
6: sit0: mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
7: ib0: mtu 65520 qdisc pfifo_fast master bond0 qlen 256
link/infiniband 80:00:04:04:fe:80:00:00:00:00:00:00:00:06:6a:00:a0:00:fc:9f brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet6 fe80::206:6a00:a000:fc9f/64 scope link
valid_lft forever preferred_lft forever
8: ib1: mtu 65520 qdisc pfifo_fast master bond0 qlen 256
link/infiniband 80:00:04:05:fe:80:00:00:00:00:00:00:00:06:6a:01:a0:00:fc:9f brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet6 fe80::206:6a01:a000:fc9f/64 scope link
valid_lft forever preferred_lft forever
9: bond0: mtu 65520 qdisc noqueue
link/infiniband 80:00:04:04:fe:80:00:00:00:00:00:00:00:06:6a:00:a0:00:fc:9f brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 10.218.22.197/27 brd 10.218.22.223 scope global bond0
inet6 fe80::206:6a00:a000:fc9f/64 scope link
valid_lft forever preferred_lft forever
10: bond1: mtu 1500 qdisc noqueue
link/ether 00:1e:68:78:aa:50 brd ff:ff:ff:ff:ff:ff
inet 10.157.63.101/24 brd 10.157.63.255 scope global bond1
inet 10.157.63.97/24 brd 10.157.63.255 scope global secondary bond1:5
inet 10.157.63.96/24 brd 10.157.63.255 scope global secondary bond1:1
inet6 fe80::21e:68ff:fe78:aa50/64 scope link
valid_lft forever preferred_lft forever
11: bond2: mtu 1500 qdisc noqueue
link/ether 00:1e:68:78:aa:51 brd ff:ff:ff:ff:ff:ff
inet 10.157.120.196/25 brd 10.157.120.255 scope global bond2
inet6 fe80::21e:68ff:fe78:aa51/64 scope link
valid_lft forever preferred_lft forever
[root@mtstalpd-rac4 standby]#

[root@mtstalpd-rac4 standby]# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
10.218.22.192 0.0.0.0 255.255.255.224 U 0 0 0 bond0
10.157.120.128 0.0.0.0 255.255.255.128 U 0 0 0 bond2
10.157.63.0 0.0.0.0 255.255.255.0 U 0 0 0 bond1
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 bond2
0.0.0.0 10.157.63.1 0.0.0.0 UG 0 0 0 bond1
[root@mtstalpd-rac4 standby]#
[root@mtstalpd-rac4 standby]# cat /etc/resolv.conf
domain nycnet
nameserver 10.217.255.161
nameserver 10.136.8.21
nameserver 10.152.8.5
search nycnet doitt.nycnet nyc.gov
[root@mtstalpd-rac4 standby]#

**********WORKING NODE****************

[root@mtstalpd-rac3 oracle]# ip addr
1: lo: mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast master bond1 qlen 1000
link/ether 00:1e:68:c6:03:46 brd ff:ff:ff:ff:ff:ff
inet6 fe80::21e:68ff:fec6:346/64 scope link
valid_lft forever preferred_lft forever
3: eth1: mtu 1500 qdisc pfifo_fast master bond2 qlen 1000
link/ether 00:1e:68:c6:03:47 brd ff:ff:ff:ff:ff:ff
inet6 fe80::21e:68ff:fec6:347/64 scope link
valid_lft forever preferred_lft forever
4: eth2: mtu 1500 qdisc pfifo_fast master bond1 qlen 1000
link/ether 00:1e:68:c6:03:46 brd ff:ff:ff:ff:ff:ff
inet6 fe80::21e:68ff:fec6:346/64 scope link
valid_lft forever preferred_lft forever
5: eth3: mtu 1500 qdisc pfifo_fast master bond2 qlen 1000
link/ether 00:1e:68:c6:03:47 brd ff:ff:ff:ff:ff:ff
inet6 fe80::21e:68ff:fec6:347/64 scope link
valid_lft forever preferred_lft forever
6: sit0: mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
7: ib0: mtu 65520 qdisc pfifo_fast master bond0 qlen 256
link/infiniband 80:00:04:04:fe:80:00:00:00:00:00:00:00:06:6a:00:a0:00:84:75 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet6 fe80::206:6a00:a000:8475/64 scope link
valid_lft forever preferred_lft forever
8: ib1: mtu 65520 qdisc pfifo_fast master bond0 qlen 256
link/infiniband 80:00:04:05:fe:80:00:00:00:00:00:00:00:06:6a:01:a0:00:84:75 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet6 fe80::206:6a01:a000:8475/64 scope link
valid_lft forever preferred_lft forever
9: bond0: mtu 65520 qdisc noqueue
link/infiniband 80:00:04:04:fe:80:00:00:00:00:00:00:00:06:6a:00:a0:00:84:75 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 10.218.22.196/27 brd 10.218.22.223 scope global bond0
inet6 fe80::206:6a00:a000:8475/64 scope link
valid_lft forever preferred_lft forever
10: bond1: mtu 1500 qdisc noqueue
link/ether 00:1e:68:c6:03:46 brd ff:ff:ff:ff:ff:ff
inet 10.157.63.100/24 brd 10.157.63.255 scope global bond1
inet6 fe80::21e:68ff:fec6:346/64 scope link
valid_lft forever preferred_lft forever
11: bond2: mtu 1500 qdisc noqueue
link/ether 00:1e:68:c6:03:47 brd ff:ff:ff:ff:ff:ff
inet 10.157.120.195/25 brd 10.157.120.255 scope global bond2
inet6 fe80::21e:68ff:fec6:347/64 scope link tentative
valid_lft forever preferred_lft forever
[root@mtstalpd-rac3 oracle]# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
10.218.22.192 0.0.0.0 255.255.255.224 U 0 0 0 bond0
10.157.120.128 0.0.0.0 255.255.255.128 U 0 0 0 bond2
10.157.63.0 0.0.0.0 255.255.255.0 U 0 0 0 bond1
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 bond2
0.0.0.0 10.157.63.1 0.0.0.0 UG 0 0 0 bond1
[root@mtstalpd-rac3 oracle]#
[root@mtstalpd-rac3 oracle]# cat /etc/resolv.conf
domain nycnet
nameserver 10.217.255.161
nameserver 10.136.8.21
nameserver 10.152.8.5
search nycnet doitt.nycnet nyc.gov
[root@mtstalpd-rac3 oracle]#

*********************************************

Thank you.
ManojK_1
Valued Contributor

Re: Can't ping the gateway...

Hi,

So it is clear that the issue started after the servers were rebooted.

Are you trying to ssh to the IP address or the hostname?
Can you run "nslookup" from each server against the hostname and the IP address of the other?
Is any firewall active at the OS level? Please check the firewall status and share the output of "iptables -L" and "cat /etc/nsswitch.conf".
Also check whether any physical firewall is configured in your environment.

Test ssh (port 22) reachability between the servers by using telnet to port 22.
For example:
from the rac3 node, run the command "telnet 10.157.63.101 22" and paste the output.

You can check for network slowness by running "ping -s 128" between the servers and checking the response time.
The response time (time=) should be in ms.

Manoj K
ManojK_1
Valued Contributor

Re: Can't ping the gateway...

Sorry, there was a typing mistake in my previous post.

You can check for network slowness by running "ping -s 128" between the servers and checking the response time.
The response time (time=) should be less than 1 ms.

Also try to ping from the system from which you are connecting through PuTTY.
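
To make the response-time check above easier to repeat, here is a minimal shell sketch that extracts the average round-trip time from the summary line that Linux ping prints. The function name ping_avg and the target address in the usage comment are illustrative assumptions.

```shell
# Hypothetical helper: print the average round-trip time (in ms) from the
# summary line of Linux ping output read from standard input.
ping_avg() {
    # The summary looks like:
    #   rtt min/avg/max/mdev = 0.104/0.512/1.270/0.529 ms
    # Splitting on "/" puts the average in the fifth field.
    awk -F/ '/^rtt / { print $5 }'
}

# Usage on a live node (example address):
#   ping -c 5 -s 128 10.157.63.100 | ping_avg
```

If ping receives no replies at all, there is no rtt line and the helper prints nothing, which is itself a useful signal.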

Manoj K
Elmar P. Kolkman
Honored Contributor

Re: Can't ping the gateway...

This looks like a mismatch between the switch configuration and the server configuration.

How is your bond configured on the server, and how is it configured on the switches?
It looks like one side is using active/active while the other is using active/passive.
In that case, what is reachable and what is not depends on things like MAC or IP addresses, because only half the traffic goes through the right interface.
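
On the server side, the Linux bonding driver reports the mode it is actually running under /proc/net/bonding/. The sketch below reads the mode line from such a status file; the function name bond_mode is hypothetical, and bond1 in the usage comment is simply the data bond discussed in this thread.

```shell
# Hypothetical helper: print the mode from a bonding status file
# (the text of /proc/net/bonding/<bond>) read from standard input.
bond_mode() {
    sed -n 's/^Bonding Mode: //p'
}

# Usage on a live node:
#   bond_mode < /proc/net/bonding/bond1
```

An active-backup bond (mode 1) is reported as "fault-tolerance (active-backup)", while round-robin (mode 0) is reported as "load balancing (round-robin)", so this shows quickly which mode the driver really loaded.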
Every problem has at least one solution. Only some solutions are harder to find.
Qcheck
Super Advisor

Re: Can't ping the gateway...

Manoj,

Thank you for the response.

******************** NOT WORKING NODE ****************************
[root@mtstalpd-rac4 ~]# nslookup mtstalpd-rac3
;; connection timed out; no servers could be reached

[root@mtstalpd-rac4 ~]# nslookup mtstalpd-rac4
;; connection timed out; no servers could be reached

[root@mtstalpd-rac4 ~]# nslookup 10.157.120.196
;; connection timed out; no servers could be reached

[root@mtstalpd-rac4 ~]#
[root@mtstalpd-rac4 sysconfig]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.218.22.192 0.0.0.0 255.255.255.224 U 0 0 0 bond0
10.157.120.128 0.0.0.0 255.255.255.128 U 0 0 0 bond2
10.157.63.0 0.0.0.0 255.255.255.0 U 0 0 0 bond1
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 bond2
0.0.0.0 10.157.63.1 0.0.0.0 UG 0 0 0 bond1
[root@mtstalpd-rac4 sysconfig]# ping 10.157.63.1
PING 10.157.63.1 (10.157.63.1) 56(84) bytes of data.
From 10.157.63.101 icmp_seq=1 Destination Host Unreachable
From 10.157.63.101 icmp_seq=2 Destination Host Unreachable
From 10.157.63.101 icmp_seq=3 Destination Host Unreachable

--- 10.157.63.1 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3001ms
, pipe 3
[root@mtstalpd-rac4 sysconfig]# ping www.yahoo.com

[root@mtstalpd-rac4 sysconfig]# nslookup www.yahoo.com
;; connection timed out; no servers could be reached

[root@mtstalpd-rac4 sysconfig]#



******************************** WORKING NODE *************************
[root@mtstalpd-rac3 ~]# nslookup mtstalpd-rac3
Server: 10.217.255.161
Address: 10.217.255.161#53

Name: mtstalpd-rac3.nycnet
Address: 10.157.63.100

[root@mtstalpd-rac3 ~]# rac4
ssh: connect to host mtstalpd-rac4 port 22: No route to host
[root@mtstalpd-rac3 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
[root@mtstalpd-rac3 ~]# cat /etc/nsswitch.conf
#
# /etc/nsswitch.conf
#
# An example Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# The entry '[NOTFOUND=return]' means that the search for an
# entry should stop if the search in the previous entry turned
# up nothing. Note that if the search failed due to some other reason
# (like no NIS server responding) then the search continues with the
# next entry.
#
# Legal entries are:
#
# nisplus or nis+ Use NIS+ (NIS version 3)
# nis or yp Use NIS (NIS version 2), also called YP
# dns Use DNS (Domain Name Service)
# files Use the local files
# db Use the local database (.db) files
# compat Use NIS on compat mode
# hesiod Use Hesiod for user lookups
# [NOTFOUND=return] Stop searching if not found so far
#

# To use db, put the "db" in front of "files" for entries you want to be
# looked up first in the databases
#
# Example:
#passwd: db files nisplus nis
#shadow: db files nisplus nis
#group: db files nisplus nis

passwd: files
shadow: files
group: files

#hosts: db files nisplus nis dns
hosts: files dns

# Example - obey only what nisplus tells us...
#services: nisplus [NOTFOUND=return] files
#networks: nisplus [NOTFOUND=return] files
#protocols: nisplus [NOTFOUND=return] files
#rpc: nisplus [NOTFOUND=return] files
#ethers: nisplus [NOTFOUND=return] files
#netmasks: nisplus [NOTFOUND=return] files

bootparams: nisplus [NOTFOUND=return] files

ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files

netgroup: nisplus

publickey: nisplus

automount: files nisplus
aliases: files nisplus

[root@mtstalpd-rac3 ~]# telnet 10.157.120.196 22
Trying 10.157.120.196...
Connected to mtstalpd-racm4 (10.157.120.196).
Escape character is '^]'.
SSH-2.0-OpenSSH_4.3
^]
Protocol mismatch.
Connection closed by foreign host.
[root@mtstalpd-rac3 ~]# telnet 10.157.63.101 22
Trying 10.157.63.101...
telnet: connect to address 10.157.63.101: No route to host
telnet: Unable to connect to remote host: No route to host
[root@mtstalpd-rac3 ~]# ping -s 128 mtstalpd-rac4
PING mtstalpd-rac4 (10.157.63.101) 128(156) bytes of data.
From mtstalpd-rac3 (10.157.63.100) icmp_seq=2 Destination Host Unreachable
From mtstalpd-rac3 (10.157.63.100) icmp_seq=3 Destination Host Unreachable
From mtstalpd-rac3 (10.157.63.100) icmp_seq=4 Destination Host Unreachable

--- mtstalpd-rac4 ping statistics ---
7 packets transmitted, 0 received, +3 errors, 100% packet loss, time 6000ms
, pipe 3
[root@mtstalpd-rac3 ~]#
[root@mtstalpd-rac3 sysconfig]# nslookup www.yahoo.com
Server: 10.217.255.161
Address: 10.217.255.161#53

Non-authoritative answer:
www.yahoo.com canonical name = fp.wg1.b.yahoo.com.
fp.wg1.b.yahoo.com canonical name = any-fp.wa1.b.yahoo.com.
Name: any-fp.wa1.b.yahoo.com
Address: 67.195.160.76
Name: any-fp.wa1.b.yahoo.com
Address: 69.147.125.65

[root@mtstalpd-rac3 sysconfig]#
*********************************************

Qcheck
Super Advisor

Re: Can't ping the gateway...

Elmar, Thank you for the response.

I am not sure how it is configured on the switch, but on the server I configured the active-backup policy (mode 1) for both bond1 (eth0+eth2) and bond2 (eth1+eth3).

So what should I ask the network (switch) team to check on their side?
Qcheck
Super Advisor

Re: Can't ping the gateway...

> How are you trying to ping the gateway? Are you using the IP address or the hostname? If you are using the hostname, is the /etc/resolv.conf file set up the same way on all servers? What about /etc/nsswitch.conf?

Patrick, I am trying to ping the gateway by its IP address. All the /etc/hosts, /etc/resolv.conf and network-scripts files are configured the same way, but two of the four RAC nodes in the cluster cannot ping the gateway. nslookup fails as well:
#nslookup mstalpd-rac4
;; connection timed out; no servers could be reached

Even with the IP address, nslookup doesn't work. It is driving me crazy.
ManojK_1
Valued Contributor

Re: Can't ping the gateway...

Hi Qcheck,

Is it possible for you to break bond1, assign the same IP address to eth0 or eth2, and try ping and ssh?

This is to check whether there is a problem with bonding.

Manoj K
Qcheck
Super Advisor

Re: Can't ping the gateway...

Manoj,

Thank you for the response again; it means a lot to me.

If I break the bonding and just use the NICs, I don't get the duplicate address messages, but ssh is still slow. So it definitely has something to do with the bonding.

So I have two issues here:
1) I can't ping the gateway and nslookup doesn't work, which is probably the reason for the slowness.
2) I get the duplicate address detected messages when I use bonding and don't get them when I break the bonding.

Also, do I need to set the max_bonds option? And why am I getting the following messages when I use bonding?

Aug 11 15:48:34 mtstalpd-rac4 kernel: bonding: bond0: Warning: The first slave device you specified does not support setting the MAC address. This bond MAC address would be that of the active slave.
Aug 11 15:48:34 mtstalpd-rac4 kernel: bonding: bond0: Warning: enslaved VLAN challenged slave ib1. Adding VLANs will be blocked as long as ib1 is part of bond bond0
Aug 11 15:55:35 mtstalpd-rac4 kernel: bonding: bond1: Warning: the permanent HWaddr of eth0 - 00:1E:68:78:AA:50 - is still in use by bond1. Set the HWaddr of eth0 to a different address to avoid conflicts.
Aug 11 15:55:35 mtstalpd-rac4 kernel: bonding: bond2: Warning: the permanent HWaddr of eth1 - 00:1E:68:78:AA:51 - is still in use by bond2. Set the HWaddr of eth1 to a different address to avoid conflicts.
Aug 11 16:02:20 mtstalpd-rac4 kernel: bonding: bond1: Warning: the permanent HWaddr of eth0 - 00:1E:68:78:AA:50 - is still in use by bond1. Set the HWaddr of eth0 to a different address to avoid conflicts.
Aug 11 16:02:20 mtstalpd-rac4 kernel: bonding: bond2: Warning: the permanent HWaddr of eth1 - 00:1E:68:78:AA:51 - is still in use by bond2. Set the HWaddr of eth1 to a different address to avoid conflicts.
Aug 12 07:55:04 mtstalpd-rac4 kernel: bonding: bond1: Warning: the permanent HWaddr of eth0 - 00:1E:68:78:AA:50 - is still in use by bond1. Set the HWaddr of eth0 to a different address to avoid conflicts.
Aug 12 07:55:04 mtstalpd-rac4 kernel: bonding: bond2: Warning: the permanent HWaddr of eth1 - 00:1E:68:78:AA:51 - is still in use by bond2. Set the HWaddr of eth1 to a different address to avoid conflicts.
[root@mtstalpd-rac4 ~]#

Qcheck
Super Advisor

Re: Can't ping the gateway...

The following warning, along with the duplicate address detected messages, went away after I added the speed, IPV6INIT=no and PEERDNS=yes settings to the ifcfg-ethx scripts:
Aug 12 07:55:04 mtstalpd-rac4 kernel: bonding: bond2: Warning: the permanent HWaddr of eth1 - 00:1E:68:78:AA:51 - is still in use by bond2. Set the HWaddr of eth1 to a different address to avoid conflicts

However, the slowness remains: I still can't ping the gateway, nslookup doesn't work, and I can't ping the IP addresses listed in /etc/resolv.conf either.

I noticed the following:
1) When I run the route command, it hangs at the default gateway line, and eventually the prompt comes back.
2) netstat -rn works fine (it does not use DNS), while netstat -r hangs the same way as route.
3) ping gives DUP! replies for the following:
rac1=10.157.63.98
rac2=10.157.63.99
rac3=10.157.63.100
rac4=10.157.63.101

[root@mtstalpd-rac4 network-scripts]# ping 10.157.63.98
PING 10.157.63.98 (10.157.63.98) 56(84) bytes of data.
From 10.157.63.101 icmp_seq=2 Destination Host Unreachable
From 10.157.63.101 icmp_seq=3 Destination Host Unreachable
From 10.157.63.101 icmp_seq=4 Destination Host Unreachable
64 bytes from 10.157.63.98: icmp_seq=9 ttl=64 time=0.157 ms
64 bytes from 10.157.63.98: icmp_seq=12 ttl=64 time=0.120 ms

--- 10.157.63.98 ping statistics ---
47 packets transmitted, 2 received, +3 errors, 95% packet loss, time 46001ms
rtt min/avg/max/mdev = 0.120/0.138/0.157/0.021 ms, pipe 3
[root@mtstalpd-rac4 network-scripts]# ping 10.157.63.99
PING 10.157.63.99 (10.157.63.99) 56(84) bytes of data.
64 bytes from 10.157.63.99: icmp_seq=2 ttl=64 time=1.24 ms
64 bytes from 10.157.63.99: icmp_seq=2 ttl=64 time=1.27 ms (DUP!)
64 bytes from 10.157.63.99: icmp_seq=3 ttl=64 time=0.104 ms
64 bytes from 10.157.63.99: icmp_seq=3 ttl=64 time=0.139 ms (DUP!)
64 bytes from 10.157.63.99: icmp_seq=4 ttl=64 time=0.144 ms
64 bytes from 10.157.63.99: icmp_seq=4 ttl=64 time=0.168 ms (DUP!)

--- 10.157.63.99 ping statistics ---
4 packets transmitted, 3 received, +3 duplicates, 25% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.104/0.512/1.270/0.529 ms
[root@mtstalpd-rac4 network-scripts]# ping 10.157.63.100
PING 10.157.63.100 (10.157.63.100) 56(84) bytes of data.
64 bytes from 10.157.63.100: icmp_seq=4 ttl=64 time=0.119 ms
64 bytes from 10.157.63.100: icmp_seq=4 ttl=64 time=0.143 ms (DUP!)
64 bytes from 10.157.63.100: icmp_seq=6 ttl=64 time=0.091 ms
64 bytes from 10.157.63.100: icmp_seq=8 ttl=64 time=0.082 ms
64 bytes from 10.157.63.100: icmp_seq=8 ttl=64 time=0.105 ms (DUP!)
64 bytes from 10.157.63.100: icmp_seq=9 ttl=64 time=0.106 ms
64 bytes from 10.157.63.100: icmp_seq=9 ttl=64 time=0.118 ms (DUP!)

--- 10.157.63.100 ping statistics ---
9 packets transmitted, 4 received, +3 duplicates, 55% packet loss, time 8000ms
rtt min/avg/max/mdev = 0.082/0.109/0.143/0.019 ms
[root@mtstalpd-rac4 network-scripts]# ping 10.157.63.101
PING 10.157.63.101 (10.157.63.101) 56(84) bytes of data.
64 bytes from 10.157.63.101: icmp_seq=1 ttl=64 time=0.030 ms
64 bytes from 10.157.63.101: icmp_seq=2 ttl=64 time=0.009 ms
64 bytes from 10.157.63.101: icmp_seq=3 ttl=64 time=0.013 ms
64 bytes from 10.157.63.101: icmp_seq=4 ttl=64 time=0.008 ms
64 bytes from 10.157.63.101: icmp_seq=5 ttl=64 time=0.010 ms

--- 10.157.63.101 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.008/0.014/0.030/0.008 ms
[root@mtstalpd-rac4 network-scripts]#
Elmar P. Kolkman
Honored Contributor

Re: Can't ping the gateway...

Qcheck, what you need to ask the network guys is how they have configured their channel. It should be active/passive, or whatever they call it.
From the output above, it seems the switch side is now active/active. With your bond setup, that means all traffic arriving on one of the interfaces is dropped on your Linux box.

If you are going for the active/active setup, also use LACP to make sure that a half-broken link is detected and excluded from carrying traffic. There are some good documents on the net about setting this up (just google for 'linux lacp').
ManojK_1
Valued Contributor

Re: Can't ping the gateway...

Hi QCheck,

I have set up bonding on my production Red Hat Linux systems with an active/passive configuration. I did not make any changes on the switch side.

I did make changes on the switch side when setting up APA (teaming on HP-UX), as recommended by HP.

I will paste the ifcfg files from my server for your reference. Can you please check the same files on the problematic systems and compare them with your good system?

Also verify and compare /etc/modprobe.conf.

Manoj K
ManojK_1
Valued Contributor

Re: Can't ping the gateway...

Hi Qcheck,

Also go through the URLs below. They might give you a hint.

http://www.linuxquestions.org/questions/linux-networking-3/channel-bonding-dup-packets-received-from-ping-314878/

Check the section "Duplicated Incoming Packets" in the URL below.
http://www.linuxfoundation.org/collaborate/workgroups/networking/bonding#Duplicated_Incoming_Packets.

Manoj K
Qcheck
Super Advisor

Re: Can't ping the gateway...

Manoj,

Thank you for all your responses, time and knowledge sharing.

Here's how problem has been resolved:
----------------------------------------
1) The duplicate address detection messages in /var/log/messages are caused by a RHEL 5.1 bug that has been fixed in 5.3, so those messages can be ignored, as everything else is working fine.

2) The slowness, accessibility, can't-ping-the-gateway and can't-nslookup issues were resolved by removing the gateway information from the network file and adding it to the ifcfg-bond* scripts, and also by removing the network and broadcast information from the ifcfg-bond* scripts.

3) For the DUP! replies from the ping command, the bonding mode was changed from 0 to 1, as the active-backup policy is supported by the switch/hardware.
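
For readers who want a picture of item 2, here is a minimal sketch of what an ifcfg-bond1 file could look like after that change. The addresses are taken from the output earlier in this thread; the remaining settings (ONBOOT, BOOTPROTO, USERCTL) are assumptions for illustration and not the poster's actual file.

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond1 (illustrative sketch only)
DEVICE=bond1
IPADDR=10.157.63.101
NETMASK=255.255.255.0
# Default gateway moved here from /etc/sysconfig/network
GATEWAY=10.157.63.1
# Assumed settings, not taken from the thread
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
# No NETWORK= or BROADCAST= lines: per item 2 they were removed, and both
# values can normally be derived from IPADDR and NETMASK.
```

As noted at the top of the thread, such a file is only read at boot or when the interface is brought down and up again, so the change does not take effect by editing alone.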

I posted this, thinking it might help someone.
Luigi Berengan
Advisor

Re: Can't ping the gateway...

Yes, it helped me.
Thank you very, very, very much.
berlui
Italy