Re: Can't ping the gateway...

SOLVED
ManojK_1
Valued Contributor

Hi,

So it is clear that the issue started after the servers were rebooted.

Are you trying to ssh to the IP address or to the hostname?
Can you run nslookup between the servers, using both the hostname and the IP address?
Is any firewall active at the OS level? Please check the firewall status and share the output of "iptables -L" and "cat /etc/nsswitch.conf".
Also check whether a physical firewall is configured in your environment.

Test ssh (port 22) reachability between the servers using telnet to port 22.
For example, from the rac3 node run "telnet 10.157.63.101 22" and paste the output.
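Where telnet is not installed, the same port-22 check can be sketched in Python with only the standard `socket` module. This is a minimal sketch; the helper name `probe_tcp` is hypothetical and the 3-second timeout is an arbitrary choice.

```python
import socket


def probe_tcp(host: str, port: int = 22, timeout: float = 3.0):
    """Try a TCP connection, similar to `telnet host port`.

    Returns (True, banner) on success, where banner is whatever the server
    sends first (sshd announces itself, e.g. "SSH-2.0-OpenSSH_4.3"), or
    (False, error_text) if the connection cannot be made.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            try:
                banner = sock.recv(256).decode("ascii", errors="replace").strip()
            except socket.timeout:
                # Connected, but the service sent nothing within the timeout.
                banner = ""
            return True, banner
    except OSError as exc:
        # Covers "No route to host", "Connection refused", timeouts, etc.
        return False, str(exc)
```

Run from rac3 as `probe_tcp("10.157.63.101")`, this would be expected to return `False` with an error such as "No route to host", mirroring the telnet failure; a healthy peer returns `True` and its SSH banner.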

You can check for network slowness by running "ping -s 128" between the servers and checking the response time.
The response time (time=) should be in ms.

Thanks and Regards,
Manoj K
ManojK_1
Valued Contributor

Sorry, there was a typo in my previous post.

You can check for network slowness by running "ping -s 128" between the servers and checking the response time.
The response time (time=) should be less than 1 ms.
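The "time=" check can be automated over saved ping output. A minimal sketch, assuming the iputils ping output format shown in this thread; the function names are hypothetical and the 1 ms limit is the threshold suggested above. A run with no replies at all is treated as a failure.

```python
import re

# Matches the "time=0.157 ms" field that ping prints on each reply line.
TIME_RE = re.compile(r"time=([\d.]+) ms")


def reply_times_ms(ping_output: str) -> list[float]:
    """Return every per-reply round-trip time (in ms) found in the output."""
    return [float(m.group(1)) for m in TIME_RE.finditer(ping_output)]


def all_below(ping_output: str, limit_ms: float = 1.0) -> bool:
    """True only if there was at least one reply and every reply beat the limit."""
    times = reply_times_ms(ping_output)
    return bool(times) and max(times) < limit_ms
```

The summary line ("time 3001ms") has no "=" and is therefore not mistaken for a reply time.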

Also try pinging from the system you are connecting from with PuTTY.

Manoj K
Elmar P. Kolkman
Honored Contributor

This looks like a mismatch between the switch configuration and the server configuration.

How is your bond configured on the server, and how is it configured on the switches?
It looks like one side is using active/active while the other is using active/passive.
In that case, what is reachable and what isn't depends on things like MAC or IP addresses, because only half the traffic is routed through the right interface.
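A toy model of why reachability would then depend on MAC addresses. It assumes the switch spreads frames over two links with a simplified layer-2 hash (source MAC XOR destination MAC, modulo the number of links); real switches use vendor-specific hashes. It also assumes the server runs an active-backup bond, which only accepts traffic on its active slave. The server MAC 00:1E:68:78:AA:50 appears in the bonding logs later in this thread; the peer MACs are made up, and all function names are hypothetical.

```python
def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)


def switch_link_for(src_mac: str, dst_mac: str, n_links: int = 2) -> int:
    """Toy layer-2 hash (assumption): pick a link from src XOR dst MAC."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % n_links


def reaches_server(peer_mac: str, server_mac: str,
                   active_link: int = 0, n_links: int = 2) -> bool:
    """An active-backup bond only accepts frames on its active slave."""
    return switch_link_for(peer_mac, server_mac, n_links) == active_link


if __name__ == "__main__":
    server = "00:1E:68:78:AA:50"
    for peer in ("00:1E:68:78:AA:52", "00:1E:68:78:AA:53"):  # hypothetical peers
        print(peer, "reachable" if reaches_server(peer, server) else "dropped")
```

With these inputs the first peer hashes onto the active link and is reachable, while the second hashes onto the backup link and its frames are dropped: same switch, same server, different outcome per peer.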
Every problem has at least one solution. Only some solutions are harder to find.
Qcheck
Super Advisor

Manoj,

Thank you for the response.

******************** NOT WORKING NODE ****************************
[root@mtstalpd-rac4 ~]# nslookup mtstalpd-rac3
;; connection timed out; no servers could be reached

[root@mtstalpd-rac4 ~]# nslookup mtstalpd-rac4
;; connection timed out; no servers could be reached

[root@mtstalpd-rac4 ~]# nslookup 10.157.120.196
;; connection timed out; no servers could be reached

[root@mtstalpd-rac4 ~]#
[root@mtstalpd-rac4 sysconfig]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.218.22.192 0.0.0.0 255.255.255.224 U 0 0 0 bond0
10.157.120.128 0.0.0.0 255.255.255.128 U 0 0 0 bond2
10.157.63.0 0.0.0.0 255.255.255.0 U 0 0 0 bond1
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 bond2
0.0.0.0 10.157.63.1 0.0.0.0 UG 0 0 0 bond1
[root@mtstalpd-rac4 sysconfig]# ping 10.157.63.1
PING 10.157.63.1 (10.157.63.1) 56(84) bytes of data.
From 10.157.63.101 icmp_seq=1 Destination Host Unreachable
From 10.157.63.101 icmp_seq=2 Destination Host Unreachable
From 10.157.63.101 icmp_seq=3 Destination Host Unreachable

--- 10.157.63.1 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3001ms
, pipe 3
[root@mtstalpd-rac4 sysconfig]# ping www.yahoo.com

[root@mtstalpd-rac4 sysconfig]# nslookup www.yahoo.com
;; connection timed out; no servers could be reached

[root@mtstalpd-rac4 sysconfig]#



******************************** WORKING NODE *************************
[root@mtstalpd-rac3 ~]# nslookup mtstalpd-rac3
Server: 10.217.255.161
Address: 10.217.255.161#53

Name: mtstalpd-rac3.nycnet
Address: 10.157.63.100

[root@mtstalpd-rac3 ~]# rac4
ssh: connect to host mtstalpd-rac4 port 22: No route to host
[root@mtstalpd-rac3 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
[root@mtstalpd-rac3 ~]# cat /etc/nsswitch.conf
#
# /etc/nsswitch.conf
#
# An example Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# The entry '[NOTFOUND=return]' means that the search for an
# entry should stop if the search in the previous entry turned
# up nothing. Note that if the search failed due to some other reason
# (like no NIS server responding) then the search continues with the
# next entry.
#
# Legal entries are:
#
# nisplus or nis+ Use NIS+ (NIS version 3)
# nis or yp Use NIS (NIS version 2), also called YP
# dns Use DNS (Domain Name Service)
# files Use the local files
# db Use the local database (.db) files
# compat Use NIS on compat mode
# hesiod Use Hesiod for user lookups
# [NOTFOUND=return] Stop searching if not found so far
#

# To use db, put the "db" in front of "files" for entries you want to be
# looked up first in the databases
#
# Example:
#passwd: db files nisplus nis
#shadow: db files nisplus nis
#group: db files nisplus nis

passwd: files
shadow: files
group: files

#hosts: db files nisplus nis dns
hosts: files dns

# Example - obey only what nisplus tells us...
#services: nisplus [NOTFOUND=return] files
#networks: nisplus [NOTFOUND=return] files
#protocols: nisplus [NOTFOUND=return] files
#rpc: nisplus [NOTFOUND=return] files
#ethers: nisplus [NOTFOUND=return] files
#netmasks: nisplus [NOTFOUND=return] files

bootparams: nisplus [NOTFOUND=return] files

ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files

netgroup: nisplus

publickey: nisplus

automount: files nisplus
aliases: files nisplus

[root@mtstalpd-rac3 ~]# telnet 10.157.120.196 22
Trying 10.157.120.196...
Connected to mtstalpd-racm4 (10.157.120.196).
Escape character is '^]'.
SSH-2.0-OpenSSH_4.3
^]
Protocol mismatch.
Connection closed by foreign host.
[root@mtstalpd-rac3 ~]# telnet 10.157.63.101 22
Trying 10.157.63.101...
telnet: connect to address 10.157.63.101: No route to host
telnet: Unable to connect to remote host: No route to host
[root@mtstalpd-rac3 ~]# ping -s 128 mtstalpd-rac4
PING mtstalpd-rac4 (10.157.63.101) 128(156) bytes of data.
From mtstalpd-rac3 (10.157.63.100) icmp_seq=2 Destination Host Unreachable
From mtstalpd-rac3 (10.157.63.100) icmp_seq=3 Destination Host Unreachable
From mtstalpd-rac3 (10.157.63.100) icmp_seq=4 Destination Host Unreachable

--- mtstalpd-rac4 ping statistics ---
7 packets transmitted, 0 received, +3 errors, 100% packet loss, time 6000ms
, pipe 3
[root@mtstalpd-rac3 ~]#
[root@mtstalpd-rac3 sysconfig]# nslookup www.yahoo.com
Server: 10.217.255.161
Address: 10.217.255.161#53

Non-authoritative answer:
www.yahoo.com canonical name = fp.wg1.b.yahoo.com.
fp.wg1.b.yahoo.com canonical name = any-fp.wa1.b.yahoo.com.
Name: any-fp.wa1.b.yahoo.com
Address: 67.195.160.76
Name: any-fp.wa1.b.yahoo.com
Address: 69.147.125.65

[root@mtstalpd-rac3 sysconfig]#
*********************************************
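The rac4 routing table can be checked mechanically. A minimal sketch, assuming the `route -n` column layout pasted above; the function names are hypothetical. Applied to rac4's table it finds the default gateway 10.157.63.1 on bond1, inside the directly connected 10.157.63.0/24 network, so the routing table itself looks sane. Together with ping reporting "Destination Host Unreachable" from rac4's own address (10.157.63.101), that most likely points at the layer-2 path (ARP requests going unanswered) rather than a missing route.

```python
import ipaddress


def parse_route_n(text: str):
    """Return (connected networks, default gateway) from `route -n` output.

    networks is a list of (IPv4Network, iface); gateway is (ip, iface) or None.
    """
    networks = []
    gateway = None
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 columns and start with a dotted-quad destination.
        if len(fields) != 8 or fields[0].count(".") != 3:
            continue
        dest, gw, mask, flags, iface = fields[0], fields[1], fields[2], fields[3], fields[7]
        if dest == "0.0.0.0" and "G" in flags:
            gateway = (gw, iface)
        elif "U" in flags and gw == "0.0.0.0":
            networks.append((ipaddress.IPv4Network(f"{dest}/{mask}"), iface))
    return networks, gateway


def gateway_is_on_link(networks, gateway) -> bool:
    """True if the default gateway lies in a connected network on the same interface."""
    if gateway is None:
        return False
    gw_ip = ipaddress.IPv4Address(gateway[0])
    return any(gw_ip in net and iface == gateway[1] for net, iface in networks)
```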

Qcheck
Super Advisor

Elmar, thank you for the response.

I am not sure how it is configured on the switch, but on the server I configured both bond1 (eth0+eth2) and bond2 (eth1+eth3) with the active-backup policy (mode 1).

So what should I ask the network (switch) team to check on their side?
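Before talking to the switch team, it helps to record what the bond itself reports. The Linux bonding driver exposes its state as text in /proc/net/bonding/ (for example /proc/net/bonding/bond1). A minimal sketch that parses the fields relevant here; the function names are hypothetical, the test sample is illustrative (the eth2 MAC is invented), and field names may vary slightly between driver versions.

```python
def parse_bonding_status(text: str) -> dict:
    """Extract mode, active slave and per-slave link/MAC info from bonding status text."""
    info = {"mode": None, "active": None, "slaves": {}}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")  # split at the first colon only
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Currently Active Slave":
            info["active"] = value
        elif key == "Slave Interface":
            current = value
            info["slaves"][current] = {}
        elif current is not None and key in ("MII Status", "Permanent HW addr"):
            # Only record per-slave fields once a "Slave Interface" block has started.
            info["slaves"][current][key] = value
    return info


def read_bond(name: str = "bond1") -> dict:
    """Read and parse the live status of one bond (Linux only)."""
    with open(f"/proc/net/bonding/{name}") as fh:
        return parse_bonding_status(fh.read())
```

The mode, the currently active slave and each slave's permanent MAC address are exactly what the network team needs in order to match the server ports to their switch ports.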
Qcheck
Super Advisor

How are you trying to ping the gateway? Are you using the IP address or the hostname? If you are using the hostname, is the /etc/resolv.conf file set up the same way on all servers? What about /etc/nsswitch.conf?

Patrick, I am pinging the gateway by its IP address. /etc/hosts, /etc/resolv.conf and the network-scripts are configured the same way on all nodes, but only two of the four RAC nodes in the cluster cannot ping the gateway. nslookup also fails:
#nslookup mstalpd-rac4
;; connection timed out; no servers could be reached

nslookup doesn't work even with the IP address. Driving me crazy...
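One possible explanation for nslookup failing even on an IP address: a reverse lookup still has to reach the DNS server. Assuming rac4 uses the same nameserver as rac3 (10.217.255.161 in rac3's nslookup output, and the resolv.conf files are said to be identical), that server is outside every network in rac4's `route -n` table, so every query must go through the unreachable default gateway and times out. A minimal sketch of that check; the function names and the resolv.conf sample are hypothetical.

```python
import ipaddress


def nameservers(resolv_conf: str) -> list[str]:
    """Collect the addresses on 'nameserver' lines of resolv.conf text."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def needs_gateway(ip: str, local_networks: list[str]) -> bool:
    """True if ip is outside every directly connected network (so it is routed)."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in ipaddress.ip_network(net) for net in local_networks)


# Connected networks taken from rac4's `route -n` output in this thread.
RAC4_NETWORKS = ["10.218.22.192/27", "10.157.120.128/25", "10.157.63.0/24", "169.254.0.0/16"]
```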
ManojK_1
Valued Contributor

Hi Qcheck,

Is it possible for you to break bond1, assign the same IP address to eth0 or eth2, and try ping and ssh?

This is to check whether the problem is with the bonding.

Manoj K
Qcheck
Super Advisor

Manoj,

Thank you again for the response; it means a lot to me.

If I break the bonding and use the NICs directly, I don't get the duplicate address messages, but ssh is still slow. So it definitely has something to do with the bonding.

So I have two issues:
1) I can't ping the gateway and nslookup doesn't work; that is probably the reason for the slowness.
2) I get the "duplicate address detected" messages when I use the bonding, but not when I break the bonding.

Also, why am I getting the following messages when I use the bonding? And do I need to set the max_bonds option?

Aug 11 15:48:34 mtstalpd-rac4 kernel: bonding: bond0: Warning: The first slave device you specified does not support setting the MAC address. This bond MAC address would be that of the active slave.
Aug 11 15:48:34 mtstalpd-rac4 kernel: bonding: bond0: Warning: enslaved VLAN challenged slave ib1. Adding VLANs will be blocked as long as ib1 is part of bond bond0
Aug 11 15:55:35 mtstalpd-rac4 kernel: bonding: bond1: Warning: the permanent HWaddr of eth0 - 00:1E:68:78:AA:50 - is still in use by bond1. Set the HWaddr of eth0 to a different address to avoid conflicts.
Aug 11 15:55:35 mtstalpd-rac4 kernel: bonding: bond2: Warning: the permanent HWaddr of eth1 - 00:1E:68:78:AA:51 - is still in use by bond2. Set the HWaddr of eth1 to a different address to avoid conflicts.
Aug 11 16:02:20 mtstalpd-rac4 kernel: bonding: bond1: Warning: the permanent HWaddr of eth0 - 00:1E:68:78:AA:50 - is still in use by bond1. Set the HWaddr of eth0 to a different address to avoid conflicts.
Aug 11 16:02:20 mtstalpd-rac4 kernel: bonding: bond2: Warning: the permanent HWaddr of eth1 - 00:1E:68:78:AA:51 - is still in use by bond2. Set the HWaddr of eth1 to a different address to avoid conflicts.
Aug 12 07:55:04 mtstalpd-rac4 kernel: bonding: bond1: Warning: the permanent HWaddr of eth0 - 00:1E:68:78:AA:50 - is still in use by bond1. Set the HWaddr of eth0 to a different address to avoid conflicts.
Aug 12 07:55:04 mtstalpd-rac4 kernel: bonding: bond2: Warning: the permanent HWaddr of eth1 - 00:1E:68:78:AA:51 - is still in use by bond2. Set the HWaddr of eth1 to a different address to avoid conflicts.
[root@mtstalpd-rac4 ~]#

Qcheck
Super Advisor

The following warning, along with the "duplicate address detected" messages, went away after I added the speed, IPV6INIT=no and PEERDNS=yes to the ifcfg-ethX scripts:
Aug 12 07:55:04 mtstalpd-rac4 kernel: bonding: bond2: Warning: the permanent HWaddr of eth1 - 00:1E:68:78:AA:51 - is still in use by bond2. Set the HWaddr of eth1 to a different address to avoid conflicts

However, ssh is still slow, I still can't ping the gateway, nslookup still doesn't work, and I can't ping the nameserver IP addresses in /etc/resolv.conf.

I noticed the following:
1) When I run the route command, it hangs at the default gateway entry and the prompt eventually comes back.
2) netstat -rn (which does not use DNS) works fine, while netstat -r hangs the same way as route.
3) ping gives DUP! replies for the following:
rac1=10.157.63.98
rac2=10.157.63.99
rac3=10.157.63.100
rac4=10.157.63.101

[root@mtstalpd-rac4 network-scripts]# ping 10.157.63.98
PING 10.157.63.98 (10.157.63.98) 56(84) bytes of data.
From 10.157.63.101 icmp_seq=2 Destination Host Unreachable
From 10.157.63.101 icmp_seq=3 Destination Host Unreachable
From 10.157.63.101 icmp_seq=4 Destination Host Unreachable
64 bytes from 10.157.63.98: icmp_seq=9 ttl=64 time=0.157 ms
64 bytes from 10.157.63.98: icmp_seq=12 ttl=64 time=0.120 ms

--- 10.157.63.98 ping statistics ---
47 packets transmitted, 2 received, +3 errors, 95% packet loss, time 46001ms
rtt min/avg/max/mdev = 0.120/0.138/0.157/0.021 ms, pipe 3
[root@mtstalpd-rac4 network-scripts]# ping 10.157.63.99
PING 10.157.63.99 (10.157.63.99) 56(84) bytes of data.
64 bytes from 10.157.63.99: icmp_seq=2 ttl=64 time=1.24 ms
64 bytes from 10.157.63.99: icmp_seq=2 ttl=64 time=1.27 ms (DUP!)
64 bytes from 10.157.63.99: icmp_seq=3 ttl=64 time=0.104 ms
64 bytes from 10.157.63.99: icmp_seq=3 ttl=64 time=0.139 ms (DUP!)
64 bytes from 10.157.63.99: icmp_seq=4 ttl=64 time=0.144 ms
64 bytes from 10.157.63.99: icmp_seq=4 ttl=64 time=0.168 ms (DUP!)

--- 10.157.63.99 ping statistics ---
4 packets transmitted, 3 received, +3 duplicates, 25% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.104/0.512/1.270/0.529 ms
[root@mtstalpd-rac4 network-scripts]# ping 10.157.63.100
PING 10.157.63.100 (10.157.63.100) 56(84) bytes of data.
64 bytes from 10.157.63.100: icmp_seq=4 ttl=64 time=0.119 ms
64 bytes from 10.157.63.100: icmp_seq=4 ttl=64 time=0.143 ms (DUP!)
64 bytes from 10.157.63.100: icmp_seq=6 ttl=64 time=0.091 ms
64 bytes from 10.157.63.100: icmp_seq=8 ttl=64 time=0.082 ms
64 bytes from 10.157.63.100: icmp_seq=8 ttl=64 time=0.105 ms (DUP!)
64 bytes from 10.157.63.100: icmp_seq=9 ttl=64 time=0.106 ms
64 bytes from 10.157.63.100: icmp_seq=9 ttl=64 time=0.118 ms (DUP!)

--- 10.157.63.100 ping statistics ---
9 packets transmitted, 4 received, +3 duplicates, 55% packet loss, time 8000ms
rtt min/avg/max/mdev = 0.082/0.109/0.143/0.019 ms
[root@mtstalpd-rac4 network-scripts]# ping 10.157.63.101
PING 10.157.63.101 (10.157.63.101) 56(84) bytes of data.
64 bytes from 10.157.63.101: icmp_seq=1 ttl=64 time=0.030 ms
64 bytes from 10.157.63.101: icmp_seq=2 ttl=64 time=0.009 ms
64 bytes from 10.157.63.101: icmp_seq=3 ttl=64 time=0.013 ms
64 bytes from 10.157.63.101: icmp_seq=4 ttl=64 time=0.008 ms
64 bytes from 10.157.63.101: icmp_seq=5 ttl=64 time=0.010 ms

--- 10.157.63.101 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.008/0.014/0.030/0.008 ms
[root@mtstalpd-rac4 network-scripts]#
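Each "(DUP!)" line is a second reply to the same icmp_seq. A short sketch that counts them from saved ping output, assuming the iputils format above; the function name is hypothetical. One duplicate per sequence number, as in the pings to .99 and .100, is consistent with frames taking more than one path between the hosts, though the count alone cannot identify the cause.

```python
import re
from collections import Counter

# Reply lines look like "64 bytes from 10.157.63.99: icmp_seq=2 ttl=64 ...".
REPLY_RE = re.compile(r"bytes from [^:]+: icmp_seq=(\d+)")


def duplicate_replies(ping_output: str) -> dict[int, int]:
    """Map icmp_seq -> number of extra (duplicate) replies for that sequence."""
    counts = Counter(int(m.group(1)) for m in REPLY_RE.finditer(ping_output))
    return {seq: n - 1 for seq, n in counts.items() if n > 1}
```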
Elmar P. Kolkman
Honored Contributor

Qcheck, what you need to ask the network guys is how they have configured their channel. It should be active/passive, or whatever they call it.
From the output above, it seems the switch side is now active/active. With your bond setup, that means all traffic arriving on one of the interfaces is dropped by your Linux box.

If you go for the active/active setup, also use LACP so that a half-broken link is detected and excluded from traffic. There are some good documents on the net about setting this up (just google 'linux lacp').
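For the question of what to ask the switch team, the Linux bonding mode numbers map to names as below, and the kernel's bonding documentation summarizes which switch configuration each mode expects. A sketch of that lookup; the function name is hypothetical and the requirement strings paraphrase the kernel documentation. Mode 1 (active-backup), which Qcheck uses, needs no special switch configuration; the "active/active with LACP" setup corresponds to mode 4 (802.3ad) on the server plus an LACP aggregation on the switch.

```python
# Linux bonding mode numbers and names.
BOND_MODES = {
    0: "balance-rr",
    1: "active-backup",
    2: "balance-xor",
    3: "broadcast",
    4: "802.3ad",
    5: "balance-tlb",
    6: "balance-alb",
}

NO_SWITCH_CONFIG = "no special switch configuration (ports not aggregated)"
STATIC_GROUP = "switch ports grouped into a static aggregation (e.g. EtherChannel/trunk)"

# Switch-side expectation per mode, paraphrasing the kernel bonding docs.
SWITCH_REQUIREMENT = {
    "active-backup": NO_SWITCH_CONFIG,
    "balance-tlb": NO_SWITCH_CONFIG,
    "balance-alb": NO_SWITCH_CONFIG,
    "802.3ad": "switch ports configured as an LACP (802.3ad) aggregation",
    "balance-rr": STATIC_GROUP,
    "balance-xor": STATIC_GROUP,
    "broadcast": STATIC_GROUP,
}


def switch_requirement(mode) -> str:
    """Accept a mode number or name and describe what the switch must provide."""
    name = BOND_MODES[mode] if isinstance(mode, int) else mode
    return f"bond mode {name}: {SWITCH_REQUIREMENT[name]}"
```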