network routing problems

Jenni Wolgast · ‎10-23-2007

I have 2 HP-UX 11.11 servers that as far as I can tell have similar configuration... Both were working fine (and no config changes have been made in over a month) until Friday night when one started having issues sending email... Saturday night about half the computers on our network could no longer telnet to this server... I talked to our network person and she was able to help me figure out that the subnet mask was wrong (even though it had been working fine for over a month like that) so I fixed that and then the rest of the computers on the network were able to telnet to this server. Mail still wasn't working though and now I have one user that cannot telnet to this system again...

I posted over in the sys admin section about the sendmail problem and someone had me try traceroute from both systems which gave very different results... I am now thinking this is a network config problem rather than a mail config problem so I am hoping someone over here might have some suggestions... Here are the results of a traceroute to the address sendmail uses:

working server:
traceroute healthplus.com.s8a1.psmtp.com
traceroute to healthplus.com.s8a1.psmtp.com (64.18.7.10), 30 hops max, 40 byte packets
1 ASTARO.flintdns (126.1.3.229) 0.293 ms 0.208 ms 0.150 ms
2 10.0.0.1 (10.0.0.1) 0.915 ms * *
3 209-254-57-73.ip.mcleodusa.net (209.254.57.73) 6.905 ms * 8.911 ms
4 * * FSHRINFCH02JP01-SO0-0-0-0.mcleodusa.net (64.198.100.37) 24.427 ms
5 * STLSMOGZH00JC01-SO0-2-0-0.mcleodusa.net (64.198.101.26) 35.455 ms *
6 * * KSCAMO54H00JP01-SO2-2-0-0.mcleodusa.net (64.198.100.170) 33.927 ms
7 * DNVTCOUZH00JC01-SO0-3-0-0.mcleodusa.net (64.198.101.90) 48.157 ms *
8 SNJUCACLH25JC01-SO0-1-1-0.mcleodusa.net (64.198.100.78) 75.986 ms * 76.061 ms
9 * cr1-eqix-peer.sje007.internap.net (206.223.116.134) 76.379 ms 76.454 ms
10 * core4.sje.inappnet-28.cr1.sje007.internap.net (66.79.148.130) 70.111 ms 69.419 ms
11 * border1.pc1-0-bbnet1.sje.pnap.net (66.151.144.4) 69.952 ms *
12 * * *
13 * * *
14 * * *
15 * * *
(I stopped it at this point)

other server:
traceroute healthplus.com.s8a1.psmtp.c>
traceroute to healthplus.com.s8a1.psmtp.com (64.18.7.10), 30 hops max, 40 byte packets
1 DEVUX.flintdns (126.1.3.17) 0.159 ms !N 0.044 ms !N 0.034 ms !N

The first place the working server heads is our firewall server, the non-working server just seems to be looking at itself... How can I get the non-working server to head to the firewall server? netstat -rn lists the firewall server as default on both servers...

Steven E. Protter · ‎10-23-2007

Shalom,

I'd suggest making the default gateway on the non-working server the same as the working one.

/etc/rc.config.d/netconf

If I understand the question, it should point to the firewall.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Wouter Jagers · ‎10-23-2007

See http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1171222

It's a good idea to keep this problem in your initial post: the more background info available, the better people will be able to help.

Cheers,
Wout

an engineer's aim in a discussion is not to persuade, but to clarify.

Jenni Wolgast · ‎10-23-2007

default gateway is already the same on both servers...

A. Clay Stephenson · ‎10-23-2007

The most common configuration is that the default router is responsible for routing to the firewall. Since netstat -rn only lists the default router on the "bad" box then that suggests a few things to try --- and I am assuming that you are able to actually ping the default router from this bad box.
1) Is the default router on the "bad" box the same as for the "good" boxes?
2) Is the ROUTE_COUNT[n] value specified in /etc/rc.config.d/netconf for the default route set at 1?

It would help if you posted the output of netstat -rn on the "bad" box.

By the way, the subnet problem you listed earlier that still "works" is not all that uncommon. In fact, it's one of the things I do to torment admins in training. There is nothing like a subnet mask off by 1 bit to drive a jr. admin (or a sr. one that doesn't bother to verify the subnet mask as a matter of course) nuts. Some hosts work fine and others don't and they don't usually see the pattern until the subnet mask and the IP addresses are expressed in binary and then the mental light bulbs illuminate.
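
The off-by-one-bit effect described above can be sketched in a few lines of Python. This is only an illustration: the addresses and masks are hypothetical (documentation-range values, not the ones from this thread), and `on_link` and `bits` are helper names made up for the example. It shows how one host can treat some peers as local and others as remote once its mask is one bit too long.

```python
import ipaddress

def on_link(host_ip, netmask, peer_ip):
    """True if a host configured as host_ip/netmask treats peer_ip as local."""
    network = ipaddress.ip_interface(f"{host_ip}/{netmask}").network
    return ipaddress.ip_address(peer_ip) in network

def bits(addr):
    """Dotted-quad address (or mask) as a 32-character binary string."""
    return format(int(ipaddress.ip_address(addr)), "032b")

# Hypothetical values, not taken from the thread.
HOST = "192.0.2.10"
RIGHT_MASK = "255.255.255.0"     # 24 bits
WRONG_MASK = "255.255.255.128"   # 25 bits: one bit too long

print("right mask", bits(RIGHT_MASK))
print("wrong mask", bits(WRONG_MASK))
for peer in ("192.0.2.20", "192.0.2.200"):
    print(peer, bits(peer),
          "right mask:", on_link(HOST, RIGHT_MASK, peer),
          "wrong mask:", on_link(HOST, WRONG_MASK, peer))
```

With the wrong mask the host still treats 192.0.2.20 as local, but it considers 192.0.2.200 off-link and hands that traffic to its gateway instead. Whether that peer still works then depends on the gateway, which is why only some hosts appear broken.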

If it ain't broke, I can fix that.

Jenni Wolgast · ‎10-23-2007

Working server:
netstat -rn
Routing tables
Destination    Gateway        Flags  Refs  Interface  Pmtu
127.0.0.1      127.0.0.1      UH     0     lo0        4136
192.168.0.1    192.168.0.1    UH     0     lan1       4136
126.1.3.225    126.1.3.225    UH     0     lan0       4136
192.168.0.0    192.168.0.1    U      2     lan1       1500
126.1.0.0      126.1.3.225    U      2     lan0       1500
127.0.0.0      127.0.0.1      U      0     lo0        0
default        126.1.3.229    UG     0     lan0       0

non-working server:
netstat -rn
Routing tables
Destination    Gateway        Flags  Refs  Interface  Pmtu
127.0.0.1      127.0.0.1      UH     0     lo0        4136
126.1.3.17     126.1.3.17     UH     0     lan1       4136
126.1.0.0      126.1.3.17     U      2     lan1       1500
127.0.0.0      127.0.0.1      U      0     lo0        0
default        126.1.3.229    UG     0     lan1       0

Sandman! · ‎10-23-2007

Does the non-working server have anything to do with DNS at all? I ask because its hostname is >DEVUX.flintdns<. It is possible that the DNS record for this host got corrupted and created a circular link to itself (which may explain why it keeps looking at itself and never moves on to the next hop). Your best bet would be to ask the DNS admin at your site if any changes were made to the DNS system.

~hope it helps

Jenni Wolgast · ‎10-23-2007

The .flintdns just gets added to the end of everything I think, this server doesn't have anything to do with DNS...

Sandman! · ‎10-23-2007

But I have not seen a traceroute where the first hop was actually the local interface. That suggests it could be a DNS issue. You might want to check with your site DNS admin.

Jenni Wolgast · ‎10-23-2007

Is there anything in particular they should be looking for? No other systems on the network are having any issues so she keeps telling me it's a setting on my server...

Sandman! · ‎10-23-2007

Show them the output of traceroute on the non-working server. The very first hop is the IP of a local interface (lan1)...is it not? And that is incorrect. The first hop should be to a remote host. Note the first hop of the working server is ASTARO.flintdns which is a remote host.

traceroute to healthplus.com.s8a1.psmtp.com (64.18.7.10), 30 hops max, 40 byte packets
>> 1 DEVUX.flintdns (126.1.3.17) 0.159 ms !N 0.044 ms !N 0.034 ms !N <<
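
As a rough illustration of that check, here is a small Python sketch that pulls the first hop out of traceroute output and flags the case where it is the host's own address annotated with !N (traceroute's marker for "network unreachable"). The two sample lines are the first hops posted earlier in this thread; `first_hop` and `looks_local` are hypothetical helper names, and the regular expression assumes the hop format shown here.

```python
import re

# Hop line format as posted in the thread: "<n> <name> (<ip>) <timings...>"
HOP = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)(.*)$")

def first_hop(output):
    """Return name, IP and !N status of hop 1, or None if it is not found."""
    for line in output.splitlines():
        match = HOP.match(line)
        if match and match.group(1) == "1":
            return {
                "name": match.group(2),
                "ip": match.group(3),
                "net_unreachable": "!N" in match.group(4),
            }
    return None

def looks_local(output, local_ip):
    """True if hop 1 is the local address and is flagged network unreachable."""
    hop = first_hop(output)
    return hop is not None and hop["ip"] == local_ip and hop["net_unreachable"]

NON_WORKING = "1 DEVUX.flintdns (126.1.3.17) 0.159 ms !N 0.044 ms !N 0.034 ms !N"
WORKING = "1 ASTARO.flintdns (126.1.3.229) 0.293 ms 0.208 ms 0.150 ms"

print(looks_local(NON_WORKING, "126.1.3.17"))   # True
print(looks_local(WORKING, "126.1.3.225"))      # False
```

A local first hop with !N suggests the host itself decided it had no usable route, so the packet never left the box. The sketch only detects that symptom and says nothing about the cause.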

Jenni Wolgast · ‎10-23-2007

I see what the problem is, I just don't know what is causing it... If I show the network admin the traceroute results her response is going to be "well just configure server B like server A"... From her standpoint the 500 PCs and 50 other servers on the network are all working just fine, including the other HP-UX server, so it must be a problem with this server, not the network...

I need help trying to figure out what the difference is between the two HP-UX boxes that is causing one to work and the other not to work...

Sandman! · ‎10-23-2007

Try pinging ASTARO.flintdns from the non-working server and post the results here. Also post the output of the arp cache on both the good and bad servers.

# ping ASTARO
# arp -an

Patrick Wallek · ‎10-23-2007

You posted:

non-working server:
netstat -rn
Routing tables
Destination    Gateway        Flags  Refs  Interface  Pmtu
127.0.0.1      127.0.0.1      UH     0     lo0        4136
126.1.3.17     126.1.3.17     UH     0     lan1       4136
126.1.0.0      126.1.3.17     U      2     lan1       1500
127.0.0.0      127.0.0.1      U      0     lo0        0
default        126.1.3.229    UG     0     lan1       0

The one line that I see that looks strange is:
126.1.0.0 126.1.3.17 U 2 lan1 1500

That is saying that anything going to the 126.1.*.* subnet goes through 126.1.3.17, which is the local host.

I think you need to remove that route. If you look on your working machine, the thing that resembles that line the closest is:

126.1.0.0 126.1.3.225 U 2 lan0 1500

I think this probably stems from your incorrect subnet mask issue.

Jenni Wolgast · ‎10-23-2007

The ASTARO (our firewall) server does not allow you to ping it so pinging fails from both servers...

If you look at the netstat -rn for both servers and compare the entries for lan0 on the working server and lan1 on the non-working server they are both set up the same:

netstat -rn
Routing tables
Destination Gateway Flags Refs Interface Pmtu
127.0.0.1 127.0.0.1 UH 0 lo0 4136
126.1.3. 126.1.3. UH 0 lan# 4136
126.1.0.0 126.1.3. U 2 lan# 1500
127.0.0.0 127.0.0.1 U 0 lo0 0
default 126.1.3.229 UG 0 lan# 0
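
That comparison can be made mechanical. The Python sketch below parses the two `netstat -rn` listings posted earlier in the thread, keeps only the loopback routes and the routes on the interface that carries the default route, and replaces each host's own address and interface name with placeholders. The function names and the "LOCAL"/"LAN" placeholders are inventions for this example, and the local addresses passed in are read off the posted tables.

```python
WORKING = """
Destination Gateway Flags Refs Interface Pmtu
127.0.0.1 127.0.0.1 UH 0 lo0 4136
192.168.0.1 192.168.0.1 UH 0 lan1 4136
126.1.3.225 126.1.3.225 UH 0 lan0 4136
192.168.0.0 192.168.0.1 U 2 lan1 1500
126.1.0.0 126.1.3.225 U 2 lan0 1500
127.0.0.0 127.0.0.1 U 0 lo0 0
default 126.1.3.229 UG 0 lan0 0
"""

NON_WORKING = """
Destination Gateway Flags Refs Interface Pmtu
127.0.0.1 127.0.0.1 UH 0 lo0 4136
126.1.3.17 126.1.3.17 UH 0 lan1 4136
126.1.0.0 126.1.3.17 U 2 lan1 1500
127.0.0.0 127.0.0.1 U 0 lo0 0
default 126.1.3.229 UG 0 lan1 0
"""

def parse_routes(text):
    """Turn netstat -rn rows into (dest, gateway, flags, iface, pmtu) tuples."""
    routes = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[0] == "Destination":
            continue  # skip the header and anything that is not a route row
        dest, gateway, flags, _refs, iface, pmtu = fields
        routes.append((dest, gateway, flags, iface, pmtu))
    return routes

def normalized(routes, local_ip):
    """Hide host-specific details so two tables can be compared by shape."""
    default_iface = next(r[3] for r in routes if r[0] == "default")
    result = set()
    for dest, gateway, flags, iface, pmtu in routes:
        if iface not in (default_iface, "lo0"):
            continue  # ignore interfaces not involved in the default route
        dest = "LOCAL" if dest == local_ip else dest
        gateway = "LOCAL" if gateway == local_ip else gateway
        iface = "LAN" if iface == default_iface else iface
        result.add((dest, gateway, flags, iface, pmtu))
    return result

same = (normalized(parse_routes(WORKING), "126.1.3.225")
        == normalized(parse_routes(NON_WORKING), "126.1.3.17"))
print("tables have the same shape:", same)
```

For the tables as posted this prints True, which supports the point that the difference is not visible in this output. Note that the listing shows no netmask column, so a mask mismatch would not show up in a comparison like this.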

Sandman! · ‎10-23-2007

What about the arp cache?

# arp -an

post above output here

Jenni Wolgast · ‎10-23-2007

There are over a hundred results for each server, is there a particular address you are looking for? They both show the same results for the firewall server...

Sandman! · ‎10-23-2007

Yes, the MAC address of the ASTARO is all I was interested in. Anyway, try pinging the good server from the bad one and vice versa, and post the results here.

Jenni Wolgast · ‎10-23-2007

The servers can ping each other just fine but they are both in each other's host files...

Tim Nelson · ‎10-23-2007

Just a note as you mentioned that you cannot ping the router/firewall from this bad host.

I ran into this in a DMZ where the firewall was managed by Intel folk and they turned off ping allowance on the local net to my server.

The default route requires that you have ping access to the default router unless you specifically disable it with an ndd command and make it part of bootup.

Remove the default route and re-add it. If it works for a couple of seconds and then you lose access again and traceroutes die with !N, then this might be the issue.

Tim Nelson · ‎10-23-2007

Here is a related thread to the ndd setting.

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=200655

ip_ire_gw_probe - Enable / disable dead gateway probes

Sandman! · ‎10-23-2007

Make sure that the nameservers listed in /etc/resolv.conf on both servers are the same. Also, are you willing to start configuring lancard lan1 from scratch, i.e.

# route delete default
# ifconfig lan1 unplumb
# ifconfig lan1 plumb
# ifconfig lan1
# route add default

Jenni Wolgast · ‎10-23-2007

I followed that link and tried the ndd -get /dev/ip ip_ire_status command and the last line did say IRE_GATEWAY DEAD on the non-working server... I tried the ndd -set /dev/ip ip_ire_gw_probe 0 command but that doesn't seem to have fixed my problem...

I also saw the command netstat -p icmp in that post so I tried that on both servers and these were the results:

working server
icmp:
40497 calls to generate an ICMP error message
0 ICMP messages dropped
Output histogram:
echo reply: 28765
destination unreachable: 11732
source quench: 0
routing redirect: 0
echo: 0
time exceeded: 0
parameter problem: 0
time stamp: 0
time stamp reply: 0
address mask request: 0
address mask reply: 0
4 bad ICMP messages
Input histogram:
echo reply: 58
destination unreachable: 451
source quench: 0
routing redirect: 173591
echo: 28765
time exceeded: 17
parameter problem: 0
time stamp request: 0
time stamp reply: 0
address mask request: 0
address mask reply: 0
28765 responses sent

non-working server:
icmp:
57854 calls to generate an ICMP error message
5 ICMP messages dropped
Output histogram:
echo reply: 17869
destination unreachable: 39980
source quench: 0
routing redirect: 0
echo: 0
time exceeded: 0
parameter problem: 0
time stamp: 0
time stamp reply: 0
address mask request: 0
address mask reply: 0
0 bad ICMP messages
Input histogram:
echo reply: 16016
destination unreachable: 23311
source quench: 0
routing redirect: 182902
echo: 17869
time exceeded: 0
parameter problem: 0
time stamp request: 0
time stamp reply: 0
address mask request: 0
address mask reply: 0
17869 responses sent
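
To make the two listings easier to compare, here is a minimal Python sketch that parses "name: count" lines and prints the counters that differ. The sample text is the first six lines of each Input histogram above; `parse_histogram` and `compare` are hypothetical helper names, and the sketch only lines the numbers up without interpreting them.

```python
def parse_histogram(text):
    """Parse 'name: count' lines into a dict of integer counters."""
    counters = {}
    for raw in text.strip().splitlines():
        name, sep, value = raw.strip().rpartition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters

def compare(good, bad):
    """List (name, good_count, bad_count) for counters that differ."""
    return [(name, good[name], bad.get(name, 0))
            for name in good if good[name] != bad.get(name, 0)]

# First six lines of each "Input histogram" as posted above.
WORKING_INPUT = """
echo reply: 58
destination unreachable: 451
source quench: 0
routing redirect: 173591
echo: 28765
time exceeded: 17
"""

NON_WORKING_INPUT = """
echo reply: 16016
destination unreachable: 23311
source quench: 0
routing redirect: 182902
echo: 17869
time exceeded: 0
"""

good = parse_histogram(WORKING_INPUT)
bad = parse_histogram(NON_WORKING_INPUT)
for name, g, b in compare(good, bad):
    print(f"{name:25} working={g:>7} non-working={b:>7}")
```

The counters are cumulative, so differences in uptime or traffic between the two servers will also show up here. The comparison is a starting point for questions rather than a diagnosis.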

I also just checked the /etc/rc.config.d/nddconf files on both servers and the working server had this section but the non-working server didn't:

TRANSPORT_NAME[5]=ip
NDD_NAME[5]=ip_ire_gw_probe
NDD_VALUE[5]=0

I added it to the non-working server, is there anything I need to do to have the change take effect?

Jenni Wolgast · ‎10-23-2007

Well I tried to remove/re-add the default gateway and that didn't work:

[root@DEVUX]:/ ->route delete default 126.1.3.229
delete net default: gateway 126.1.3.229
[root@DEVUX]:/ ->route add default 126.1.3.229
add net default: gateway 126.1.3.229: Network is unreachable

Jenni Wolgast · ‎10-23-2007

It must have been able to reach the firewall server at some point when it first set it as the default gateway, because after I tried to remove/re-add it unsuccessfully I no longer have it listed in netstat -rn...
