10-28-2002 02:48 AM
traceroute hangs from one specific server
I have the following problem:
From server sv00127
#traceroute xw218
traceroute to xw218.dolmen.be (10.118.112.18), 30 hops max, 20 byte packets
1 ro11-fe400-101.dolmen.be (10.101.1.25) 1 ms 1 ms 1 ms
2 pc3926.dolmen.be (10.102.2.1) 1 ms 1 ms 1 ms
3 rodca.dolmen.be (192.168.30.1) 30 ms 3 ms 2 ms
4 xw218.dolmen.be (10.118.112.18) 3 ms
... hangs and does not return to the command line
From server sv00128
#traceroute xw218
traceroute to xw218.dolmen.be(10.118.112.18), 30 hops max, 20 byte packets
1 ro11-fe400-101.dolmen.be (10.101.1.25) 3 ms 1 ms 1 ms
2 pc3926.dolmen.be (10.102.2.1) 1 ms 1 ms 1 ms
3 rodca.dolmen.be (192.168.30.1) 2 ms 3 ms 2 ms
4 xw218.dolmen.be (10.118.112.18) 3 ms * 11 ms
... times out and returns to the command line, as it should
As you can see, traceroute hangs on server sv00127 but works from server sv00128 (or any other server I can find here). On top of that, if I reboot sv00127 (which does not happen often; it's a crucial production server), traceroute works from sv00127 too ... for a couple of days.
Has anyone noticed this strange behavior before? Is it patch-related? (I could not find anything ... but I might have overlooked something.) It might be useful to know that sv00127 is the primary DNS server (but sv00128 is the secondary, so that shouldn't have any impact).
The network guys are getting annoyed with this (and then they turn around and annoy me :-).
Thanks in advance,
Tom
10-28-2002 03:16 AM
Re: traceroute hangs from one specific server
The problem is with DNS. If traceroute cannot do a reverse lookup, it hangs. To identify the problem, run
#traceroute -n xw218
It will not hang now.
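A reverse (PTR) lookup that never gets an answer stalls anything that waits for it synchronously, which is why -n helps. Below is a minimal Python sketch of a bounded alternative, assuming a thread pool is acceptable. It is not traceroute's code; `reverse_lookup` and its `resolver` hook are hypothetical names (the hook exists only so a slow DNS server can be simulated, and it defaults to `socket.gethostbyaddr`).

```python
import concurrent.futures
import socket
from typing import Callable, Tuple

# One shared pool: a lookup that never returns keeps a worker busy,
# but it cannot block the caller past the timeout.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def reverse_lookup(ip: str, timeout: float = 2.0,
                   resolver: Callable[[str], Tuple] = socket.gethostbyaddr) -> str:
    """Return the PTR name for ip, or ip itself if the lookup fails or is too slow."""
    future = _POOL.submit(resolver, ip)
    try:
        hostname, _aliases, _addrs = future.result(timeout=timeout)
        return hostname
    except concurrent.futures.TimeoutError:
        return ip  # resolver still stuck: fall back to the number, as -n would
    except OSError:
        return ip  # socket.herror / socket.gaierror: no usable PTR record
```

Calling `reverse_lookup("10.118.112.18", timeout=1.0)` returns the name if DNS answers within a second and the bare address otherwise, so a dead resolver costs at most the timeout per hop instead of an indefinite wait.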
regards,
U.SivaKumar
10-28-2002 03:32 AM
Re: traceroute hangs from one specific server
To solve the DNS problem, compare the /etc/resolv.conf and /etc/nsswitch.conf files on both servers, and use the good server's files on the problematic server.
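Comparing the two files by eye is easy to get wrong, so here is a small Python sketch that diffs only the meaningful lines. It assumes `#` starts a comment and that whitespace differences don't matter; the function names are hypothetical, and both servers' files need to be copied to one machine first.

```python
import difflib
from pathlib import Path
from typing import List


def significant_lines(text: str) -> List[str]:
    """Drop comments and blank lines, and collapse runs of whitespace."""
    out = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if line:
            out.append(" ".join(line.split()))
    return out


def config_diff(good: str, bad: str, name: str = "config") -> List[str]:
    """Unified diff of the meaningful lines of two config file contents."""
    return list(difflib.unified_diff(
        significant_lines(good), significant_lines(bad),
        fromfile=f"good/{name}", tofile=f"bad/{name}", lineterm=""))


def compare_files(good_path: str, bad_path: str) -> List[str]:
    """Read both copies from disk and diff them."""
    return config_diff(Path(good_path).read_text(), Path(bad_path).read_text(),
                       Path(good_path).name)
```

An empty result means the two files are equivalent once comments and spacing are ignored; anything else is a line worth looking at (for example the order of sources on the `hosts:` line).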
regards,
U.SivaKumar
10-28-2002 03:44 AM
Re: traceroute hangs from one specific server
From server sv00127
#traceroute -n xw218
traceroute to xw218.dolmen.be (10.118.112.18), 30 hops max, 20 byte packets
1 10.101.1.25 1 ms 1 ms 1 ms
2 10.102.2.1 7 ms 1 ms 1 ms
3 192.168.30.1 2 ms 2 ms 2 ms
4 10.118.112.18 3 ms
... and hangs again
Regards,
Tom
10-28-2002 11:48 AM
Re: traceroute hangs from one specific server
Look for this patch:
PHNE_23274
Check whether it is worth installing on your system. If it is, go ahead: this patch resolves some nslookup issues and should help with name resolution, which can cause traceroute to hang.
Regards,
Anil
10-28-2002 08:08 PM
Re: traceroute hangs from one specific server
The fact that it does not return a star indicates that it's probably a software problem, so I'd go with the patch idea.
However, just for grins, what does the traceroute in the other direction show, from xw218 back to sv00127 and sv00128? Does ping work? Is xw218 a Cisco device? Does traceroute -dv xw218 etc. give you any extra info?
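Ron's point about the missing star can be made mechanical. The sketch below is a hypothetical Python helper, not anything shipped with traceroute: it parses one hop line in the format shown in the first post and checks whether every probe has either answered or printed `*`, assuming the default of three probes per hop.

```python
import re
from typing import List, NamedTuple, Optional

# A round-trip time such as "3 ms" or "0.9 ms".
RTT = re.compile(r"([\d.]+) ms")


class Hop(NamedTuple):
    number: int
    rtts_ms: List[float]
    timeouts: int


def parse_hop(line: str) -> Optional[Hop]:
    """Parse a hop line such as '4 xw218.dolmen.be (10.118.112.18) 3 ms * 11 ms'."""
    fields = line.split()
    if not fields or not fields[0].isdigit():
        return None  # the "traceroute to ..." header or other noise
    rtts = [float(x) for x in RTT.findall(line)]
    return Hop(int(fields[0]), rtts, fields.count("*"))


def is_complete(hop: Hop, probes: int = 3) -> bool:
    """True once every probe has produced either a time or a '*'."""
    return len(hop.rtts_ms) + hop.timeouts >= probes
```

Fed the final hop from sv00127 (`3 ms` and nothing else), `is_complete` returns False: traceroute is still waiting on probes two and three instead of timing them out, whereas the sv00128 line (`3 ms * 11 ms`) is complete.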
Ron
PS: Does anybody know why there is no man page for traceroute on HP-UX?
10-29-2002 06:31 AM
Re: traceroute hangs from one specific server
I'm going to try PHNE_23274. However, since this is a very important production server, it is going to take a while (it first has to go through test/acceptance). On top of that, sv00128 doesn't have that patch either ... so allow me to be a bit sceptical (even though the patch has three stars and sounds like the right thing).
If it solves the problem ... I'll give you full marks !
Ron,
You definitely got a grin ... the problem is that xw218 is down most of the time (which is why the traceroute doesn't complete ;-) ... so I can't test your suggestions.
I've wondered about the missing man page as well, so any answers on that will also get points (although I'll hold 'em back until the real issue is solved).
Regards,
Tom
10-29-2002 08:25 AM
Re: traceroute hangs from one specific server
After all, we are all here to help. If it doesn't work, open a call with HP; in all probability that will help you.
Regards,
Anil
10-29-2002 11:42 AM
Re: traceroute hangs from one specific server
then you can simply look at the last system call(s) it makes; that may yield a clue as to where it is getting hung up.
10-30-2002 12:26 AM
Re: traceroute hangs from one specific server
I've attached the output from tusc (attached at the moment traceroute hangs) ... it's Chinese to me. Can you or anyone else interpret the output?
Regards,
Tom
10-30-2002 04:29 AM
Re: traceroute hangs from one specific server
Regards,
Tom
10-30-2002 05:48 AM
Re: traceroute hangs from one specific server
From your tusc output, it is not stuck but still working.
Have a look at the IP addresses in the "sin_addr.s_addr" fields and see whether they point you to where the problem is.
Have you compared your routing tables ?
Paula
10-30-2002 10:02 AM
Re: traceroute hangs from one specific server
traceroute -w 2 xw218
See if changing the wait time to the minimum helps at all.
If you trace to an unused address on the same LAN as xw218 do you get the same hang?
Compare the output of
ping -o xw218
(after you stop it)
from both boxes.
Ron
10-30-2002 10:23 AM
Re: traceroute hangs from one specific server
IIRC, the -E option to tusc will show both entry and exit. You might also add '-T ""' to the tusc command line.
tusc does not seem to break out the timeval struct in the select call, so I cannot see what is being passed in for the timeout.
There is one oddity in the trace, however: there are no sendto() or write() calls for each packet that is supposed to trigger the ICMP replies from the remote, nor the setsockopt()/getsockopt() calls related to setting the TTL in the IP header and such.
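The observation about missing calls can be checked by counting instead of reading the whole trace. This Python sketch assumes each traced call appears in the tusc output as `name(args...) ... = result`, possibly after a timestamp or pid prefix; that format is an assumption, not documented tusc behavior, so check it against a real trace. `syscall_counts` and `summary` are hypothetical helpers.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumption: the first "identifier(" on a line is the traced system call.
CALL = re.compile(r"\b([a-z_][a-z0-9_]*)\(")

# The calls discussed above, plus the receive side for comparison.
WATCH = ("sendto", "write", "setsockopt", "getsockopt", "recvfrom", "select")


def syscall_counts(lines: Iterable[str]) -> Counter:
    """Count how often each system call name starts a line of the trace."""
    counts: Counter = Counter()
    for line in lines:
        m = CALL.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def summary(lines: Iterable[str]) -> Dict[str, int]:
    """Occurrences of each watched call (0 if it never appears)."""
    counts = syscall_counts(lines)
    return {name: counts.get(name, 0) for name in WATCH}
```

Run over the lines of a saved trace (such as the /var/adm/crash/tusc_xw218.txt file produced later in this thread), zero counts for `sendto` and `setsockopt` next to many `select` calls would confirm that the process is only waiting, not sending new probes.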
10-30-2002 10:33 PM
Re: traceroute hangs from one specific server
The problem seems to be that traceroute does not work (or loops, or searches the whole network, or ...) when a certain device is down.
Paula,
Routing is the same for both servers ...
The IPs in the output seem to be all the routers we've got over here (and we've got a couple ;-). It's not quite clear to me why all those IPs are in there.
Ron,
xw218 is down, so ping -o doesn't work ...
Rick,
root/sv00127#/opt/tusc/bin/tusc -T "" -E traceroute xw218 > /var/adm/crash/tusc_xw218.txt
# Output in attachment (gzipped)
Regards,
Tom
10-30-2002 10:46 PM
Re: traceroute hangs from one specific server
Again
10-31-2002 04:49 AM
Re: traceroute hangs from one specific server
New development. It would seem that Paula is on the right track: traceroute on sv00127 does not hang, but it takes a lot longer and the duration fluctuates. Right now it takes longer ... but it does end in a reasonable time.
Attached (the first one in this reply, the second one in the next) are the complete tusc traces for both sv00127 and sv00128. Does anyone "see" what's wrong with the first one?
Regards,
Tom
10-31-2002 04:57 AM
Re: traceroute hangs from one specific server
Regards,
Tom
10-31-2002 05:04 AM
Re: traceroute hangs from one specific server
It seems that your network card is half duplex. Make it full duplex:
sam --> N/W and Comm --> N/W Interface --> choose your LAN card --> Action --> Modify
You may have to turn off auto-negotiation.
10-31-2002 05:09 AM
Re: traceroute hangs from one specific server
root/sv00127#lanadmin -x 0
Current Speed = 100 Full-Duplex Auto-Negotiation-OFF
root/sv00128#lanadmin -x 0
Current Speed = 100 Full-Duplex Auto-Negotiation-OFF
Seems to me like it's full duplex, though ... or am I missing something?
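When comparing several boxes, the lanadmin -x line can be parsed instead of eyeballed. A minimal Python sketch, assuming the output line has exactly the shape shown above; `parse_lanadmin_x` and `LinkMode` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class LinkMode(NamedTuple):
    speed_mbit: int
    full_duplex: bool
    autoneg: bool


# Matches e.g. "Current Speed = 100 Full-Duplex Auto-Negotiation-OFF"
MODE = re.compile(
    r"Current Speed\s*=\s*(\d+)\s+(Full|Half)-Duplex\s+Auto-Negotiation-(ON|OFF)",
    re.IGNORECASE)


def parse_lanadmin_x(output: str) -> Optional[LinkMode]:
    """Extract speed, duplex and auto-negotiation state, or None if not found."""
    m = MODE.search(output)
    if not m:
        return None
    return LinkMode(int(m.group(1)),
                    m.group(2).lower() == "full",
                    m.group(3).upper() == "ON")
```

This only reports the host side. A duplex mismatch also depends on the switch port: if the host is forced to 100 full duplex while the port is left on auto-negotiation, the port will typically fall back to half duplex, so the switch configuration is worth checking too.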
Regards,
Tom
10-31-2002 06:24 AM
Re: traceroute hangs from one specific server
Can you diff the two outputs from tusc and post the results?
Paula
10-31-2002 06:45 AM
Re: traceroute hangs from one specific server
Forget the diff; I have done it.
Things to look at:
/etc/nsswitch.conf
Also, st_mtime shows differences between the servers. Are they patched the same?
root@d370/>diff 332.txt 322.txt | grep st_mt | grep 2000
< st_mtime: Fri Jan 7 01:09:53 2000
< st_mtime: Sat Jul 8 00:55:00 2000
< st_mtime: Fri Jan 7 01:09:53 2000
< st_mtime: Sat Jul 8 00:55:00 2000
< st_mtime: Sat Jul 8 00:55:00 2000
< st_mtime: Sat Jul 8 00:55:00 2000
< st_mtime: Fri Jan 7 01:09:53 2000
< st_mtime: Fri Jan 7 01:09:53 2000
< st_mtime: Sat Jul 8 00:55:00 2000
root@d370/>diff 332.txt 322.txt | grep st_mt | grep 2001
> st_mtime: Tue Nov 27 09:25:23 2001
> st_mtime: Tue Nov 27 09:25:23 2001
> st_mtime: Tue Nov 27 09:25:23 2001
> st_mtime: Tue Nov 27 09:25:23 2001
root@d370/>diff 332.txt 322.txt | grep st_mt | grep 2002
> st_mtime: Fri Apr 12 10:30:00 2002
> st_mtime: Fri Apr 12 10:30:00 2002
> st_mtime: Fri Apr 12 10:30:00 2002
> st_mtime: Fri Apr 12 10:30:00 2002
> st_mtime: Fri Apr 12 10:30:00 2002
Search for files with the above timestamps; they may point to a patch.
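The diff | grep pipeline above depends on line order in the two traces. A minimal Python sketch that compares the st_mtime values of two tusc traces as multisets instead, assuming each value sits on a line of the form `st_mtime: <date>` as in the output above (the function names are hypothetical):

```python
import re
from collections import Counter
from typing import Iterable, Tuple

STMTIME = re.compile(r"st_mtime:\s*(.+?)\s*$")


def mtimes(lines: Iterable[str]) -> Counter:
    """Count each st_mtime timestamp that appears in a tusc trace."""
    found: Counter = Counter()
    for line in lines:
        m = STMTIME.search(line)
        if m:
            found[m.group(1)] += 1
    return found


def mtime_differences(trace_a: Iterable[str],
                      trace_b: Iterable[str]) -> Tuple[Counter, Counter]:
    """Timestamps that occur more often in one trace than in the other."""
    a, b = mtimes(trace_a), mtimes(trace_b)
    return a - b, b - a  # Counter subtraction keeps only positive counts
```

The timestamps left in each returned Counter are the ones worth searching for as possible patch installation dates.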
Paula
10-31-2002 10:56 AM
Re: traceroute hangs from one specific server
11-04-2002 05:25 AM
Re: traceroute hangs from one specific server
An update after the long weekend. I've come to believe that the problem is ARP cache corruption. I've noticed that traceroute takes forever when the ARP cache looks like this:
oroot/sv00127#arp -a
10.101.1.100 (10.101.1.100) at 0:0:c:7:ac:65 ether
10.115.16.87 (10.115.16.87) -- no entry
Here 10.115.16.87 is an IP address that, according to me (and the network admins), cannot possibly be in the ARP cache (even though it seems to be there).
Whenever it looks normal, like this:
oroot/sv00127#arp -a
10.101.1.100 (10.101.1.100) at 0:0:c:7:ac:65 ether
sv00226.dolmen.be (10.101.5.2) at 0:10:83:f5:45:54 ether
sv00224.dolmen.be (10.101.3.5) at 0:2:a5:8c:11:e6 ether
sv00229.dolmen.be (10.101.5.3) at 0:50:8b:a1:67:40 ether
sv00248.dolmen.be (10.101.2.3) at 0:10:83:fc:b2:53 ether
traceroute works fine.
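Spotting the bad state can be automated. This Python sketch assumes `arp -a` lines look exactly like the two samples above (`name (ip) at mac ether` or `name (ip) -- no entry`) and returns the addresses that have no hardware address; `unresolved_entries` is a hypothetical name.

```python
import re
from typing import List

# "name (ip) at mac ether" or "name (ip) -- no entry"
ENTRY = re.compile(r"^(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)\s+(.*)$")


def unresolved_entries(arp_output: str) -> List[str]:
    """Return the IPs that arp -a lists without a hardware address."""
    bad = []
    for line in arp_output.splitlines():
        m = ENTRY.match(line.strip())
        if m and "no entry" in m.group(3):
            bad.append(m.group(2))
    return bad
```

Run periodically against saved `arp -a` output, it would record when an entry like 10.115.16.87 appears, which is the timing evidence needed to tie the slow traceroute to the ARP cache.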
And now we come to the one difference between this system and the other systems: patch PHNE_23456 (or later) can NOT be installed on this system. The reason is that it is the Control-M killer patch (if you install it, older versions of Control-M will no longer work). Guess what: PHNE_23456 is the ARPA cumulative patch :-)
At the moment the ARP cache is fine (and I do not know how to trigger the corruption). I'd still like some verification, though. Does this sound plausible (and if so, can anyone verify it)?
Regards,
Tom
11-04-2002 10:30 AM
Re: traceroute hangs from one specific server
You can try
arp -d hostname
to remove the bad entry.
What I expect is corrupting your ARP table is that there is sometimes a problem pinging your default router. On 11.0 this causes the route to be declared bad, and it is removed from the routing table.
If this happens, your HP box may try to ARP for the Ethernet address and hope to receive a proxy ARP reply, which apparently doesn't happen. This puts an unresolved entry into your ARP cache. I would expect that entry to be removed after 5 minutes at the latest (from ndd -h):
arp_cleanup_interval:
The amount of time that non-permanent, resolved entries are permitted to remain in ARP's cache. [30000, 3600000] Default: 300000 (5 minutes)
Or is that something that was broken before the patch you mentioned?
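The ndd values quoted above are in milliseconds, which is easy to misread. A small Python sketch of the conversion, using only the range and default quoted from ndd -h (the constant and function names are hypothetical): the default 300000 ms is 300000 / 60000 = 5 minutes, and the allowed range works out to 0.5 to 60 minutes.

```python
# Range and default as quoted from ndd -h above (milliseconds).
ARP_CLEANUP_MIN_MS = 30_000
ARP_CLEANUP_MAX_MS = 3_600_000
ARP_CLEANUP_DEFAULT_MS = 300_000


def cleanup_interval_minutes(ms: int) -> float:
    """Convert an arp_cleanup_interval value to minutes, rejecting out-of-range values."""
    if not ARP_CLEANUP_MIN_MS <= ms <= ARP_CLEANUP_MAX_MS:
        raise ValueError(
            f"arp_cleanup_interval {ms} outside "
            f"[{ARP_CLEANUP_MIN_MS}, {ARP_CLEANUP_MAX_MS}]")
    return ms / 60_000
```

The live value should be readable with `ndd -get /dev/arp arp_cleanup_interval`, which is worth checking before assuming the 5-minute default applies on this box.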
There is an ndd parameter called ip_ire_gw_probe; if you set it to 0, it stops probing the gateway. This may stop the proxy ARP business and keep your ARP table clean, but I still wonder why the router is not responding. Is it perhaps at times overloaded with input traffic? I once had a router with a 10 Mbit/s half-duplex interface on a LAN where everyone else was 100 Mbit/s full duplex. One process fired every 30 minutes and overloaded the router's input so badly that nobody else on the LAN could get through to the router for anything. Perhaps a similar process on the good box is doing the same thing, but because it originates on the same box as the good traceroute, the local TCP/IP stack forces them to share.
Ron
PS: I still think it's a patch issue, since a well-written traceroute should just print * and carry on gracefully if it doesn't get a reply.