Delay in inetd starting service

 
David Child_1
Honored Contributor

Hello all,

One of my servers (HP-UX 11.11) is having problems running a service via inetd. This service had been working fine until a week ago.

During troubleshooting we found that when you telnet to that service's port, the telnet session connects right away, waits 30 seconds, and only then starts the service:

Trying 10.160.64.60...
Connected to pdgds06.nationalcar.com (10.160.64.60).
Escape character is '^]'.
(30 seconds pass)
7003 15052 pid

I ran some traces with 'tusc' and can see several [socket -> connect -> send -> poll -> recvfrom] iterations (see attached). The pattern then changes to [socket -> *sendto* -> poll, socket -> sendto -> poll], and this is where the delay occurs.

I'm not a network programmer (or a programmer at all), so I'm having a hard time working out what is happening at that point.

A couple of other important points:

1) The client we are connecting from is in a DMZ and the server is in the internal network.

2) If I telnet to that port from a client within the internal network, it works just fine (the service starts immediately). This is true whether the client is on the same subnet or a different subnet in the internal network.

3) Name resolution appears to be working okay. I can perform a forward and reverse lookup from the server to the client. Name resolution takes < 1 second. I have no access to the client, but they use a hosts file only and have the correct information in their /etc/hosts file.

Any ideas?

Thanks,
David

Steven E. Protter
Exalted Contributor

Re: Delay in inetd starting service

Shalom David,

inetd is starting a lot of services. One of them is having trouble and holding up the rest.

Try commenting out services and restarting inetd to find the one giving you trouble.

Run tail -f /var/adm/syslog/syslog.log to watch for diagnostics during startup.

SEP
David Child_1
Honored Contributor

Re: Delay in inetd starting service

Hello Steven,

Thanks for the suggestion. Unfortunately the server is in production and I can't disable any of the services. On a positive note, I monitored it for an hour and, apart from a few instances, it was very quiet.

I then performed several traces and each time the results were the same: the initial telnet connection, then a 30-second delay, then the service gets called (see attached).

I also performed several additional traces from a client inside the internal network and each time the response was immediate (see the second part of the attachment).

For some reason, after calling 'sendto' and sending data to the client, it polls and (I'm assuming) doesn't get any response. Then it tries again, increasing the poll time for each attempt.
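
The retry pattern described above (open a socket, send a datagram, poll, give up, then try again with a longer wait) can be sketched as follows. This is only a minimal Python illustration, not the HP-UX resolver's actual code: the function names, the doubling policy, and the default timeout values are all assumptions chosen for the example.

```python
import select
import socket


def backoff_schedule(first_timeout, attempts):
    """Wait time for each attempt, doubling every time (an assumed policy)."""
    return [first_timeout * (2 ** i) for i in range(attempts)]


def query_with_retries(server, payload, first_timeout=1.0, attempts=3):
    """Send payload over UDP and wait for a reply, retrying with longer waits.

    Returns (reply_or_None, seconds_spent_in_timed_out_waits).
    """
    waited = 0.0
    for timeout in backoff_schedule(first_timeout, attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # socket
        try:
            sock.sendto(payload, server)                         # sendto
            readable, _, _ = select.select([sock], [], [], timeout)  # poll
            if readable:
                data, _ = sock.recvfrom(512)
                return data, waited
            waited += timeout  # nobody answered; the next wait is longer
        finally:
            sock.close()
    return None, waited
```

If nothing ever answers, the caller is blocked for the sum of all the waits, which is how a handful of unanswered queries can add up to a delay of tens of seconds.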

Thanks,
David
David Child_1
Honored Contributor

Re: Delay in inetd starting service

I've done some additional troubleshooting and found that plain old telnet exhibits the same symptoms:

telnet server
Trying...
Connected to server.
Escape character is '^]'.

(30 seconds pass)

Local flow control on
Telnet TERMINAL-SPEED option ON

login:

I know it looks like a name resolution problem, but I can't find the cause. If I telnet back to the source server it is instant (it doesn't matter whether I use the IP or the hostname). I also tried 'nslookup', 'ping', and 'traceroute'. All are instant via hostname and IP:

# timex traceroute sourceserver
traceroute to sourceserver (155.xxx.xxx.xx), 30 hops max, 40 byte packets
1 10.160.xxx.1 (10.160.xxx.1) 0.334 ms 0.235 ms 0.229 ms
2 10.160.yyy.xxx (10.160.yyy.xxx) 4.234 ms 5.660 ms 4.727 ms
3 165.xxx.xxx.xx (165.xxx.xxx.xx) 3.636 ms 3.169 ms 3.432 ms
4 sourceserver (155.xxx.xxx.xx) 4.589 ms 4.715 ms 4.074 ms

real 0.05
user 0.00
sys 0.01

I don't have access to "sourceserver", but they aren't using name resolution at all (just IPs).

This only happens when connecting to my servers (4 out of 5) from this one "sourceserver". There is one server out of the 5 in my environment that works perfectly fine. I have compared patches, kernel parameters, resolv.conf, nsswitch.conf, /etc/hosts, routes, etc. They all match correctly.

Unfortunately there is no other server in the same DMZ as "sourceserver" that we can test from to compare.

Any other suggestions?

Thanks,
David
TTr
Honored Contributor

Re: Delay in inetd starting service

>Name resolution appears to be working okay. I can perform a forward and reverse lookup from the server to the client

This may still be a reverse name resolution timeout issue. I realize you are saying that the server is reverse-resolving the client but when the client connects to the server, what IP does it come in under?

Try adding a timeout to name resolution by appending the following to /etc/resolv.conf:
retrans 2000
retry 1
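
To see whether the reverse lookup is what stalls, it may help to time the same call a server-side daemon would typically make on the client's address. The sketch below is a rough Python illustration; the function name time_reverse_lookup and the injectable resolver parameter are made up for this example, and it assumes a Python interpreter is available on the server.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Reverse-resolve ip and report how long the attempt took.

    Returns (hostname_or_None, elapsed_seconds).
    """
    start = time.monotonic()
    try:
        name = resolver(ip)[0]  # first element is the primary hostname
    except OSError:             # socket.herror and socket.gaierror are OSErrors
        name = None
    return name, time.monotonic() - start
```

Calling time_reverse_lookup() with the source address that shows up in a packet capture, rather than the address you expect, would show both what name the server gets back and whether the lookup itself accounts for the delay.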

David Child_1
Honored Contributor

Re: Delay in inetd starting service

Okay, it gets a little weird now. As I already mentioned, I can 'ping hostname', 'traceroute hostname', etc. and it all works fine (and fast). I also used tcpdump and the IP address is correct in/out when connecting from the remote server.

I then decided to remove the client server's entry from /etc/hosts on my server. That fixed the problem. The client server can now connect instantly to my server.

Then I added an entry for my laptop into /etc/hosts. Now my laptop experiences delays.

Neither my laptop nor the client is in DNS. So I took a system that does exist in DNS and added it to /etc/hosts. The delay occurs.

So apparently if there is an entry in /etc/hosts it causes delays when trying to connect from a remote system. If I try connecting to the remote system from my "bad" server (via IP or the name I added to /etc/hosts) it works just fine.

I checked for special characters or formatting issues and didn't find anything. I then copied an /etc/hosts from a good system and just modified the server's IP. That didn't fix the problem.
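
A check for special characters and formatting problems like the one described above can also be automated. The following is a minimal Python sketch under stated assumptions: the function name find_suspect_lines and the three rules it applies (no control or non-ASCII characters, first field is a valid IP address, at least one hostname follows) are choices made for this example, not an official definition of a valid hosts file.

```python
import ipaddress


def find_suspect_lines(hosts_text):
    """Return (line_number, reason) for lines that look malformed."""
    problems = []
    # Split on "\n" only, so a stray carriage return stays visible.
    for lineno, raw in enumerate(hosts_text.split("\n"), start=1):
        if any((ord(ch) < 32 and ch != "\t") or ord(ch) > 126 for ch in raw):
            problems.append((lineno, "control or non-ASCII character"))
            continue
        fields = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if not fields:
            continue  # blank line or comment-only line
        try:
            ipaddress.ip_address(fields[0])
        except ValueError:
            problems.append((lineno, "first field is not an IP address"))
            continue
        if len(fields) < 2:
            problems.append((lineno, "no hostname after the address"))
    return problems
```

Typical use would be find_suspect_lines(open("/etc/hosts").read()). An empty result only rules out these particular formatting problems; it says nothing about a daemon holding on to stale resolver state.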

If it were just one server having this problem I could chalk it up to gremlins and see if a reboot fixes it, but it's 4 servers and the problem appeared on all of them at the same time. No system changes were made at that time either.

Very strange (or perhaps I'm missing the obvious).

David
Bill Hassell
Honored Contributor

Re: Delay in inetd starting service

telnet and 30-60-90 second delays almost ALWAYS mean resolver errors at the destination. Thirty seconds is the timeout when the telnet connection setup tries to match the incoming hostname with the IP address, always both ways (for security). If either lookup fails, the timeout for the DNS server is 30 seconds. If there are 3 DNS servers in /etc/resolv.conf, the delay extends to 90 seconds.

The fix is easy: change /etc/nsswitch.conf.

The hostname lookup rule should always be to look in /etc/hosts first, then fall back to DNS:

hosts: files [NOTFOUND=continue UNAVAIL=continue] dns
ipnodes: files [NOTFOUND=return] dns
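
To confirm what lookup order a given nsswitch.conf actually specifies, the hosts line can be parsed with a few lines of Python. This is just an illustrative sketch; the function name hosts_lookup_order is made up for the example, and the bracketed criteria such as [NOTFOUND=continue] are simply discarded rather than interpreted.

```python
import re


def hosts_lookup_order(nsswitch_text):
    """Return the sources on the 'hosts:' line, in order, without criteria."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        if line.startswith("hosts:"):
            # Replace [NOTFOUND=...] style criteria with a space, then tokenize.
            sources = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return sources.split()
    return []  # no hosts entry present
```

For the hosts line shown above this yields the sources files and dns, in that order; an empty list means the file has no hosts entry at all.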

Now you don't need to load up /etc/hosts with hundreds of addresses, just the problem ones where DNS fails one way or the other. A problem address is one where nslookup or nsquery fails in either direction (nsquery is preferred over nslookup because it gives more useful information). ALWAYS run the test both ways:

nsquery hosts www.hp.com
nsquery hosts 15.217.49.22

They both must succeed and they must match each other. Another MAJOR advantage to using /etc/hosts first relates to commercial network backup software. Because the high-end packages can handle multiple systems simultaneously on the same tape, these programs look up the host for *EVERY* file. I know, totally dumb, since a simple cache in the code would solve this. Nevertheless, a DNS server can be severely overloaded by a network backup to fast tape drives. If all the hostnames involved in the backup are found in /etc/hosts, the DNS server is unaffected.
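
The "both ways and they must match" test can also be scripted. The sketch below is a hedged Python illustration of that check, not a replacement for nsquery: the function name check_round_trip and its injectable forward/reverse parameters are assumptions made for the example, and the comparison is a plain case-insensitive string match, so a short name and its fully qualified form will not be treated as equal.

```python
import socket


def check_round_trip(name, forward=socket.gethostbyname_ex,
                     reverse=socket.gethostbyaddr):
    """Resolve name to addresses, then resolve each address back.

    Returns a list of (ip, reverse_names, matches) tuples, where matches
    is True when name appears among the names returned for that address.
    """
    results = []
    _, _, addresses = forward(name)
    for ip in addresses:
        try:
            primary, aliases, _ = reverse(ip)
            names = [primary] + list(aliases)
        except OSError:  # reverse lookup failed for this address
            names = []
        matches = name.lower() in (n.lower() for n in names)
        results.append((ip, names, matches))
    return results
```

Any tuple whose last element is False points at an address whose reverse lookup either fails or returns a different name, which is the kind of mismatch described above.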


Bill Hassell, sysadmin
David Child_1
Honored Contributor

Re: Delay in inetd starting service

Bill,

Thanks for the feedback. Your help is always appreciated. The information on telnet delays is good.

I think part of the problem is that I must have done a poor job explaining the problem.

1) My server was checking files first, then DNS. I don't have an 'ipnodes' entry though. I'm not familiar with that, so I'll need to do a little research.

2) There were only 10 or so entries in /etc/hosts. I'm not sure why the autosys team said they needed a local host entry (I found out they didn't), but it was there (and had been there for a long time).

3) I had already confirmed that nslookup resolved forward and reverse without delay (and the information matched in both directions). I will check out 'nsquery'. I could also 'traceroute', 'ping', and 'ssh' to the remote server via hostname and IP without delay. From the command line, name resolution appeared to work fine.

Now, the problem has been fixed. I found that by removing the client's entry from /etc/hosts on my server, the delay vanished and everything worked. The client now doesn't exist in /etc/hosts or DNS.

Once that was discovered, one of my teammates found the following:

<>

It wasn't exactly the same problem. In that post the author fixed it by adding an entry to /etc/hosts; in our case we had to remove one. The fix was the same though: restart inetd.

We're still not sure why the problem didn't pop up until April. The last changes on the servers were back in January.

Anyway, if you ever get any weird delays while starting services via inetd, try restarting inetd before you waste a lot of time.

Thanks all,
David