MarkW_1
Regular Advisor

DNS server failure

The primary of our two DNS servers hung, and our Unix machines could no longer perform DNS lookups. The secondary DNS server should have been contacted but was not. Here is our configuration.

# more /etc/nsswitch.conf

hosts: files [NOTFOUND=continue] dns nis
passwd: files [NOTFOUND=continue] nis
services: files [NOTFOUND=continue] nis

# more resolv.conf

domain uch.ad.pvt
nameserver 168.200.32.1
nameserver 168.200.28.1

Entry in /etc/hosts file:

168.200.32.1 uch1.uch.ad.pvt #ntp server

Francis Noël
Regular Advisor

Re: DNS server failure

Hi Mark

You may be the victim of timeouts and retries in the HP-UX DNS client/resolver.

Our site has Windows DNS servers. Please disregard if that is not your case.

I am currently investigating a similar issue and I might be able to provide clues. If your primary DNS server was hung it may have been acting in a way similar to what I am about to describe.

The HP-UX boxes will instantly query the secondary server if the primary cannot resolve the hostname, is down or simply off the network.

When our primary server reboots, the HP resolvers swing over to the secondary DNS server as expected. However, while the primary is booting up there is a period of roughly 50 seconds where the Windows machine has an active TCP/IP stack but the DNS service is not fully initialized.

The Windows host seems to be accepting DNS requests but is not answering them. This is bad, as the HP-UX resolver seems to expect either an error message from the DNS server, no DNS server at all at the configured address or, at the very least, a garbled response. It seems the Windows host accepts the query but does not return an answer until the DNS service is fully initialized, if it answers at all.
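To check whether a name server is in this "accepts but never answers" state, a small probe along these lines might help. This is only a rough sketch using the Python standard library, not something taken from HP-UX: the names `build_query` and `probe` are made up for illustration, and the probe simply sends one UDP query and reports whether it got any reply, hit the timeout, or got a socket error (for example a refused port).

```python
import socket
import struct

def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet (type A, class IN) for hostname."""
    # Header: id, flags (recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is length-prefixed, ended by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def probe(server, hostname, port=53, timeout=5.0):
    """Send one UDP query and classify how the server reacts."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.connect((server, port))
        sock.send(build_query(hostname))
        sock.recv(512)          # any reply counts, even a DNS error reply
        return "answered"
    except socket.timeout:
        return "silent"         # query went out, nothing came back in time
    except OSError:
        return "unreachable"    # e.g. port refused or no route to host
    finally:
        sock.close()
```

Calling `probe("168.200.32.1", "uch1.uch.ad.pvt")` while the primary is hung or booting would presumably return `"silent"` in the situation described above, whereas a server that is simply down may show up as `"unreachable"` or also as `"silent"`, depending on the network.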

Since there is no connection error or service refusal, the only mechanism left to the DNS client is retry plus timeout. The default values given in the resolv.conf man page are a timeout of 5 seconds and a retry count of 4, meaning that a single DNS lookup may take up to 20 seconds (4 attempts of 5 seconds each) before it goes to the secondary server, if it goes there at all.
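The arithmetic behind that 20-second figure can be sketched as follows. This assumes the resolver waits the full timeout on every attempt against the unresponsive server before moving on, which is my reading of the man page and not something I have confirmed; the function name `worst_case_delay` is just for illustration.

```python
def worst_case_delay(timeout_s=5, retries=4):
    """Upper bound, in seconds, spent waiting on one silent server.

    Assumes each attempt waits the full timeout and that attempts
    run back to back with no extra backoff.
    """
    return timeout_s * retries

print(worst_case_delay())        # 20 (defaults: 5 s timeout, 4 tries)
print(worst_case_delay(2, 2))    # 4  (hypothetical shorter settings)
```

With shorter values the stall per lookup shrinks accordingly, at the cost of giving up sooner on a server that is merely slow.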

I have seen this on multiple occasions when the primary DNS server was rebooted. I'm waiting for the next occurrence to determine whether the query is forwarded to the secondary server after the timeout and retry mechanism has been exhausted, or whether the query is dropped altogether.

Hope this helps you move forward!