Strange DNS issue

 
Matt Hearn
Regular Advisor

Strange DNS issue

We're having a bizarre DNS issue.

About a year ago (before I joined the account), two new Sun DNS servers were installed in our data center to replace two old Sun boxes. (Everything else in our environment is HP-UX.) Recently, the customer decided to decommission the old Sun servers.

So I went through all our boxes and changed /etc/resolv.conf to reference only the NEW servers. No problems encountered, so we went ahead with the decommission.

On Saturday night, I took down the old Sun boxes. By Sunday morning, we discovered that a LIMS application on 4 HP-UX servers was not responding. All the processes were running, so it appeared to be a network connectivity problem.

SSH (v3.9.1) response was normal, but telnet (we use tcp_wrappers) was very, very unresponsive, so we guessed that there was a reverse lookup issue and brought the old DNS boxes back online. Sure enough, all the problems were immediately resolved.

But this doesn't make any sense, since all references to the old DNS boxes were removed from /etc/resolv.conf!!! I could see some possibility that something in the application was still hard-coded to the old boxes, but telnet should have responded normally.

While the old DNS boxes were down, I was able to connect and do nslookups using the new servers, looking up both IPs and hostnames.

I'm totally at a loss for what to do here. A colleague suggested taking down the old DNS boxes and rebooting everything else to see if that clears things up, but unfortunately we're contractually obligated to have the old DNS boxes decom'd by 5/15, and I'll only be able to try and fix this once. If this doesn't work, we're going to have to leave the old DNS boxes on until we figure out what's wrong, which means financial penalties.

I'm really hoping that somebody on the forum has run into similar problems before and resolved them pretty easily.

Let me know what other information might be helpful (copies of /etc/resolv.conf, syslogs, etc.)

Thank you!!!
34 REPLIES
Steven E. Protter
Exalted Contributor

Re: Strange DNS issue

Could be cached DNS information.

It also could be that a server pointed to in /etc/resolv.conf points to the old servers.

You might be able to pick this up with tcpdump or ethereal packet sniffer.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Steven E. Protter
Exalted Contributor

Re: Strange DNS issue

Forgot to mention: it could be a router configuration or a firewall re-routing port 53 traffic.

SEP
Matt Hearn
Regular Advisor

Re: Strange DNS issue

About cached info: do you mean that something on our HP-UX box isn't re-reading resolv.conf and is going by something in memory?

I took "DNS cache" to mean that something on the server remembers the results of a DNS lookup it did in the past. Which SHOULDN'T be a problem; we haven't changed the IP address of any servers, we just removed a couple boxes that aren't even listed in resolv.conf anymore.

What are the odds that a simple reboot of everything will resolve this?
Steven E. Protter
Exalted Contributor

Re: Strange DNS issue

If it's not a router, and you are SURE that the old Sun boxes or any Windows DNS servers are fixed, then the chances are quite high that a complete reboot will work.

/sbin/init.d/named stop
/sbin/init.d/named start

might also work.

SEP
harry d brown jr
Honored Contributor

Re: Strange DNS issue

Can you post the following files from your HP-ux boxes:

/etc/resolv.conf
/etc/nsswitch.conf

Check to see if the following process is running on your HP-ux boxes:

named

If so, then post the file /etc/named.conf.

Additionally, check your /etc/hosts file for 127.0.0.1, which should resolve to localhost and localhost.DOMAINNAME.

Also, post a typical routing table:

netstat -rvn

Also, what boxes are your routers/switches resolving to? Could they be using the OLD Sun boxes?

Check the NEW Sun boxes to make sure they aren't treating the old Sun boxes as masters or slaves.

What kind of software are you using on the Sun boxes to do DNS? Bind is normal, but you could be running a third party like Nortel's NetID.

Your HP-ux Boxes will NOT do DNS caching, unless you set them up to behave like that, which requires using the process "named" and /etc/named.conf and zone files in /etc/named.data directory. I highly doubt you are. Windoze boxes do dns caching out of the gate, so I'll ignore them, besides most users reboot them often enough that the cache files would be cleared.

On your HP-ux boxes do this and post the results:

what `which named`

which should return something like this:

[root@vpart1 /]# what `which named`
/usr/sbin/named:
$Revision: 2.0 $ Sat Sep 21 11:37:57 GMT 2002
named 9.2.0 Sat Sep 21 11:37:57 GMT 2002
Copyright (C) 1995-1998 Eric Young.All rights reserved.
[root@vpart1 /]#

If you are running 9.2, then use the "host" and "dig" commands, not "nslookup"

live free or die
harry d brown jr
Live Free or Die
Matt Hearn
Regular Advisor

Re: Strange DNS issue

resolv.conf:
domain himont.com
nameserver 10.90.0.47
nameserver 10.90.0.48

Those nameservers are both the NEW DNS boxes.
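
A quick way to confirm the same thing on every box is to parse the file rather than eyeball it. The following is a minimal Python sketch, not anything the posters ran: the helper name parse_resolv_conf is made up, the sample text is the file content posted above, and the only "old" address listed is the one the dig output further down in this thread reports for wpdns1 (the thread gives no address for wpdns2).

```python
def parse_resolv_conf(text):
    """Return (domain, nameservers) from resolv.conf-style text."""
    domain = None
    nameservers = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments and whitespace
        if not line:
            continue
        fields = line.split()
        if fields[0] == "domain" and len(fields) > 1:
            domain = fields[1]
        elif fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
    return domain, nameservers


# File content as posted in this thread.
SAMPLE = """\
domain himont.com
nameserver 10.90.0.47
nameserver 10.90.0.48
"""

# Address reported for wpdns1 in the dig output later in the thread.
# Assumption: any other old-server addresses would be added here by hand.
OLD_SERVERS = {"153.47.3.18"}

domain, servers = parse_resolv_conf(SAMPLE)
stale = [s for s in servers if s in OLD_SERVERS]
print(domain, servers, stale)
```

An empty third list means the file itself no longer points at a known old server, which matches what is described above.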

nsswitch.conf:
hosts: files [NOTFOUND=continue UNAVAIL=continue TRYAGAIN=return] dns [NOTFOUND=return UNAVAIL=continue TRYAGAIN=continue]
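
For reference, that line names two sources in order (files, then dns), each followed by a bracketed list of STATUS=action pairs. A small Python sketch of how the line breaks down; the function name parse_hosts_policy is hypothetical and the parser only handles the simple form shown here:

```python
import re


def parse_hosts_policy(line):
    """Parse an nsswitch.conf 'hosts:' line into [(source, {status: action})]."""
    name, _, rest = line.partition(":")
    if name.strip() != "hosts":
        raise ValueError("not a hosts line")
    policy = []
    # Each source name may be followed by a bracketed list of STATUS=action pairs.
    for source, criteria in re.findall(r"(\w+)\s*(?:\[([^\]]*)\])?", rest):
        actions = {}
        for pair in criteria.split():
            status, _, action = pair.partition("=")
            actions[status.upper()] = action.lower()
        policy.append((source, actions))
    return policy


LINE = ("hosts: files [NOTFOUND=continue UNAVAIL=continue TRYAGAIN=return] "
        "dns [NOTFOUND=return UNAVAIL=continue TRYAGAIN=continue]")

policy = parse_hosts_policy(LINE)
for source, actions in policy:
    print(source, actions)
```

Read this way, a name that /etc/hosts does not have (NOTFOUND=continue) falls through to DNS, so any host missing from /etc/hosts depends entirely on what the name servers return.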

We don't use named on our boxes.

I checked all of the Sun DNS boxes, old and new, and nobody seems to be running bind. I unfortunately didn't set those servers up, and I don't have any control over the DNS application, so I don't have any idea what's going on there.

I don't even know how to stop DNS on the old servers for testing.

I also checked resolv.conf on the NEW DNS servers, and they make no mention of the old ones. I don't know how to check the DNS application to see what it might be doing. I have a call in to the guy that handles that end, but he's in a timezone that's 6 hours disparate from mine, so who knows when I'll hear back.
harry d brown jr
Honored Contributor

Re: Strange DNS issue


When you get a chance to ask the DNS admin, ask him what software he uses to maintain DNS.

live free or die
harry d brown jr
Live Free or Die
Matt Hearn
Regular Advisor

Re: Strange DNS issue

Looks like the SUN boxes are all running named. I'm looking at named.conf, but it's mostly gibberish. It did tell me where named.log was, though there's nothing in there that seems to indicate a problem.
harry d brown jr
Honored Contributor

Re: Strange DNS issue

Can you post the /etc/named.conf from the new Sun DNS server?

live free or die
harry d brown jr
Live Free or Die
Matt Hearn
Regular Advisor

Re: Strange DNS issue


Sorry this is so long:
Matt Hearn
Regular Advisor

Re: Strange DNS issue

named.conf is attached to my previous reply, if that's not obvious. Thanks!
Bill Hassell
Honored Contributor

Re: Strange DNS issue

In all of this, you are assuming that the old DNS server data is identical to the new servers'. My guess would be that it was not 100% ported. Most likely, the new DNS servers are not configured the same way for reverse lookups, and may, in fact, not even know their own names. Use nslookup on HP-UX to query specific nameservers. Not many people know this, but you can bypass all the nameservers in resolv.conf by placing a specific nameserver on the command line:

nslookup some_host mydns_server
or
nslookup snoopy 12.34.56.78

where 12.34.56.78 is an old or a new DNS server. Then see what IP snoopy has and now ask for a reverse lookup:

nslookup 87.65.43.21 12.34.56.78

And by the way, did you see an error message in the first lines of the nslookup output, something about:

*** Can't find address for server: ...

That's a dead giveaway that the DNS maintainer forgot the server's own name. DNS is a VERY serious part of network security and the greatest care should be taken to resolve every issue. nslookup is trying to tell you that the DNS server isn't 100% loaded with the required data. You can get around this issue by putting a dummy entry into /etc/hosts for the missing DNS server names:

12.34.56.78 dumbdns1
12.34.56.77 dumbdns2
etc

and if this helps, then you can safely assume that there are many other missing records (until proven otherwise). Ignore comments like "they should be the same"
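
One way to put "until proven otherwise" into practice is to collect the answers from an old and a new server for the same names and addresses, then diff them. A minimal Python sketch, assuming the answers have already been gathered by hand with the nslookup commands above; the function name and the sample data (built from the placeholder host and addresses used in this post) are hypothetical:

```python
def missing_from_new(old_records, new_records):
    """Return entries the old server answers that the new one lacks or answers differently."""
    problems = {}
    for name, old_value in old_records.items():
        new_value = new_records.get(name)
        if new_value != old_value:
            problems[name] = (old_value, new_value)
    return problems


# Made-up answers, keyed by the name or address that was queried:
#   nslookup <query> <old_server>   and   nslookup <query> <new_server>
old = {"snoopy": "87.65.43.21", "87.65.43.21": "snoopy"}
new = {"snoopy": "87.65.43.21"}  # reverse record missing in this example

diff = missing_from_new(old, new)
print(diff)
```

Each entry in the result is a query the old server handled that the new one does not answer identically, which is the list to hand to the DNS maintainer.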


Bill Hassell, sysadmin
Matt Hearn
Regular Advisor

Re: Strange DNS issue

Bill, you appear to have hit on something:

bassvsndcdns1# nslookup
Default Server: bassvsndcdns1.basell.com
Address: 10.90.0.47

> nslookup 10.90.0.47
Server: [10.90.0.47]
Address: 10.90.0.47

*** 10.90.0.47 can't find nslookup: Non-existent host/domain

Unfortunately this doesn't help explain why when the OLD servers are turned off, the systems have issues. Our HP-UX boxes are already pointed to bassvsndcdns1 and bassvsndcdns2 for name resolution, and have no entries to wpdns1 and wpdns2 (the old servers) in resolv.conf.

Still, I'm going to have a chat with the DNS people and try and establish whatever they did that's clearly screwed up.

Thanks!
Patrick Wallek
Honored Contributor

Re: Strange DNS issue

You didn't do that correctly in your last response.

You did

# nslookup
Which took you to the nslookup prompt (>). You then did:

> nslookup 10.90.0.47
Server: [10.90.0.47]
Address: 10.90.0.47

At the > prompt you can just type a name or IP address. In this case you got an error because it was thinking nslookup was a name.

At the # prompt you need to do something like:

# nslookup 10.90.0.47 10.90.0.47

That will attempt to look up the name for 10.90.0.47 from the DNS server 10.90.0.47.
Matt Hearn
Regular Advisor

Re: Strange DNS issue

::sigh:: I've been catching myself doing that all day. It's been a long week, and it's only Monday!

Anyway, if I operate nslookup in a non-idiotic way, the server does know its own name and IP. So that's not the problem after all.
Todd Whitcher
Esteemed Contributor

Re: Strange DNS issue

Hi Matt,

I think everyone is on the right track, there appears to be some mis-configuration on the "NEW" nameservers that are replacing the "OLD" nameservers. Since telnet is doing reverse lookups I would suspect the in-addr.arpa (PTR) records are the problem. Telnet does a reverse lookup when a connection is being established to verify the remote client, the timeout for this is ~60 seconds.

What I would suggest is to use dig to check the "NEW" nameservers for some of your clients' IPs that experienced the hangs.


dig @new_name_server -x ip_addr PTR

example:


dig @www.atl.hp.com -x 15.17.186.112 PTR

; <<>> DiG named 9.2.0 <<>> @www.atl.hp.com -x 15.17.186.112 PTR
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40803
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 4

;; QUESTION SECTION:
;112.186.17.15.in-addr.arpa. IN PTR

;; ANSWER SECTION:
112.186.17.15.in-addr.arpa. 7156 IN PTR gator.atl.hp.com.

;; AUTHORITY SECTION:
186.17.15.in-addr.arpa. 7156 IN NS pal-delegate.hp.com.
186.17.15.in-addr.arpa. 7156 IN NS nideast.americas.hp.net.



Check the answer section and the Authority sections.
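
The -x option takes the address as its argument and turns it into the reversed in-addr.arpa name seen in the QUESTION SECTION above. A minimal Python sketch of that name construction using the standard ipaddress module; the helper names are made up, and new_name_server is a placeholder rather than a real host:

```python
import ipaddress


def reverse_name(ip):
    """Return the in-addr.arpa (or ip6.arpa) name that a PTR query for ip uses."""
    return ipaddress.ip_address(ip).reverse_pointer


def dig_reverse_command(server, ip):
    """Build a dig command line with the address placed directly after -x."""
    return ["dig", "@" + server, "-x", ip]


name = reverse_name("15.17.186.112")
cmd = dig_reverse_command("new_name_server", "10.90.0.62")
print(name)
print(" ".join(cmd))
```

The first line printed should match the query name in the example output (without the trailing dot), which is a handy check that a reverse query really asked for the address intended.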

If you can reproduce your issue you can run a tcpdump on the SUN servers and a nettl trace on the HPUX clients and follow the resolution to see where the problem occurs.

You can also ask the DNS administrators to enable query logging and test.

Hope that helps,

Todd
Todd Whitcher
Esteemed Contributor

Re: Strange DNS issue

FYI, dig comes w/ bind 9.2 on HPUX. I'm not sure what version it comes w/ for Solaris's version of BIND.

You can download a copy of dig from the web also.

Matt Hearn
Regular Advisor

Re: Strange DNS issue

I installed dig on a server, and here's what I get when I do some querying.
Necessary legend:
bassvsndcdns1 - one of our NEW dns boxes
wpdns1 - one of our OLD dns boxes
banh2lme - one of the servers that had the slow-down when wpdns1&2 were down
10.90.0.62 - another server that experienced a slow-down

banh2lme:/root> dig @bassvsndcdns1 10.90.0.62 -x PTR

; <<>> DiG 2.0 <<>> @bassvsndcdns1 10.90.0.62 -x
;; ->>HEADER<<- opcode: QUERY , status: NXDOMAIN, id: 10
;; flags: qr rd ra ; Ques: 1, Ans: 0, Auth: 1, Addit: 0
;; QUESTIONS:
;; PTR.in-addr.arpa, type = ANY, class = IN

;; AUTHORITY RECORDS:
in-addr.arpa. 9557 SOA A.ROOT-SERVERS.NET. bind.ARIN.NET. (
2005051004 ;serial
1800 ;refresh
900 ;retry
691200 ;expire
10800 ) ;minim


;; Sent 1 pkts, answer found in time: 0 msec
;; FROM: banh2lme to SERVER: bassvsndcdns1 10.90.0.47
;; WHEN: Tue May 10 10:04:20 2005
;; MSG SIZE sent: 34 rcvd: 98

banh2lme:/root> dig @wpdns1 10.90.0.62 -x PTR

; <<>> DiG 2.0 <<>> @wpdns1 10.90.0.62 -x
;; ->>HEADER<<- opcode: QUERY , status: NXDOMAIN, id: 10
;; flags: qr rd ra ; Ques: 1, Ans: 0, Auth: 1, Addit: 0
;; QUESTIONS:
;; PTR.in-addr.arpa, type = ANY, class = IN

;; AUTHORITY RECORDS:
in-addr.arpa. 9549 SOA A.ROOT-SERVERS.NET. bind.ARIN.NET. (
2005051004 ;serial
1800 ;refresh
900 ;retry
691200 ;expire
10800 ) ;minim


;; Sent 1 pkts, answer found in time: 0 msec
;; FROM: banh2lme to SERVER: wpdns1 153.47.3.18
;; WHEN: Tue May 10 10:04:29 2005
;; MSG SIZE sent: 34 rcvd: 98
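
One detail in both outputs is worth checking: the QUESTIONS section shows PTR.in-addr.arpa with type ANY, not a name derived from 10.90.0.62. A possible reading is that -x took "PTR" as its argument, in which case the NXDOMAIN says nothing either way about the PTR record for 10.90.0.62. The Python sketch below compares the name actually queried with the expected reverse name; the function name is hypothetical and the regular expression is written only against the DiG 2.0 layout shown above:

```python
import ipaddress
import re


def queried_name(dig_output):
    """Extract the query name from the ';; QUESTIONS:' section of DiG 2.0-style output."""
    match = re.search(r"^;;\s*QUESTIONS:\s*\n;;\s*([^,\s]+),", dig_output, re.MULTILINE)
    return match.group(1) if match else None


# Two lines copied from the output above.
OUTPUT = """\
;; QUESTIONS:
;; PTR.in-addr.arpa, type = ANY, class = IN
"""

expected = ipaddress.ip_address("10.90.0.62").reverse_pointer
asked = queried_name(OUTPUT)
matches = asked is not None and asked.lower() == expected
print(asked, expected, matches)
```

If the two names differ, rerunning the query with the address directly after -x would give a result that actually reflects the PTR data on each server.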
Todd Whitcher
Esteemed Contributor

Re: Strange DNS issue

There are no PTR records for that address on any of your DNS servers. If those servers are supposed to be authoritative for that hostname/ip address then you need to have those records added.
It does not explain exactly why bringing up that DNS server solved things; network traces and debug would help w/ that.

However it's a problem that should be addressed on the new servers. UNIX does a lot of reverse lookups so you need PTR records for IP addresses to be able to resolve that. You should verify the A records exist for that hostname as a sanity check.

ex
root@florida> dig @www.atl.hp.com gator.atl.hp.com A

; <<>> DiG named 9.2.0 <<>> @www.atl.hp.com gator.atl.hp.com A
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24074
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 4

;; QUESTION SECTION:
;gator.atl.hp.com. IN A

;; ANSWER SECTION:
gator.atl.hp.com. 6732 IN A 15.17.186.112



rick jones
Honored Contributor

Re: Strange DNS issue

FWIW, indeed, where to go to find name information may be cached in a process context under HP-UX - changes to /etc/resolv.conf and/or /etc/nsswitch.conf may not be automagically detected. I think that may be addressed in later revs of HP-UX, but it could just be bitrot in my dimm memory.

You could always take a tcpdump trace on the HP-UX box looking for "port 53" traffic and see just where it is sending all its DNS requests. You could also pick a few of your long-lived (as in up since before the switchover) processes and run tusc against them - you will see the bind/connect calls that are indicative of a DNS request and that info (assuming you use a verbose tusc, IIRC) will show you which name server that process is using.
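
To shortlist which long-lived processes to trace, it is enough to compare each process start time with the time resolv.conf was changed. A minimal Python sketch under stated assumptions: the process names, start times, and change time below are all invented for illustration, and on a real system they would come from ps output and the modification time of /etc/resolv.conf.

```python
from datetime import datetime


def started_before(processes, changed_at):
    """Return names of processes whose start time predates the config change."""
    return [name for name, started in processes if started < changed_at]


# Hypothetical data, not taken from the systems discussed in this thread.
procs = [
    ("lims_server", datetime(2005, 3, 1, 8, 0)),
    ("sshd", datetime(2005, 5, 9, 7, 30)),
]
resolv_changed = datetime(2005, 4, 20, 12, 0)

suspects = started_before(procs, resolv_changed)
print(suspects)
```

Anything in that list started before the edit and so may still be holding the old resolver settings; those are the processes worth pointing tusc at first.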
there is no rest for the wicked yet the virtuous have no pillows
Ron Kinner
Honored Contributor

Re: Strange DNS issue

nslookup
set debug


then type in the name and see what you get. Then try
set norecurse
and repeat.

Sometimes that will tell you what is really going on.


Ron
Matt Hearn
Regular Advisor

Re: Strange DNS issue

When I try to "set" anything I get errors:

wpbkup1:/home/root # nslookup
Using /etc/hosts on: wpbkup1

> set debug
*** Invalid option: debug
> set norecurse
*** Invalid option: norecurse
> set
Using /etc/hosts on: wpbkup1

looking up FILES
Trying DNS
*** bassvsndcdns1.basell.com can't find set: Non-existent domain
> exit
Bill Hassell
Honored Contributor

Re: Strange DNS issue

To run nslookup with interactive commands, use:

nslookup -

The man page says it's optional to get to interactive mode but the - changes the startup prompt indicating that interactive mode now works. Something like this:

# nslookup -
Specifying a nameserver has overridden the switch policy order.
The reset command will reinstate the order specified by the switch policy.
Default Name Server: smipc.net
Address: 208.236.200.4

> set debug
> hp.com
Name Server: smipc.net
Address: 208.236.200.4

Trying DNS
;; res_mkquery(0, hp.com, 1, 1)
------------
Got answer:
HEADER:
opcode = QUERY, id = 34062, rcode = NOERROR
header flags: response, auth. answer, want recursion, recursion avail.
questions = 1, answers = 4, authority records = 6, additional = 6

QUESTIONS:
hp.com, type = A, class = IN
ANSWERS:
-> hp.com
internet address = 192.151.53.86
ttl = 600 (10M)
-> hp.com ...

AUTHORITY RECORDS:
-> hp.com
nameserver = ap1.hp.com
ttl = 9814 (2h43m34s)
-> hp.com
nameserver = eu1.hp.com
ttl = 9814 (2h43m34s)
-> hp.com ...

ADDITIONAL RECORDS:
-> ap1.hp.com
internet address = 15.211.128.50
ttl = 164032 (1d21h33m52s)
-> eu1.hp.com ...


------------
Name: hp.com
Addresses: 192.151.53.86, 192.6.234.8, 192.6.234.9, 161.114.22.105


Bill Hassell, sysadmin
harry d brown jr
Honored Contributor

Re: Strange DNS issue


What does this return?

what `which named` `which named`

also, Bill's last post indicated a DASH after nslookup

nslookup -

live free or die
harry d brown jr
Live Free or Die