DNS queries for . IN A return rcode 2 (SERVFAIL) from Windows DNS recursive resolvers

Hey all,

This must be old news for everyone else. While looking at a DNS monitor on a load balancer that defaults to . A queries to check the liveness of DNS resolvers, I found that the Windows 2000/2003 DNS server returns rcode=2 (SERVFAIL) for queries for an A record at the root. The resolvers appear to work properly in all other regards.

So the monitors were switched to localhost. A

(Is this a bad idea?)

A little testing later and the results for . A are:

Windows NT 4, ancount=0, authority=1, rcode=0
Windows 2000, rcode=2
Windows 2003, rcode=2
BIND, ancount=0, authority=1, rcode=0

To my (inexpert) eyes that doesn't seem quite right.

I can't seem to find any online information regarding this difference in behavior.

Enlightenment appreciated.

Joe

Here is the output.

fpdns -c -s 64.95.32.34 && dig @64.95.32.34 . a
64.95.32.34 Microsoft Windows DNS NT4

; <<>> DiG 9.6.1-P2 <<>> @64.95.32.34 . a
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35180
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;. IN A

;; AUTHORITY SECTION:
. 86400 IN SOA A.ROOT-SERVERS.NET. NSTLD.VERISIGN-GRS.COM. 2010010500 1800 900 604800 86400

;; Query time: 114 msec
;; SERVER: 64.95.32.34#53(64.95.32.34)
;; WHEN: Tue Jan 5 07:40:33 2010
;; MSG SIZE rcvd: 92

fpdns -c -s 216.222.144.16 && dig @216.222.144.16 . a
216.222.144.16 ISC BIND 9.2.3rc1 -- 9.6.1-P1 [recursion enabled] id: "9.5.1-P2"

; <<>> DiG 9.6.1-P2 <<>> @216.222.144.16 . a
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49220
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;. IN A

;; AUTHORITY SECTION:
. 2314 IN SOA A.ROOT-SERVERS.NET. NSTLD.VERISIGN-GRS.COM. 2010010500 1800 900 604800 86400

;; Query time: 38 msec
;; SERVER: 216.222.144.16#53(216.222.144.16)
;; WHEN: Tue Jan 5 07:42:08 2010
;; MSG SIZE rcvd: 92

fpdns -c -s joe.jmaimon.com && dig @joe.jmaimon.com . a
216.222.150.100 ISC BIND 9.2.3rc1 -- 9.6.1-P1 [recursion enabled] id: "9.5.0-P2-W2"

; <<>> DiG 9.6.1-P2 <<>> @joe.jmaimon.com . a
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39125
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;. IN A

;; AUTHORITY SECTION:
. 10800 IN SOA A.ROOT-SERVERS.NET. NSTLD.VERISIGN-GRS.COM. 2010010500 1800 900 604800 86400

;; Query time: 40 msec
;; SERVER: 216.222.150.100#53(216.222.150.100)
;; WHEN: Tue Jan 5 07:57:52 2010
;; MSG SIZE rcvd: 92

fpdns -c -s 64.95.32.130 && dig @64.95.32.130 . a
64.95.32.130 Microsoft Windows DNS 2000

; <<>> DiG 9.6.1-P2 <<>> @64.95.32.130 . a
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 30535
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;. IN A

;; Query time: 35 msec
;; SERVER: 64.95.32.130#53(64.95.32.130)
;; WHEN: Tue Jan 5 07:43:51 2010
;; MSG SIZE rcvd: 17

fpdns -c -s 72.26.241.205 && dig @72.26.241.205 . a
72.26.241.205 No match found

; <<>> DiG 9.6.1-P1 <<>> @72.26.241.205 . a
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 12807
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;. IN A

;; Query time: 0 msec
;; SERVER: 72.26.241.205#53(72.26.241.205)
;; WHEN: Tue Jan 5 08:13:06 2010
;; MSG SIZE rcvd: 17
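
For anyone who wants to reproduce the comparison without dig, here is a minimal Python sketch of the wire format involved. It builds the . IN A query and decodes the 12-byte header fields that dig prints above (flags, rcode, section counts). The function names and the small RCODES table are my own and purely illustrative; sending the packet over UDP is left out.

```python
import struct

# Subset of RCODE values from the base DNS specification (illustrative table).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_root_a_query(query_id, recursion_desired=True):
    """Return the wire-format query for '. IN A' (hypothetical helper)."""
    flags = 0x0100 if recursion_desired else 0x0000  # RD bit only
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # The root name is a single zero-length label; QTYPE=1 (A), QCLASS=1 (IN).
    question = b"\x00" + struct.pack("!HH", 1, 1)
    return header + question


def parse_header(message):
    """Decode the fixed 12-byte DNS header into a dict."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    msg_id, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    rcode = flags & 0x000F
    return {
        "id": msg_id,
        "qr": bool(flags & 0x8000),
        "aa": bool(flags & 0x0400),
        "rd": bool(flags & 0x0100),
        "ra": bool(flags & 0x0080),
        "rcode": rcode,
        "rcode_name": RCODES.get(rcode, "RCODE%d" % rcode),
        "qdcount": qd,
        "ancount": an,
        "nscount": ns,
        "arcount": ar,
    }
```

The query is 12 header bytes plus 5 question bytes, which is the same 17 bytes reported as MSG SIZE for the two SERVFAIL replies: those replies carry nothing beyond the header and the echoed question.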

Joe Maimon <jmaimon@ttec.com> writes:

> Hey all,
>
> This must be old news for everyone else. While looking at a DNS monitor
> on a load balancer that defaults to . A queries to check the liveness of
> DNS resolvers, I found that the Windows 2000/2003 DNS server returns
> rcode=2 (SERVFAIL) for queries for an A record at the root. The
> resolvers appear to work properly in all other regards.

well, there is no A RR for the root domain. RCODE=2 is still an error;
you should receive RCODE=0 ANCOUNT=0 for an unused RR type. but many
resolvers get confused when the root domain is the QNAME, so let's assume
that you're using one of those.

> So the monitors were switched to localhost. A
>
> (Is this a bad idea?)

probably. there is no "localhost" in the root zone. this name is a TCP/IP
stack convention, not a standard. for health monitoring purposes you should
probably choose one of your own local names, since there's almost certainly
no local intelligence in your resolver about them. that means to look up
one of your own names the resolver probably has to iterate downward from the
root zone to the top level and all the way down to your authority nameservers.
(the problem here is, you may be testing more than you intend, and a failure
in your own authority server or in the delegation path to it would look the
same as an IP path failure or a resolver problem.)
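
One way a monitor could keep those failure modes apart is to look at what came back rather than only whether the lookup succeeded. The sketch below is an illustration under my own assumptions, not how any particular load balancer works: the function name and the labels are hypothetical, and it only inspects the header of a raw UDP reply (or None on timeout).

```python
import struct


def classify_probe(sent_id, reply):
    """Return a coarse label for one health-probe result.

    sent_id is the ID used in the query; reply is the raw response
    bytes, or None if the probe timed out. Labels are illustrative.
    """
    if reply is None:
        # Nothing came back: IP path failure or the resolver is down.
        return "no-response"
    if len(reply) < 12:
        return "malformed"
    msg_id, flags = struct.unpack("!HH", reply[:4])
    if msg_id != sent_id or not (flags & 0x8000):
        # Wrong ID, or the QR bit says this is not a response.
        return "malformed"
    rcode = flags & 0x000F
    if rcode == 0:
        return "resolved"
    if rcode == 2:
        # The resolver answered, but the lookup itself failed somewhere.
        return "resolver-up-servfail"
    return "resolver-up-rcode-%d" % rcode
```

With something like this, a SERVFAIL for one of your own names still shows that the resolver is reachable and answering, while a timeout points at the path or the resolver itself.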

> A little testing later and the results for . A are:
>
> Windows NT 4, ancount=0, authority=1, rcode=0
> Windows 2000, rcode=2
> Windows 2003, rcode=2
> BIND, ancount=0, authority=1, rcode=0
>
> To my (inexpert) eyes that doesn't seem quite right.

probably resolver bugs, either in those TCP/IP stacks or in the "recursive
nameserver" they are using. (is the same recursive nameserver used in all
four tests?)

> I can't seem to find any online information regarding this difference in
> behavior.
>
> Enlightenment appreciated.

i suggest re-asking this over on dns-operations@lists.dns-oarc.net, since it's
a bit deep in the DNS bits for a general purpose list like NANOG.