How common is lack of DNS server diversity?

> Then it probably doesn't matter if you resolve their DNS, because you
> won't be getting to any of their services anyway.

Several folks have mentioned that they don't see a problem with dns failure
caused by an inability to reach all of the nameservers for a domain -
because presumably clients won't be able to reach any of the hosts in that
domain.

First, as we've seen demonstrated so clearly in the Microsoft case this
week, nameserver unreachability does not always imply unreachability of the
hosts in the domain.

Second (and even if all of the hosts are truly unreachable), there is one
somewhat important service that has a markedly different failure mode if
the server appears to not exist - email (smtp). Folks sending mail to a
domain that doesn't resolve usually get an immediate "delivery failure"
response. But those sending to a resolvable domain when the target mail
server is simply unreachable get their mail queued. It will typically get
retried by the mail system for a few days. Only after such a long outage
of the target will a delivery failure occur. (One somewhat ugly side
effect of the dns outage is that some mailing lists will remove the user
from their list when a delivery failure occurs. Not good to have to
explain this to your users.)

I lived through both situations (dns plus entire domain unreachable, and
the domain unreachable but dns still works); I much prefer the results with
a diverse dns setup.

Tony Rall

Hang on, I was only saying that for the limited case where DNS failure was caused
by route flapping of an entire Class A.

Don't tar me with the same brush as the folks who say the same about one's DNS
servers catching fire.

> Second (and even if all of the hosts are truly unreachable), there
> is one somewhat important service that has a markedly different
> failure mode if the server appears to not exist - email (smtp).
> Folks sending mail to a domain that doesn't resolve usually get an
> immediate "delivery failure" response. But those sending to a
> resolvable domain when the target mail server is simply
> unreachable get their mail queued. It will typically get retried
> by the mail system for a few days.

If "doesn't resolve" means "the target domain itself exists but none
of the authoritative name servers for the domain respond", those
mail systems doing as mentioned above should most probably be
replaced, because they do not properly and reasonably distinguish
between "hard" and "soft" errors when using the DNS.

If "doesn't resolve" on the other hand means "the target domain does
not exist in the DNS, and a reply to that effect is returned
(ultimately originating from one of the name servers of the relevant
parent domain)", then I agree; the mail should be bounced
immediately.

Quoting RFC 974 (which was the best quote I could find at the moment):

   Mailers are expected to do something reasonable in the face of an
   error. The behaviour for each type of error is not specified here,
   but implementors should note that different types of errors should
   probably be treated differently. For example, a response code of
   "non-existent domain" should probably cause the message to be
   returned to the sender as invalid, while a response code of "server
   failure" should probably cause the message to be retried later.
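
A rough sketch of the distinction the RFC is getting at, in Python. The
names (Action, classify_lookup) are made up for illustration and are not
taken from any real mailer; only the RCODE numbers come from the DNS
specification (RFC 1035). It is simplified: it ignores the case of a
name that exists but has no usable records.

```python
from enum import Enum
from typing import Optional

# DNS response codes (RCODE values from RFC 1035).
NOERROR = 0
SERVFAIL = 2
NXDOMAIN = 3


class Action(Enum):
    """What a mailer does after the DNS lookup (hypothetical names)."""
    DELIVER = "attempt delivery"
    BOUNCE = "return to sender now"
    RETRY = "keep in queue and retry later"


def classify_lookup(rcode: Optional[int]) -> Action:
    """Map a DNS lookup outcome to a mailer action.

    rcode is None when no name server answered at all (time-out).
    """
    if rcode is None:
        # Nobody answered: a "soft" error, the domain may well exist.
        return Action.RETRY
    if rcode == NOERROR:
        return Action.DELIVER
    if rcode == NXDOMAIN:
        # Authoritative "no such domain": a "hard" error.
        return Action.BOUNCE
    # SERVFAIL and anything else is treated as temporary.
    return Action.RETRY
```

The point is only that a time-out and SERVFAIL land in the same bucket
as an unreachable mail server (queue and retry), and that NXDOMAIN alone
causes an immediate bounce.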

Regards,

- Håvard

[ On Saturday, January 27, 2001 at 23:53:54 (+0100), Havard Eidnes wrote: ]

Subject: Re: How common is lack of DNS server diversity?

> If "doesn't resolve" means "the target domain itself exists but none
> of the authoritative name servers for the domain respond", those
> mail systems doing as mentioned above should most probably be
> replaced, because they do not properly and reasonably distinguish
> between "hard" and "soft" errors when using the DNS.

Yup, but don't tell me! :-)

Unfortunately "it works most of the time", or "it always works for us"
keeps such broken stuff in production.

> Several folks have mentioned that they don't see a problem with dns
> failure caused by an inability to reach all of the nameservers for a
> domain - because presumably clients won't be able to reach any of the
> hosts in that domain.

lesson: if you are multi-homed, don't let a dns-risky provider hold your
data, don't get two dns-risky providers, ...

then again, if you have enough clue to multi-home, there is hope that you
have enough clue to provide your own dns service and diversely.

randy

> Then it probably doesn't matter if you resolve their DNS,
> because you won't be getting to any of their services anyway.

> Several folks have mentioned that they don't see a problem with
> dns failure caused by an inability to reach all of the
> nameservers for a domain - because presumably clients won't be
> able to reach any of the hosts in that domain.

That's a wrong justification, not only due to the reasons you go on
to cite, but because detecting a failure to look up a name takes a
rather long time (your name server or resolver will typically have
to rely on a time-out), while reacting to an ICMP Host Unreachable
as a response to a TCP connection attempt is pretty quick (if your
network is indeed off the net, but your DNS service isn't).

This probably makes for easier debugging / better user reports, less
of a "world wide wait", faster mailing list deliveries and probably
also has other beneficial effects.
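
To put a rough number on the time-out side, here is a back-of-envelope
model in Python. The function name and the simple model behind it (a
fixed per-query time-out, every server tried on every attempt, one query
at a time, no backoff) are assumptions for illustration; real resolvers
differ in the details.

```python
def worst_case_lookup_wait(servers: int, attempts: int, timeout_s: float) -> float:
    """Seconds a resolver waits before giving up when no server answers.

    Assumes a fixed per-query time-out, with each server queried once
    per attempt and queries sent one after another.
    """
    if servers < 0 or attempts < 0 or timeout_s < 0:
        raise ValueError("arguments must be non-negative")
    return servers * attempts * timeout_s


# Illustrative numbers only, not measurements of any particular resolver.
wait = worst_case_lookup_wait(servers=2, attempts=2, timeout_s=5.0)
print(f"{wait:.0f} s before the lookup is reported as failed")
```

With those made-up inputs the lookup takes 20 seconds to fail, which is
the kind of delay a prompt ICMP Host Unreachable avoids.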

Regards,

- Håvard