that MIT paper again (Re: VeriSign's rapid DNS updates in .com/.net ) (longish)

Simon_Waters · July 23, 2004, 9:30pm

Date: Fri, 23 Jul 2004 17:01:54 +0000
From: Paul Vixie <paul@vix.com>
Subject: that MIT paper again (Re: VeriSign's rapid DNS updates in .com/.net )

wrt the mit paper on why small ttl's are harmless, i recommend that
y'all actually read it, the whole thing, plus some of the references,
rather than assuming that the abstract is well supported by the body.

http://nms.lcs.mit.edu/papers/dns-imw2001.html

I think most people are probably way too busy. I'll comment, and Paul
can tell me where I am wrong or incomplete.

I'm slightly concerned that the authors think web traffic is the big
source of DNS traffic. They may well be right (especially given one of
the authors is talking about his own network), but my quick glance at
the types of queries shouts to me that SMTP (and email-related traffic,
RBLs, etc.) generates a disproportionate amount of wide-area DNS traffic
byte for byte of data. I would think this is one that is pretty easy to
settle for specific networks. In particular I see a lot of retries
generated by email servers for UBE and virus dross (in our case for up
to 5 days), when human surfers have famously given up the domain as dead
after the first 8 seconds. Perhaps if most people preview HTML in
emails, surfing and email access to novel URIs are one and the same.

They conclude that the great bulk of benefit from sharing a DNS cache is
obtained in the first 10 to 20 clients. They only scale this to 1000+
clients; maybe some NANOG members can comment if they have scaled DNS
caches much bigger than this. I suspect a lot of the scaling issues are
driven by maintenance costs and reliability, since DNS doesn't generate
much WAN traffic in comparison to HTTP for most people here (let's face
it, the root/TLD owners are probably the only people who even think
about bandwidth of DNS traffic).

They conclude the TTL on A records isn't so crucial.

The abstract doesn't mention that the TTL on NS records is found to be
important for scalability of the DNS. That is probably the main point
Paul wants us to note. Just because the DNS is insensitive to slight
changes in A record TTL doesn't mean TTL doesn't matter on other records.
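
To make the A versus NS distinction concrete, here is a minimal sketch
of a toy caching resolver for a single name. It is not the paper's
methodology; the function name, the one-lookup-a-minute workload and the
TTL values are all assumptions chosen for illustration. It also ignores
the fact that real resolvers can refresh NS data from the authority
section of answers. The only point is that, in this model, the NS TTL
decides how often the parent (root/TLD) servers get asked, while the A
TTL only affects the zone's own servers.

```python
def count_upstream_queries(lookup_times, a_ttl, ns_ttl):
    """Toy model: return (parent_queries, zone_queries) for one name.

    lookup_times: increasing client lookup times, in seconds.
    a_ttl / ns_ttl: cache lifetimes for the A and NS records, in seconds.
    """
    a_expires = ns_expires = float("-inf")
    parent_queries = zone_queries = 0
    for t in lookup_times:
        if t < a_expires:
            continue  # answered from the cached A record, no upstream traffic
        if t >= ns_expires:
            # No cached delegation: ask the parent for a referral first.
            parent_queries += 1
            ns_expires = t + ns_ttl
        # Ask the zone's own server for the A record.
        zone_queries += 1
        a_expires = t + a_ttl
    return parent_queries, zone_queries


# Assumed workload: one lookup a minute for a day.
lookups = range(0, 86400, 60)

print(count_upstream_queries(lookups, a_ttl=3600, ns_ttl=86400))  # (1, 24)
print(count_upstream_queries(lookups, a_ttl=300, ns_ttl=86400))   # (1, 288)
print(count_upstream_queries(lookups, a_ttl=300, ns_ttl=300))     # (288, 288)
```

Shortening only the A TTL leaves the parent with one query a day here;
shortening the NS TTL as well pushes every cache miss up to the parent.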

The paper leaves a lot of hanging questions about "poor performance",
the number of unanswered queries, and poor latency, which I'm sure can
be pinned down to the generally poor state of the DNS (both forward and
especially reverse), and a few bad applications.

The big difference between the places/times studied suggests to me that
how the DNS performs depends a lot on what mix of questions you ask it.

They suggest not passing on unqualified names would lose a lot of fluff
(me, I still think big caches could zone transfer "." and save both
traffic and, more importantly for the end users, latency, but that goes
further than their proposal). Remember resolvers do various interesting
things with unqualified names depending on who coded them and when.

The paper doesn't pass any judgement on types of lookups, but obviously
not all DNS lookups are equal from the end user perspective. For example
reverse DNS from an HTTP server is typically done off the critical path
(asynchronously), whereas the same reverse lookup may be in the
critical path for deciding whether to accept an email message (not that
most people regard email as that time critical). Be nice to do a study
classifying them along the lines of "DNS lookups you wait for", "DNS
lookups that slow things down", "DNS lookups that have to be done by
Friday for the weekly statistics".

Some *nix vendor(s?) should make sure loghost is in /etc/hosts or not in
/etc/syslog.conf by default, by the sound of it.
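
A quick way to check a given box is to look for the name in the hosts
file before it ever reaches the resolver. This is a minimal sketch: the
helper name hosts_defines is made up for illustration, and the sample
entries are invented. It only parses hosts-file syntax and says nothing
about what any particular vendor's syslog.conf actually contains.

```python
def hosts_defines(hosts_text, name):
    """Return True if `name` appears as a hostname or alias in hosts-file text."""
    wanted = name.lower()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]      # drop comments
        fields = line.split()
        # fields[0] is the address; everything after it is a name or alias.
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            return True
    return False


# Invented sample entries, for illustration only.
sample = """
127.0.0.1   localhost
192.0.2.10  myhost.example.com myhost loghost   # syslog target
"""

print(hosts_defines(sample, "loghost"))                      # True
print(hosts_defines("127.0.0.1 localhost\n", "loghost"))     # False
```

On a real machine the same check would be
hosts_defines(open("/etc/hosts").read(), "loghost").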

As regards rapid update by VeriSign - bring it on - I'm always
embarrassed to tell clients they may have to wait up to 12 hours for a
new website in this day and age. And any errors made in the initial
setup take too long to fix. I don't want to be setting up a site at 3PM
Friday, and having to check it Monday morning to discover some typo
means it is Tuesday before it works, when in a sane world one TTL +
5 minutes is long enough.
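
The arithmetic behind "one TTL + 5 minutes" can be sketched as below.
The 12 hour and 5 minute publication intervals come from the figures
above; the one hour TTL is purely an assumed value for illustration, and
the function name is made up. The model is simply that a fix has to wait
for the next zone publication and then for the stale record to expire
from caches.

```python
HOUR = 3600  # seconds


def worst_case_wait(publish_interval_s, ttl_s):
    """Worst case, in seconds, before a corrected record is seen everywhere:
    wait for the next publication, then for cached copies to expire."""
    return publish_interval_s + ttl_s


ASSUMED_TTL = 1 * HOUR  # assumption, not a figure from the thread

twice_daily = worst_case_wait(12 * HOUR, ASSUMED_TTL)
rapid = worst_case_wait(5 * 60, ASSUMED_TTL)

print(twice_daily / HOUR)  # 13.0 (hours)
print(rapid / 60)          # 65.0 (minutes)
```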

I think relying on accurate DNS information to distinguish spammers from
genuine senders is at best shaky currently. The only people I can think
of who would suffer from making it easier and quicker to create new
domains would be people relying on something like SPF, but I think that
just reveals issues with SPF, and the design flaws of SPF shouldn't
influence how we manage the DNS.

Valdis_Kletnieks · July 23, 2004, 11:09pm

Ahh.. but if SPF (complete with issues and design flaws) is widely deployed, we
may not have any choice regarding whether its issues and flaws dictate the DNS
management.

Remember that we've seen this before - RFC2052 didn't specify a '_', RFC2782
does. And we all know where BIND's "delegation-only" came from....
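
For anyone who hasn't compared the two RFCs, the difference in SRV owner
names can be sketched as follows. The function name and the example
domain are made up for illustration; the naming formats themselves are
as I read them in the RFCs (Service.Proto.Name in RFC2052,
_Service._Proto.Name in RFC2782, where the underscore was added to avoid
collisions with ordinary labels).

```python
def srv_owner_name(service, proto, domain, rfc2782=True):
    """Build the owner name under which an SRV record is looked up.

    rfc2782=True  -> _service._proto.domain  (RFC2782 style)
    rfc2782=False -> service.proto.domain    (older RFC2052 style)
    """
    prefix = "_" if rfc2782 else ""
    return f"{prefix}{service}.{prefix}{proto}.{domain}"


print(srv_owner_name("ldap", "tcp", "example.com"))                 # _ldap._tcp.example.com
print(srv_owner_name("ldap", "tcp", "example.com", rfc2782=False))  # ldap.tcp.example.com
```

The second form is indistinguishable from an ordinary host name, which
is the kind of deployed-reality problem that ended up changing the spec.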

Daniel_Karrenberg · July 24, 2004, 8:19am

Sic!

And it is the *child* TTL that counts for most implementations.