What DNS Is Not

Thought-provoking article by Paul Vixie:

http://queue.acm.org/detail.cfm?id=1647302

Alex Balashov wrote:

Thought-provoking article by Paul Vixie:

http://queue.acm.org/detail.cfm?id=1647302

I doubt Henry Ford would appreciate the Mustang.

-Dave

Dave Temkin wrote:

Alex Balashov wrote:

Thought-provoking article by Paul Vixie:

http://queue.acm.org/detail.cfm?id=1647302

I doubt Henry Ford would appreciate the Mustang.

I don't think that is a very accurate analogy, and in any case, the argument is not that we should immediately cease all the things we do with DNS today.

DNS is one of the more centralised systems of the Internet at large; it works because of its reliance on intermediate caching and end-to-end accuracy.

It seems to me the claim is more that DNS was not designed to handle these uses, and that if this is what we want to do, perhaps something should supplant DNS, or alternative methods should be used.

For example, perhaps in the case of CDNs geographic optimisation should be in the province of routing (e.g. anycast) and not DNS?

-- Alex

Alex Balashov wrote:

For example, perhaps in the case of CDNs geographic optimisation should be in the province of routing (e.g. anycast) and not DNS?

-- Alex

In most cases it already is. He completely fails to address the concept of Anycast DNS and assumes people are using statically mapped resolvers.

He also assumes that DNS is some great expense and that by not allowing tons of caching we're taking money out of people's wallets. This is just not true, except for the very few companies whose job it is to answer DNS requests.

-Dave

This myth (that Paul mentions, not to suggest Dave T's comment is a myth) was debunked years ago:

"DNS Performance and the Effectiveness of Caching"
Jaeyeon Jung, Emil Sit, Hari Balakrishnan, and Robert Morris
http://pdos.csail.mit.edu/papers/dns:ton.pdf

Basically: Caching of NS records is important, particularly higher up in the hierarchy. Caching of A records is drastically less important - and, though not mentioned in the article, the cost imposed by low-TTL A records is borne mostly by the client and the DNS provider, not by some third-party infrastructure.

From the paper:

"Our trace-driven simulations yield two findings. First, reducing the TTLs of A records to as low as a few hundred seconds has little adverse effect on hit rates. Second, little benefit is obtained from sharing a forwarding DNS cache among more than 10 or 20 clients. This is consistent with the heavy-tailed nature of access to names. This suggests that the performance of DNS is not as dependent on aggressive caching as is commonly believed, and that the widespread use of dynamic low-TTL A-record bindings should not degrade DNS performance. The reasons for the scalability of DNS are due less to the hierarchical design of its name space or good A-record caching than seems to be widely believed; rather, the cacheability of NS records efficiently partition the name space and avoid overloading any single name server in the Internet."

   -Dave

DNS is NOT always defined by Paul... :slight_smile:

--bill

And of course in many cases these are the same people who are benefiting significantly from the geo-aware (and sometimes network-aware) CDNs that this type of DNS service provides.

  Scott.

Hi, Bill -

The paper is worth reading.

"The paper also presents the results of trace-driven simulations that explore the effect of varying TTLs and varying degrees of cache sharing on DNS cache hit rates. "

emphasis on *trace-driven*. Now, you can argue whether or not their traces are representative (whatever that means) -- they used client DNS and TCP connection traces from MIT and KAIST, so the data definitely has a .edu bias, iff there is a bias in DNS traffic for universities vs. "the real world". But to the extent that their traces represent what other groups of users might see, their evaluation seems accurate.

   -Dave

I don't know why Paul is so concerned; just think how many F root mirrors it helps him sell to unsuspecting saps. The Henry Ford analogy was amazingly apt; imagine ol' Henry coming back and claiming that automatic transmissions were a misuse of the automobile.

Drive Slow ('cause someone left the door open at the old folks home)

I'm not debating the traces - I wonder about the simulation model. (And yes, I've read the paper.)

--bill

I'm happy to chat about this offline if it bores people, but I'm curious what you're wondering about.

The method was pretty simple:

  - Record the TCP SYN/FIN packets and the DNS packets
  - For every SYN, figure out what name the computer had resolved to open a connection to this IP address
  - From the TTL of the DNS answer, figure out whether finding that binding would have required a new DNS lookup

There are some obvious potential sources of error - most particularly, name-based HTTP virtual hosting may break some of the assumptions in this - but I'd guess that, with a somewhat smaller trace, not too much error is introduced by clients going to different name-based vhosts on the same IP address within a small amount of time. There are certainly some, but I'd be surprised if it was more than a small percentage of the accesses. Are there other methodological concerns?
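If it helps, here is my own simplified sketch of that method in Python -- not the paper's actual code, just the logic described above:

    # Simplified sketch of the trace-driven method (not the paper's code).
    # dns_answers: iterable of (ts, client, dest_ip, ttl) taken from the DNS trace.
    # tcp_syns:    iterable of (ts, client, dest_ip) taken from the TCP trace.
    def simulate_cache_hits(dns_answers, tcp_syns):
        events = [(ts, 0, client, ip, ttl) for ts, client, ip, ttl in dns_answers]
        events += [(ts, 1, client, ip, None) for ts, client, ip in tcp_syns]
        events.sort(key=lambda e: (e[0], e[1]))  # replay answers and SYNs in time order

        bindings = {}            # (client, dest_ip) -> (answer_time, ttl)
        hits = misses = 0
        for ts, kind, client, ip, ttl in events:
            if kind == 0:                        # a DNS answer was observed
                bindings[(client, ip)] = (ts, ttl)
            else:                                # a TCP SYN was observed
                entry = bindings.get((client, ip))
                if entry is None:
                    continue                     # no earlier lookup seen; not counted
                answer_time, cached_ttl = entry
                if ts - answer_time <= cached_ttl:
                    hits += 1                    # binding still valid: no new lookup needed
                else:
                    misses += 1                  # TTL expired: a fresh lookup required
        return hits, misses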

I'd also point out for this discussion two studies that looked at how accurately one can geo-map clients based on the IP address of their chosen DNS resolver. There are obviously potential pitfalls here (e.g., someone who travels and still uses their "home" resolver). In 2002:

Z. M. Mao, C. D. Cranor, F. Douglis, and M. Rabinovich. A Precise and Efficient Evaluation of the Proximity between Web Clients and their Local DNS Servers. In Proc. USENIX Annual Technical Conference, Berkeley, CA, June 2002.

Bottom line: It's ok but not great.

"We con- clude that DNS is good for very coarse-grained server selection, since 64% of the associations belong to the same Autonomous System. DNS is less useful for finer- grained server selection, since only 16% of the client and local DNS associations are in the same network-aware cluster [13] (based on BGP routing information from a wide set of routers)"

We did a wardriving study in Pittsburgh recently where we found that, of the access points we could connect to, 99% used the DNS server provided by their ISP. Pretty good if your target is residential users:

http://www.cs.cmu.edu/~dga/papers/han-imc2008-abstract.html

(it's a small part of the paper in section 4.3).

   -Dave

Well, my first answer to that would be that GSLB scales down a lot further than anycast.

And my first question would be: what would the load on the global routing system be if a couple of thousand (say) extra sites started using anycast for their content?

Each would have its own AS (perhaps reused from elsewhere in the company) and a small network or two. Routes would be added and withdrawn regularly, and various "stupid BGP tricks" attempted with communities and prefixes.

I heard some anti-spam people use DNS to distribute big databases of information. I bet Vixie would have nasty things to say to the guy who first thought that up.

Are you asking what the impact would be of a couple of thousand extra routes in the current full table of ~250,000? That sounds like noise to me (the number, not your question :slight_smile:)

Joe

DNS is NOT always defined by Paul... :slight_smile:

I agree, Bill, but Paul is right on the money about how the DNS is being misused and abused to create more smoke and mirrors in the domain name biz.

I really find it annoying that some ISPs (several large ones among them) are still tampering with DNS responses just to put a few more coins in their coffers from click-through advertising.

What I'm really afraid of is that all the buzz and $$ from the domain biz will create strong resistance to any efforts to develop a real directory service or a better scheme for resource naming and location.

BTW simulations != real world.

Cheers
Jorge

Given that the paper is 7 years old and the Internet has changed a bit since 2002 (and the DNS looks set to change somewhat drastically in the relatively near future), it might be dangerous to rely too much on their results. This might be an interesting area for additional research...

Regards,
-drc

Well, the marketing folks have sure taken advantage of it. It would be nice
to see the technology folks... not just lie there and take it.

- - ferg

Alex Balashov wrote:

Thought-provoking article by Paul Vixie:

http://queue.acm.org/detail.cfm?id=1647302

Bah, many of the CDNs I've dealt with don't seed geographical responses based on DNS, but rather use many out-of-band methods for determining what response they will hand out. The primary reason for short-cutting the cache is to limit failures in case the system a requestor is directed to goes down.

And different CDNs behave differently, depending on how they deliver content, support provider interconnects, etc. I'd hardly call many of them DNS lies, as they do resolve you to the appropriate IP, and if that IP disappears, try to quickly get you to another appropriate IP.

The rest of the article was informative, though.

Jack

i loved the henry ford analogy -- but i think henry ford would have said that
the automatic transmission was a huge step forward since he wanted everybody
to have a car. i can't think of anything that's happened in the automobile
market that henry ford wouldn't've wished he'd thought of.

i knew that the "incoherent DNS" market would rise up on its hind legs and
say all kinds of things in its defense against the ACM Queue article, and i'm
not going to engage with every such speaker.

there are three more-specific replies below.

Dave Temkin <davet1@gmail.com> writes:

Alex Balashov wrote:

For example, perhaps in the case of CDNs geographic optimisation should
be in the province of routing (e.g. anycast) and not DNS?

In most cases it already is. He completely fails to address the concept
of Anycast DNS and assumes people are using statically mapped resolvers.

"anycast DNS" appears to mean different things to different people. i didn't
mention it because to me anycast dns is a bgp level construct whereby the
same (coherent) answer is available from many servers having the same IP
address but not actually being the same server. see for example how several
root name servers are distributed. <http://www.root-servers.org/>. if you
are using "anycast DNS" to mean carefully crafted (noncoherent) responses
from a similarly distributed/advertised set of servers, then i did address
your topic in the ACM Queue article.
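as an illustration of the bgp-level construct i mean: the same service address answers everywhere, and you can ask whichever instance you happened to reach to identify itself. a rough dnspython sketch (192.5.5.241 is f.root-servers.net; the CHAOS-class hostname.bind query is a diagnostic, not the dns data itself):

    # rough sketch: ask the F-root anycast address which instance answered us.
    import dns.message
    import dns.query
    import dns.rdataclass
    import dns.rdatatype

    query = dns.message.make_query("hostname.bind", dns.rdatatype.TXT,
                                   rdclass=dns.rdataclass.CH)
    response = dns.query.udp(query, "192.5.5.241", timeout=3)
    for rrset in response.answer:
        print(rrset.to_text())   # names the anycast node that happened to answer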

David Andersen <dga@cs.cmu.edu> writes:

This myth ... was debunked years ago:

"DNS Performance and the Effectiveness of Caching"
Jaeyeon Jung, Emil Sit, Hari Balakrishnan, and Robert Morris
http://pdos.csail.mit.edu/papers/dns:ton.pdf

my reason for completely dismissing that paper at the time it came out was
that it tried to predict the system level impact of DNS caching while only
looking at the resolver side and only from one client population having a
small and uniform user base. show me a "trace driven simulation" of the
whole system, that takes into account significant authority servers (which
would include root, tld, and amazon and google) as well as significant
caching servers (which would not include MIT's or any university's but
which would definitely include comcast's and cox's and att's), and i'll
read it with high hopes. note that ISC SIE (see http://sie.isc.org/) may
yet grow into a possible data source for this kind of study, which is one
of the reasons we created it.

Simon Lyall <simon@darkmere.gen.nz> writes:

I heard some anti-spam people use DNS to distribute big databases of
information. I bet Vixie would have nasty things to say to the guy who
first thought that up.

someone made this same comment in the slashdot thread. my response there
and here is: the MAPS RBL has always delivered coherent responses where the
answer is an expressed fact, not kerned in any way based on the identity of
the querier. perhaps my language in the ACM Queue article was imprecise
("delivering facts rather than policy") and i should have stuck with the
longer formulation ("incoherent responses crafted based on the identity of
the querier rather than on the authoritative data").
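for anyone who hasn't looked at the mechanism: an rbl lookup just reverses the octets of the address and queries it under the list's zone, and every querier asking about the same address gets the same answer. a rough sketch (the zone name below is a placeholder, not the actual MAPS zone):

    # rough sketch of a DNSBL lookup; "dnsbl.example.org" is a placeholder zone.
    import dns.resolver

    def is_listed(ip, zone="dnsbl.example.org"):
        # 192.0.2.1 -> 1.2.0.192.dnsbl.example.org
        qname = ".".join(reversed(ip.split("."))) + "." + zone
        try:
            answers = dns.resolver.resolve(qname, "A")
            return [rdata.to_text() for rdata in answers]  # listed: 127.0.0.x return codes
        except dns.resolver.NXDOMAIN:
            return []                                      # not listed

    # 127.0.0.2 is the conventional "always listed" test entry for most lists
    print(is_listed("127.0.0.2"))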

Hi, Paul - I share your dislike of DNS services that break the DNS model for profit in ways that break applications. For instance, returning the IP address of your company's port-80 web server instead of NXDOMAIN not only breaks applications other than port-80 HTTP, it also breaks the behaviour that browsers such as IE and Firefox expect, which is that if a domain isn't found, they'll do something the user chooses, such as sending another query to the user's favorite search engine.
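A trivial way to see the breakage: any code that relies on a lookup failure stops failing when the resolver substitutes an ad server's address. A rough sketch (the hostname is just a placeholder that shouldn't resolve):

    # Rough sketch: code that depends on NXDOMAIN surfacing as a resolution error.
    import socket

    name = "no-such-host.example.invalid"     # placeholder; should not exist
    try:
        info = socket.getaddrinfo(name, 25)   # e.g. an MTA looking up a mail host
        # With an honest resolver we never get here; with a "helpful" resolver
        # we now try to deliver mail to someone's port-80 ad server.
        print("resolved to", info[0][4][0])
    except socket.gaierror:
        print("not found - the behaviour applications expect")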

There is one special case for which I don't mind having DNS servers lie about query results, which is the phishing/malware protection service. In that case, the DNS response redirects you to the IP address of a server that'll tell you
       "You really didn't want to visit PayPa11.com - it's a fake" or
       "You really didn't want to visit dgfdsgsdfgdfgsdfgsfd.example.ru - it's malware".
It's technically broken, but you really _didn't_ want to go there anyway. It's a bit friendlier to administrators and security people if the response page gives you the IP address that the query would have otherwise returned, though obviously you don't want it to be a clickable hyperlink.

However, I disagree with your objections to CDNs and load balancers in general - returning the address of the server that example.com thinks will give you the best performance is reasonable. (I'll leave the question of whether DNS queries are any good at determining that to the vendors.) Maintaining a cacheable ns.example.com record in the process is friendly to everybody; maintaining cacheable A records is less important. If reality is changing rapidly, then the directory that points to the reality can reasonably change also.
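Concretely, the friendly pattern looks something like this in a zone (the names, addresses, and TTLs are only illustrative):

    ; illustrative zone fragment: long-lived delegation, short-lived answers
    example.com.        86400   IN  NS  ns1.example.com.
    example.com.        86400   IN  NS  ns2.example.com.
    www.example.com.       60   IN  A   192.0.2.10
    www.example.com.       60   IN  A   192.0.2.20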