Why *can* cached DNS replies be overwritten?

Correct me if I'm wrong, Leo, but your assertion turns on the fact that
the server will accept an overwriting cache entry for something it has
already cached, does it not?

Do djb and Power in fact do that?

If they don't, the window of opportunity to poison something like .com
is limited to the period between when that entry expires from the local
server's cache and the next time it hears a reply -- which will be the
time after that expiry when someone next requests a .com name; i.e.,
almost immediately, no?

Everyone seems to continue asking "why can poisoning overwrite an
already-cached answer?" and no one seems to be answering, and, unless
I'm a moron (which is not impossible), that's the crux of this issue.

Cheers,
-- jra

Add me to the list of baffled observers. As far as I can tell, this
kind of poisoning overwrite is mostly forbidden by the trustworthiness
ranking in RFC 2181.
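
For reference, the rule boils down to something like the sketch below
(Python purely for illustration; the rank names, the numeric levels, and
the "at least as credible" comparison are my paraphrase of section 5.4.1,
not code from any actual resolver):

from enum import IntEnum

# Rough paraphrase of the RFC 2181 section 5.4.1 trust levels, lowest first.
class Rank(IntEnum):
    ADDITIONAL_OR_NONAUTH_AUTHORITY = 1  # additional data; authority section of non-auth answers
    NONAUTH_ANSWER                  = 2  # answer section of a non-authoritative answer
    GLUE                            = 3  # glue from a zone file or zone transfer
    AUTH_AUTHORITY                  = 4  # authority section of an authoritative answer
    AUTH_ANSWER                     = 5  # answer section of an authoritative answer
    ZONE_DATA                       = 6  # data from the zone file or a zone transfer

def may_replace(cached: Rank, offered: Rank) -> bool:
    # Roughly: lower-ranked data should never displace higher-ranked cached
    # data. (How equal ranks are handled is an implementation choice.)
    return offered >= cached

# A record arriving in the authority section of a reply should not displace
# an answer-section record the cache already holds for the same name:
assert not may_replace(Rank.AUTH_ANSWER, Rank.AUTH_AUTHORITY)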

Tony.

In a message written on Mon, Aug 11, 2008 at 11:39:25AM -0400, Jay R. Ashworth wrote:

Everyone seems to continue asking "why can poisoning overwrite an
already-cached answer?" and no one seems to be answering, and, unless
I'm a moron (which is not impossible), that's the crux of this issue.

Let's say you query FOO.COM, and it says "My servers are A, B, and
C." So you cache A, B, and C and go on your merry way.

Now, before the TTL expires the data center with B and C in it gets
hit by a tornado. The FOO.COM admin quickly stands up two new
servers in a new data center, and updates FOO.COM to be servers A,
D, and E. So you go back and ask for "newname.foo.com" from A, by
random chance. A sends you back "it's 1.2.3.4, and A, D, and E
know all about it."

What you're advocating is that the server go, "hmm, that's not what
I got the first time," and keep using A, B, and C, for which B and C
may no longer be authoritative or, worse in this example, are
completely offline. It would then wait until the TTL expires to get
the same data.

That's not to say there aren't other possible checks or rechecks
that could be done, but in the vast majority of day-to-day cases,
when someone properly gives you additional information, it is useful.

Authorities are updated all the time. There are thousands of these
cache overwrites with new, more up-to-date info every day.
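
In toy Python, the permissive behaviour I'm describing looks roughly
like this (everything here -- the structures, the rank numbers, the
names -- is invented for illustration; it is not code from BIND or any
other resolver):

import time

cache = {}  # (owner_name, rrtype) -> (rrset, expiry_time, rank)

def store(name, rrtype, rrset, ttl, rank):
    cache[(name, rrtype)] = (set(rrset), time.time() + ttl, rank)

def maybe_overwrite(name, rrtype, rrset, ttl, rank):
    # Accept new data for an unexpired entry as long as it is at least
    # as credible as what we already hold.
    key = (name, rrtype)
    if key not in cache or cache[key][1] <= time.time():
        store(name, rrtype, rrset, ttl, rank)   # nothing cached, or expired
        return True
    if rank >= cache[key][2]:
        store(name, rrtype, rrset, ttl, rank)   # overwrite before the TTL runs out
        return True
    return False                                # less credible: keep what we have

# The tornado example: the cached delegation says A, B and C ...
store("foo.com", "NS", {"ns-a", "ns-b", "ns-c"}, ttl=86400, rank=4)
# ... then an authoritative reply from A says the servers are now A, D and E.
maybe_overwrite("foo.com", "NS", {"ns-a", "ns-d", "ns-e"}, ttl=86400, rank=4)
print(cache[("foo.com", "NS")][0])              # {'ns-a', 'ns-d', 'ns-e'}, in some order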

Let's say you query FOO.COM, and it says "My servers are A, B, and
C." So you cache A, B, and C and go on your merry way.

Now, before the TTL expires the data center with B and C in it gets
hit by a tornado. The FOO.COM admin quickly stands up two new
servers in a new data center, and updates FOO.COM to be servers A,
D, and E. So you go back and ask for "newname.foo.com" from A, by
random chance. A sends you back "it's 1.2.3.4, and A, D, and E
know all about it."

What you're advocating is that the server go, "hmm, that's not what
I got the first time," and keep using A, B, and C, for which B and C
may no longer be authoritative or, worse in this example, are
completely offline. It would then wait until the TTL expires to get
the same data.

As long as one of the cached zone servers is still working, no, it
would only time out the individual queries.
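
(Roughly: the resolver just walks the cached NS list, so only the
queries aimed at the dead servers cost anything. A toy sketch, with
everything about it made up:)

import socket

def query_any(nameservers, packet, timeout=2.0):
    # Try each cached server in turn; a dead one costs us only its timeout.
    for addr in nameservers:
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.settimeout(timeout)
        try:
            s.sendto(packet, (addr, 53))
            reply, _ = s.recvfrom(4096)
            return reply            # first server that answers wins
        except socket.timeout:
            continue                # B or C is offline: move on to the next one
        finally:
            s.close()
    return None                     # every cached server failed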

That's not to say there aren't other possible checks or rechecks
that could be done, but in the vast majority of day-to-day cases,
when someone properly gives you additional information, it is useful.

Authorities are updated all the time. There are thousands of these
cache overwrites with new, more up-to-date info every day.

It would seem to me that the aggregate amount of trouble that would be
caused by handling the situation you posit above differently is several
orders of magnitude smaller in importance than the trouble which would
stem from someone cache-poisoning .com on the resolvers of the top 10
consumer ISPs.

I.e., that's a really lame reason to leave it as it is. :-)

Cheers,
-- jra

Leo Bicknell wrote:

That's not to say there aren't other possible checks or rechecks
that could be done, but in the vast majority of day-to-day cases,
when someone properly gives you additional information, it is useful.

Authorities are updated all the time. There are thousands of these
cache overwrites with new, more up-to-date info every day.

The problem is, it's not trustworthy. There are lots of heuristics that could be used to decide to pre-expire cached data, but it should be just that: an expiring of the cached data that allows a new request for it.

Possible scenarios:

1) Mismatched auth records cause the cached records to expire. The attacker has successfully spoofed a packet and caused a cache dump for the record in question; he must now spoof another packet before the cache is rebuilt.

2) Mismatched auth records are ignored, delaying the remote side's move to new records. This is a better distrust model, though it may prolong the impact of outage situations.

3) Recognize successive timed-out queries to an auth server, mark it as possibly gone, stale out the cache and ask for new information, or allow cache overwrites only for records which appear not to be working.

4) Recognize entries which are receiving forged packet attempts (easiest to do) and do not allow cache overwrites for those domains (a rough sketch of this follows the list). This supports normal operation unless someone is specifically attacking a domain; once that happens, the domain moves from a trusted model to an untrusted one, which may have a performance cost in that we will ignore valid updates too, but since the server cannot tell a forgery from a real update, both should be ignored. This method is still vulnerable to the one-in-X chance that the very first forged packet arrives with the right port/ID combo.
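
A rough sketch of what I have in mind for #4 (Python only to show the shape of the check; the names and the threshold are made up):

from collections import defaultdict

SUSPICION_THRESHOLD = 3              # arbitrary: forged-looking replies before we lock down
forged_replies = defaultdict(int)    # zone -> count of replies with wrong query ID/port

def note_bad_reply(zone):
    # Called whenever a reply arrives with the wrong query ID or source port.
    forged_replies[zone] += 1

def overwrite_allowed(zone):
    # Once a zone is visibly under attack, stop honouring cache overwrites
    # for it and let the existing entries ride out their TTL.
    return forged_replies[zone] < SUSPICION_THRESHOLD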

Cache overwrites are dangerous. IMHO, they need to go away or be protected by extensive checks to ensure the integrity of the cache.

Jack

In the original definition of DNS, there were no or almost no dynamic changes. The protocol wasn't built for that. The result is that all of the old sacred texts are written in a context in which everything is static (for at least as long as the TTL).

The modern operation of the DNS is more dynamic. It isn't that the protocol today cannot be (more) dynamic (than the founding engineers thought), but that all of the documented texts upon which we base today's arguments are written along the "old think" lines. So when we get into a battle of RFCs vs. best current practices, the two sides are not speaking the same language.

The DNS can be more dynamic by liberalizing its ability to learn new data. It's a sliding curve -- more liberal means accepting more stuff, some of which might be the garbage we don't want. The choice is between tight and unbending versus dynamic and less trustworthy. The goal is to strike the right balance.

It is possible for a protocol to do what DNS does and also have secure updates. But the DNS, as it is in the RFCs, lacks a really good foundation for extension. We can do something, but we will probably never get to the final goal.