Large organization nameservers sending ICMP packets to DNS servers

Date: Tue, 7 Aug 2007 16:33:22 -0400 (EDT)
From: Donald Stahl <don@calis.blacksun.org>

> This has been a pain for me for years. I have tried to reason with
> security people about this and, while they don't dispute my reasoning,
> they always end up saying that it is the "standard" practice and that,
> lacking any evidence of what it might be breaking, it will continue to
> be blocked. And I don't mean small companies, either. One of the biggest
> issues I have is with one of the country's largest government-funded
> research labs.
Can someone, anyone, please explain to me why blocking TCP 53 is
considered such a security enhancement? It's a token gesture and does
nothing to really help improve security. It does, however, cause problems.

You have no way of knowing why a client might want or need to contact you
via TCP 53 for DNS- so why would you block them?

The fact is most people, to this day, still believe that TCP 53 is only
used for AXFRs.

Someone was only too happy to point out to me that he would never create
a record larger than 512 bytes, so why should he allow TCP queries? The
answer is simple- because they are supposed to be allowed. By disallowing
them you are breaking the agreed-upon rules for the protocol. Before
long it becomes impossible to implement new features because you can't be
sure whether someone else has broken something intentionally.
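
For anyone who hasn't watched it happen: the fallback the protocol
requires looks something like this rough sketch, written here with
Python's dnspython library (192.0.2.1 is a documentation placeholder,
not a real server):

    import dns.message
    import dns.query
    import dns.flags

    query = dns.message.make_query("example.com", "TXT")

    # Every resolver asks over UDP first.
    response = dns.query.udp(query, "192.0.2.1", timeout=3)

    # If the answer didn't fit in 512 bytes, the server sets the TC
    # (truncated) bit, and the client is expected to retry over TCP 53.
    # Block TCP 53 and this retry simply times out.
    if response.flags & dns.flags.TC:
        response = dns.query.tcp(query, "192.0.2.1", timeout=5)

    print(response.answer)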

If you don't like the rules- then change the damned protocol. Stop just
doing whatever you want and then complaining when other people disagree
with you.

Don,

You are preaching to the choir...at least in this group. But I have found
that security types (I mean those with a police/physical security
background) don't much care for these arguments. It usually comes down
to "lock and bar every door unless you can prove there is a need to have
the door unlocked".

Standards and such mean nothing to them. Only evidence that something
that has to work is broken will convince them to change anything. It's
the "TCP/53 is evil" meme.

So these people are also the ones responsible for chaining shut fire
doors because "fires never happen in this building, but theft does"?
I sure feel safer now!

The "need to have the door unlocked" is because that's the way the
building is designed to fail its fireproofing. And the need to have
the TCP port open is because that's the way the network protocol is
designed to fail from UDP.

If "this is the way the protocol works" is not enough of an argument,
then I'm afraid we're past the point of engineering and into the
realm of tea-leaf readers and chicken-entrail-based prognosticators.
I'm aware there are such people promoting themselves as security
experts. It's rather depressing that those people can still find
gainful employment; but in this post-literate age where people prefer
to repeat (or listen to) foolish bromides rather than Read the Fine
Commentaries that define the protocol, I suppose I ought not to be
surprised.

Shocked but not surprised,
A

Ensuring an authoritative domain name server responds via UDP is a critical security requirement. TCP will not create the same risk of a resolver being poisoned, but a TCP connection will consume a significant amount of a name server's resources.

ACLs restricting TCP fallback are fairly common. For example, too many bytes might be placed into a domain's SPF records. While TCP offers a fallback mode of operation for this fairly common error, this fallback does not ensure oversize records are fixed promptly. TCP fallback on such records leaves open an opportunity to stage DDoS attacks when bad actors wish to take down authoritative name servers while also attempting to poison resolvers. Here again, SPF might offer attackers a way to make remote resolvers query for the records to be poisoned, isolate query ports, and time the poisoned records. :(
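
As a concrete check of the failure mode being described, one can ask for
a domain's TXT records the way a classic 512-byte client would and see
whether the answer already depends on TCP fallback. A sketch using
Python's dnspython library (the name and address are placeholders):

    import dns.message
    import dns.query
    import dns.flags

    # No EDNS advertised, so the server must fit the answer in 512 bytes.
    q = dns.message.make_query("example.com", "TXT")
    r = dns.query.udp(q, "192.0.2.53", timeout=3)

    if r.flags & dns.flags.TC:
        print("TXT/SPF answer exceeds 512 bytes; plain-UDP clients "
              "will fall back to TCP for every lookup")
    else:
        print("fits in UDP:", len(r.to_wire()), "bytes")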

http://www.ietf.org/internet-drafts/draft-ietf-dnsext-forgery-resilience-01.txt

-Doug

i normally agree with doug....

dotis@mail-abuse.org (Douglas Otis) writes:

> Ensuring an authoritative domain name server responds via UDP is a
> critical security requirement. TCP will not create the same risk of a
> resolver being poisoned, but a TCP connection will consume a significant
> amount of a name server's resources.

...but this is flat out wrong, dead wrong, no way to candy coat it, wrong.

Wanting to understand this comment, I'll expand upon the quoted statement.

Resolver factors affecting DNS security are:
  - selection of port and transaction IDs
  - restrictions on outstanding queries for same resource
  - limits on inbound bandwidth

Ignoring uncontrollable factors...

Authoritative server factors affecting security are:
  - time frame for an answer
  - duration of RR TTLs
  - number of servers

A short time frame for an answer and longer RR TTLs are both influenced by authoritative servers, and both affect spoofing rates.

When DNS TCP is used, the transport sequence number further precludes a spoofed TCP answer from being accepted. When a truncated response is returned, TCP fallback may be attempted. When TCP has been blocked but the ICMP refusal is filtered or never sent, the time frame allotted for spoofing could entail the entire TCP timeout. However, the probability of successful spoofing then includes an additional multiplier of 1 / 2^32. This reduction should sufficiently negate the additional timeout duration.
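
To put rough numbers on these multipliers, a back-of-the-envelope sketch
(the usable port-space figure is an assumption; a resolver stuck on a
fixed query port fares far worse):

    # 16-bit transaction ID, randomized source ports, and for TCP the
    # additional 32-bit sequence number an off-path spoofer must guess.
    TXID_SPACE = 2 ** 16
    PORT_SPACE = 2 ** 14          # assume ~16K usable randomized ports
    TCP_SEQ_SPACE = 2 ** 32

    udp_guess = 1.0 / (TXID_SPACE * PORT_SPACE)
    tcp_guess = udp_guess / TCP_SEQ_SPACE    # the extra 1/2^32 multiplier

    print("per-packet odds over UDP: %.3e" % udp_guess)
    print("per-packet odds over TCP: %.3e" % tcp_guess)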

TCP requires state and introduces several additional exchanges for a given number of answers. Any effort related to poisoning will likely attempt to delay an answer by adding to the server's overhead. Precluding truncation, and thereby eliminating TCP, should favorably reduce server overhead and increase overall performance.

Of course, a more practical method would be to ensure sufficient DNS resources are available by increasing server resources. That said, how many domains allocate a couple of prior generation servers for DNS?

-Doug

>> ... but a TCP connection will consume a
>> significant amount of a name server's resources.
>
> ...wrong.

> Wanting to understand this comment, ...

the resources given by a nameserver to TCP connections are tightly controlled,
as described in RFC 1035 4.2.2. so while TCP/53 can become unreliable during
high load, the problems will be felt by initiators not targets.

(this is why important AXFR targets have to be firewalled down to a very small
population of just one's own nameservers, and is why important zones have to
use unpublished primary master servers, and is why f-root's open AXFR of the
root zone is a diagnostic service not a production service.)
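
(for the curious, the 4.2.2 discipline amounts to something like this
sketch, where the knobs are invented for illustration; the point is that
under pressure it's idle TCP initiators that get reclaimed, never the
UDP path:)

    import time

    MAX_TCP_CLIENTS = 100    # hypothetical concurrency budget
    IDLE_TIMEOUT = 120.0     # RFC 1035 suggests ~two minutes

    connections = {}         # socket -> time of last activity

    def note_activity(sock):
        connections[sock] = time.monotonic()

    def reap():
        """close idle or excess TCP connections, oldest first."""
        now = time.monotonic()
        for sock, last in sorted(connections.items(), key=lambda kv: kv[1]):
            if (now - last) > IDLE_TIMEOUT or len(connections) > MAX_TCP_CLIENTS:
                sock.close()
                del connections[sock]
            else:
                break    # entries are oldest-first; nothing newer qualifies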

The relevant entry in RFC 1035 Section 4.2.2 recommends that the server not block other activities waiting for TCP data. This is not exactly a requirement that TCP should fail before UDP.

The concern leading to a suggestion that TCP always fail was a bit different. A growing practice treats DNS as a type of web server when used to publish rather bulky script-like resource records. Due to typical sizes, it is rather common to find these records depend upon TCP fallback. This problem occurred with paypal, for example. TCP fallback is especially problematic when these records are given wildcards. Such fallback increases the amplification associated with an exploit related to the use of the script within the record.

Of course there are better ways to solve this problem, but few are as certain.

-Doug

>> the resources given by a nameserver to TCP connections are tightly
>> controlled, as described in RFC 1035 4.2.2. so while TCP/53 can become
>> unreliable during high load, the problems will be felt by initiators not
>> targets.

> The relevant entry in RFC 1035 Section 4.2.2 recommends that the server
> not block other activities waiting for TCP data. This is not exactly a
> requirement that TCP should fail before UDP.

it is semantically equivalent to such a requirement, in that UDP/53 is an
"other activity" performed by name servers. it happens to be implemented
this way in all versions of BIND starting in 4.8 or so (when named-xfer was
made a separate executable), all versions of Windows DNS, and all current
name server implementations i am aware of (including powerdns, nominum ANS,
and NSD). so while "not exactly", it's "effectively" a requirement, and i
think we ought to design our systems with this constraint as a given.

> The concern leading to a suggestion that TCP always fail was a bit
> different. A growing practice treats DNS as a type of web server when used
> to publish rather bulky script-like resource records. Due to typical sizes,
> it is rather common to find these records depend upon TCP fallback. This
> problem occurred with paypal, for example. TCP fallback is especially
> problematic when these records are given wildcards. Such fallback increases
> the amplification associated with an exploit related to the use of the
> script within the record.

> Of course there are better ways to solve this problem, but few are as
> certain.

i think you're advising folks to monitor their authority servers to find out
how many truncated responses are going out and how many TCP sessions result
from these truncations and how many of these TCP sessions are killed by the
RFC1035 4.2.2 connection management logic, and if the numbers seem high, then
they ought to change their applications and DNS content so that truncations
no longer result.
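
(that kind of monitoring can start as simply as replaying the names you
serve against your own authority server with no EDNS and counting TC
bits; a sketch with python's dnspython library, placeholder names and
address:)

    import dns.message
    import dns.query
    import dns.flags

    SERVER = "192.0.2.53"
    NAMES = [("example.com", "TXT"), ("example.com", "MX")]

    truncated = 0
    for qname, rdtype in NAMES:
        q = dns.message.make_query(qname, rdtype)   # no EDNS: 512-byte limit
        r = dns.query.udp(q, SERVER, timeout=3)
        if r.flags & dns.flags.TC:
            truncated += 1
            print(qname, rdtype, "truncates and will invite TCP fallback")

    print(truncated, "of", len(NAMES), "queries would fall back to TCP")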

or perhaps you're asking that EDNS be more widely implemented, that it not
be blocked by firewalls or perverted by hotelroom DNS middleboxes, and that
firewalls start allowing UDP fragments (which don't have port numbers and
therefore won't be allowed by UDP/53 rules).
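
(whether EDNS survives a given path is also cheap to test; a sketch,
again with dnspython, advertising a 4096-byte buffer at a placeholder
address:)

    import dns.message
    import dns.query
    import dns.flags

    q = dns.message.make_query("example.com", "TXT", use_edns=0, payload=4096)
    r = dns.query.udp(q, "192.0.2.53", timeout=3)

    if r.edns < 0:
        print("answer came back without EDNS; something stripped it")
    elif r.flags & dns.flags.TC:
        print("truncated even with EDNS; the path is limiting message size")
    else:
        print("EDNS ok; response was", len(r.to_wire()), "bytes")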

i would agree with either recommendation.

but i won't agree that TCP creates stability or load problems for servers.

How does the (eventual) deployment of DNSSEC change these numbers?

And who's likely to feel *that* pain first?

Valdis.Kletnieks@vt.edu writes:

>> ... advising folks to monitor their authority servers to find out how
>> many truncated responses are going out and how many TCP sessions result
>> from these truncations and how many of these TCP sessions are killed by
>> the RFC1035 4.2.2 connection management logic, and if the numbers seem
>> high, then they ought to change their applications and DNS content so
>> that truncations no longer result.

> How does the (eventual) deployment of DNSSEC change these numbers?

DNSSEC cannot be signalled except in EDNS.

> And who's likely to feel *that* pain first?

the DNSSEC design seems to distribute pain very fairly.

>> How does the (eventual) deployment of DNSSEC change these numbers?

> DNSSEC cannot be signalled except in EDNS.

Right. Elsewhere in this thread, somebody discussed ugly patches to keep
the packet size under 512. I dread to think how many different ways of
"protecting" DNS are deployed that will break EDNS, and just haven't been
noticed because there's little enough *actual* EDNS breakage that it's down
in the noise of *other* "random voodoo" breakage at those sites.

>> And who's likely to feel *that* pain first?

> the DNSSEC design seems to distribute pain very fairly.

I actually meant "which 800 pound gorilla is going to try this first and
find all the bustifications", but your answer is good too.. :)

Your comments have helped.

> i think you're advising folks to monitor their authority servers to find out how many truncated responses are going out and how many TCP sessions result from these truncations and how many of these TCP sessions are killed by the RFC1035 4.2.2 connection management logic, and if the numbers seem high, then they ought to change their applications and DNS content so that truncations no longer result.

Monitoring is a good recommendation, as many incorrectly estimate record limits. Wildcard resources should also be checked against maximal labels. Fallback may occur with resource records encompassing a bit more than a couple hundred bytes. The assurance that TCP will fail first is heartening. How this protection interacts with an emerging exploit could be interesting. I'll try to set up some tests and be less pragmatic.
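
The wildcard check is worth spelling out, since the same RDATA costs more
wire space when a wildcard is expanded under a long owner name. A sketch
with Python's dnspython library (the names and SPF content are invented):

    import dns.message
    import dns.rrset

    SPF = '"v=spf1 ip4:192.0.2.0/24 include:_spf.example.com ~all"'

    def response_size(owner):
        resp = dns.message.make_response(dns.message.make_query(owner, "TXT"))
        resp.answer.append(dns.rrset.from_text(owner, 3600, "IN", "TXT", SPF))
        return len(resp.to_wire())

    print(response_size("example.com"))
    # worst case: a qname near the 255-octet limit matching *.example.com
    long_owner = ".".join(["a" * 60] * 3) + ".example.com"
    print(response_size(long_owner))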

> or perhaps you're asking that EDNS be more widely implemented, that it not be blocked by firewalls or perverted by hotelroom DNS middleboxes, and that firewalls start allowing UDP fragments (which don't have port numbers and therefore won't be allowed by UDP/53 rules).

TCP offers a means to escape UDP-related issues. On the other hand, blocking TCP may offer the necessary motivation for having these UDP issues fixed. After all, only UDP should be required. When TCP is designed to readily fail, reliance upon TCP seems questionable. As DNSSEC is introduced, TCP could be relied upon in the growing number of instances where UDP is improperly handled. UDP handling may have been easier had EDNS been limited to 1280 bytes. On the other hand, potentially larger messages may offer the necessary motivation for adding ACLs on recursive DNS, and deploying BCP 38.

No pain, no gain might be a motto that applies as much to DNS as it does to weight lifting.

-Doug

> Your comments have helped.

groovy.

> When TCP is designed to readily fail, reliance upon TCP seems questionable.

i caution against being overly cautious about DNS TCP if you're using RFC 1035
section 4.2.2 as your basis for special caution. DNS TCP only competes
directly against other DNS TCP. there are only two situations where a DNS TCP
state blob is present in a DNS target ("server") long enough to be in any
danger: when doing work upstream to fulfill the query, and in zone transfers.

when answering DNS TCP queries in an authority server, there is by definition
no "upstream work" to be done, other than possible backend database lookups
which are beyond the scope of this discussion. these responses will usually
be generated synchronous to the receipt of the last octet of a query, and the
response will be put into the TCP window (if it's open, which it usually is),
and the DNS target ("server") will then wait for the initiator ("client") to
close the connection or send another query. (usually it's a close.)

when answering DNS TCP zone transfer requests in an authority server, there is
a much larger window of doom, during which spontaneous network congestion can
close the outgoing TCP window and cause a DNS target ("server") to think that
a TCP session is "idle" for the purpose of RFC 1035 section 4.2.2 TCP resource
management. while incremental zone transfer is slightly less prone to this
kind of doom than full zone transfer, since the sessions are shorter, it can
take some time for the authority server to compute incremental zone "diffs",
during which the TCP session may appear "idle" through no fault of the DNS
initiator ("client") who is avidly waiting for its response.

lastly, when answering DNS TCP queries in a recursive caching nameserver, it
can take a while (one or more round trips to one or more authority servers)
before there is enough local state to satisfy the query, during which time the
TCP resources held by that query might be reclaimed under RFC 1035 section
4.2.2's rules.

the reason why not to be overly cautious about TCP is that in the case where
you're an authority server answering a normal query, the time window during
which network congestion could close the outbound TCP window long enough for
RFC 1035 section 4.2.2's rules to come into effect, is vanishingly short. so
while it's incredibly unwise to depend on zone transfer working from a small
number of targets to a large number of initiators, and it is in fact wise to
firewall or ACL your stealth master server so that only your designated
secondary servers can reach it, none of this comes into play for normal
queries to authority servers -- only zone transfers to authority servers.

the unmanageable risk is when a recursive caching nameserver receives a
query by TCP and forwards/iterates upstream. if this happens too often, then
the RFC 1035 section 4.2.2 rules will really hurt. and thus, it's wise, just
as you say, to try to make sure other people don't have to use TCP to fetch
data about your zone. the counterintuitive thing is that you won't be able
to measure the problems at your authority server since that's not where they
will manifest. they'll manifest at caching recursive servers downstream.

> As DNSSEC is introduced, TCP could be relied upon in the growing number of
> instances where UDP is improperly handled.

this would be true if TCP fallback was used when EDNS failed. it's not.
if EDNS fails, then EDNS will not be used, either via UDP or TCP. so if
improper handling of UDP prevents EDNS from working, then EDNS and anything
that depends on EDNS, including DNSSEC, will not be used.
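
(in other words the retry ladder looks like this sketch, not like a
ladder that ends in TCP; dnspython again, placeholder address, and note
where DNSSEC falls out of the picture:)

    import dns.exception
    import dns.flags
    import dns.message
    import dns.query

    def lookup(qname, rdtype, server):
        q = dns.message.make_query(qname, rdtype, use_edns=0, payload=4096)
        try:
            r = dns.query.udp(q, server, timeout=3)
        except dns.exception.Timeout:
            # EDNS blackholed: retry plain DNS, *not* TCP, which means
            # anything that needs EDNS (DNSSEC included) is lost here.
            q = dns.message.make_query(qname, rdtype)
            r = dns.query.udp(q, server, timeout=3)
        if r.flags & dns.flags.TC:
            r = dns.query.tcp(q, server, timeout=5)
        return r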

> UDP handling may have been easier had EDNS been limited to 1280 bytes.

if you mean, had EDNS been limited to nonfragmentation cases, then i think
you might mean 576 bytes or even 296 bytes. 1280 is an IPv6 (new era) limit.

> On the other hand, potentially larger messages may offer the necessary
> motivation for adding ACLs on recursive DNS, and deploying BCP 38.

i surely do hope so. we need those ACLs and we need that deployment, and if
message size and TCP fallback is a motivator, then let's turn UP the volume.

There are so many larger and more immediate reasons for doing these things that I seriously doubt message size and TCP fallback in the DNS will have any impact at all in terms of motivating the non-motivated.

But, one can always hope.

;>

As a datapoint I ran some tests against a reasonably diverse and
sizeable TLD zone I work with in another forum. I queried the name
servers listed in the parent to see if I could successfully query
them over TCP for the domain they are configured to serve. Out of
about 9,300 unique name servers I failed to receive any answer from
about 1,700 of them. That is a bit more than an 18% failure rate.
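
The methodology, for anyone who wants to reproduce it, amounts to the
following sketch (Python with the dnspython library; the server list
here is a placeholder, where the real list came from the parent zone's
NS records and glue):

    import dns.exception
    import dns.message
    import dns.query

    # (address, zone) pairs harvested from the parent's delegations
    SERVERS = [("192.0.2.1", "example.com"), ("192.0.2.2", "example.org")]

    failed = 0
    for addr, zone in SERVERS:
        q = dns.message.make_query(zone, "SOA")
        try:
            dns.query.tcp(q, addr, timeout=5)
        except (dns.exception.Timeout, OSError):
            failed += 1

    print("%d/%d unreachable over TCP 53 (%.1f%%)"
          % (failed, len(SERVERS), 100.0 * failed / len(SERVERS)))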

John