DNS problems to RoadRunner - tcp vs udp

Florian_Weimer · June 15, 2008, 8:19am

* Sean Donelan:

Any network with a large user population probably should have separate
DNS servers for their authoritative zones answering the Internet
at-large and their recursive resolvers serving their user population.

It's not so much a question of network size. You absolutely must use
different views if you host DNS for customer domains because there is a
race condition in the delegation provisioning protocol used by most TLDs
(you need to add the domain before you receive the delegation).

Mark_Andrews2 · June 16, 2008, 1:09am

In article <48546625.6040301@rockynet.com> you write:

Sean Donelan wrote:

1. Separate your authoritative and recursive name servers
2. Recursive name servers should only get replies to their own DNS
queries from the Internet, they can use both UDP and TCP

We've just completed a project to separate our authoritative and
recursive servers and I have a couple notes...

1) For the recursive-only, we're using a combination of BIND's
"query-source address a.b.c.d" and "listen-on e.f.g.h" in the hopes of
providing some additional measure of protection against cache poisoning.
The "listen-on" IPs are ACL'd at the borders so non-clients cannot get
ANY packets to them. The "query-source address" itself doesn't appear in
the "listen-on" list either and won't respond to queries. I know this
isn't foolproof, but it probably raises the bar slightly against off-net
poisoning attempts.

  Named will reject queries on the *-source sockets. It
  will also drop responses on the listening sockets provided
  you haven't set the query-source port to port 53.
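
  A minimal sketch of the recursive-only setup described above, in
  named.conf form. The addresses (documentation ranges) and the ACL
  name "clients" are placeholders, not values from this thread:

```
acl clients { 198.51.100.0/24; };

options {
    recursion yes;
    allow-query { clients; };
    allow-recursion { clients; };

    // Clients send queries here; per the post, this address is
    // filtered at the border so off-net hosts cannot reach it.
    listen-on { 192.0.2.53; };

    // Outbound queries leave from a different address that is not
    // in listen-on. No "port 53" is given, so the source port is
    // left to named.
    query-source address 192.0.2.54;
};
```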

2) The biggest drawback to separation after years of service is that
customers have come to expect their DNS changes are propagated instantly
when they are on-net. This turns out to be more of an annoyance to us
than our customers, since our zone is probably the most frequently updated.

  Querying for type SOA at the name will prevent named caching
  negative responses and still allow existence tests to be
  made. nsupdate makes SOA queries to work out which zone
  needs to be updated and to also determine which server to
  send the updates to. We realised a long time ago that we
  needed to have a way to find the containing zone that didn't
  result in caches being filled with the side effects of that
  discovery mechanism.

  Named, by default, sets the TTL to zero on negative responses
  to SOA queries.

3) I've gone so far as to remove the root hint zone from our auth-only
boxes, again out of paranoia ("recursion no" does the trick, this is
just an extra bit of insurance against someone flipping that bit due to
a lack of understanding of the architecture). There is one third party
for whose zone we have to use an 'also-notify' by IP address in this case.

  Authoritative only servers need hints so that NOTIFY will
  work in the general case. Eventually, they will also need
  them so we can get rid of IP addresses in masters clauses
  on slave/stub zones. This will help reduce the costs in
  renumbering.

Mike

Mark

Michael_Sinatra · June 16, 2008, 6:56am

Mark Andrews wrote:

Authoritative only servers need hints so that NOTIFY will
work in the general case.

Presumably that's because the authoritative server will want to look up the RDATA (hostname) of each NS record that serves a zone for which it is authoritative. Could you avoid this if you used something like 'notify explicit' and specified all slave servers by IP address in an also-notify clause?
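
For reference, the configuration being asked about might look like the sketch below. The zone name, file name and slave addresses are placeholders, not taken from anyone's setup in this thread:

```
zone "example.com" {
    type master;
    file "example.com.db";

    // Send NOTIFY only to the addresses listed in also-notify,
    // rather than to the targets of the zone's NS records.
    notify explicit;
    also-notify { 192.0.2.10; 192.0.2.11; };
};
```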

Eventually, they will also need
  them so we can get rid of IP addresses in masters clauses
  on slave/stub zones. This will help reduce the costs in
  renumbering.

Would an administrator still have the option of specifying masters by IP address if they desire, and therefore remove the need for the hints file? It seems that this would not only give the option of keeping recursion effectively off even if someone turns it on by accident (as Mike notes), but should also help reduce the potential for reflection attacks from authoritative servers giving upward referrals for out-of-zone queries, no?
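
As a point of comparison, specifying masters by IP address on a slave zone looks roughly like this (zone name, file path and address are placeholders):

```
zone "example.com" {
    type slave;
    file "slaves/example.com.db";

    // Master given by address, so no name lookup is needed to
    // find where to transfer the zone from.
    masters { 192.0.2.1; };
};
```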

michael

Scott_C_McGrath · June 16, 2008, 4:51pm

All,

Thanks for the helpful suggestions.

For what it's worth, we use Cisco's CNR, as we operate a MAC registration system which controls access to our network. We allow customers to select hostnames, which are pushed into DDNS when the system acquires a lease. CNR has internal (user-configurable) limits which control the TCP state machine, and these are easy to overwhelm: once you hit the high limit, the server process stops accepting new connection requests for any reason until the connections go below the max limit once again. We have been in constant contact with the development group on defending these machines from DDoS activity.

Due to our network structure, UDP is somewhat easier to rate limit than TCP, and we do operate microflow policers to limit UDP activity from any given host.

We once used BIND, but BIND could not handle the DDNS updates in a reasonable fashion, as we have many short-lived connections as students access the wireless network between classes; hence the move to CNR, which handles DDNS effectively but does not like TCP-based attacks. Unlike MIT over the river, Harvard only has 2 Class B's available, and we have many more registered clients than we have IP space for, plus a community which requires fixed hostnames for academic reasons. Since we cannot make static IP assignments except to well-known and fixed services, this becomes problematic; hence DDNS, which, as many have pointed out here, is painful from an operational standpoint, but in our environment it is a lifesaver.

Unfortunately, we have needed to insert some controlled breakage into the network to keep the services our customers require alive, as TCP SYN attacks are still effective in this day and age. We have tried many things. Our latest foray into TCP control is creating a Snort infrastructure which is sufficient to monitor all flows ingressing and egressing our network and, from there, based on analysis of the data, applying rules to limit traffic in real time from ill-behaved TCP hosts, as our long-term goal is not to operate a corporate network locked into stupid mode with no understanding of protocol needs.

- Scott

Nathan Ward wrote: