Akamai DNS Issue?

Daniel Golding suggested that the problem was that many folks are sharing Akamai's magic DNS algorithms.
This doesn't appear to be a problem with magic algorithms - it appears that they're sharing the _servers_,
and that the reported attack on the servers means that it doesn't matter how magic the algorithms are.
Good luck to them on developing a longer-term workaround for the next attack.

  Bill Stewart, bill.stewart@pobox.com

Disclaimer: This note is, as usual, my personal opinion, not my employer's.

Workarounds and defences already exist, and have been in use for a long time.

Catastrophic, systematic operator error (e.g. rdist gone wild; RIF-frenzied, root-wielding, caffeine-crazed sysadmins run amok) can be guarded against by including nameservers managed by different organisations in the NS set.
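
As a rough illustration, one could pull a zone's NS set and group the
nameserver hostnames by their parent domain; if everything falls under a
single organisation's name, one operational mistake can take out the lot.
A minimal sketch in Python, assuming the third-party dnspython package and
a naive "last two labels" heuristic for the operating organisation (a real
check would consult the public suffix list):

# Sketch: group a zone's NS hostnames by an approximate "registered
# domain" to spot NS sets operated by a single organisation.
# Assumes the third-party dnspython package (pip install dnspython).
from collections import defaultdict

import dns.resolver

def ns_operators(zone: str) -> dict:
    groups = defaultdict(list)
    for rr in dns.resolver.resolve(zone, "NS"):
        host = str(rr.target).rstrip(".")
        # Naive heuristic: the last two labels stand in for the operator.
        org = ".".join(host.split(".")[-2:])
        groups[org].append(host)
    return dict(groups)

if __name__ == "__main__":
    # "example.com" is a placeholder zone; substitute one of interest.
    for org, hosts in ns_operators("example.com").items():
        print(f"{org}: {', '.join(hosts)}")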

Distributed (and non-distributed) denial of service attacks can be mitigated using dispersed anycast nameserver deployment.

Network partition/isolation events (e.g. undersea cable failures which isolate an economy) can be mitigated by strategic location of (anycast instances of) locally-relevant nameservers.

Operational routing and instrumentation challenges with managing a dispersed anycast deployment can be mitigated by including non-anycast nameservers in the NS set alongside the anycast nameservers.

Failures of ancillary equipment can be avoided by eliminating single points of failure (e.g. wide geographic dispersion of nameservers into topologically distant infrastructure).
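
One crude way to look for shared infrastructure behind an NS set is to
resolve each nameserver's address and see whether several of them sit in
the same prefix. Another minimal sketch, again assuming dnspython, and
using a /24 grouping purely as a stand-in for real topological data:

# Sketch: resolve each nameserver's address for a zone and group by /24,
# a crude proxy for "these hosts probably share infrastructure".
# Assumes the third-party dnspython package.
import ipaddress
from collections import defaultdict

import dns.resolver

def ns_prefixes(zone: str) -> dict:
    groups = defaultdict(list)
    for rr in dns.resolver.resolve(zone, "NS"):
        host = str(rr.target).rstrip(".")
        for a in dns.resolver.resolve(host, "A"):
            net = ipaddress.ip_network(f"{a.address}/24", strict=False)
            groups[str(net)].append(host)
    return dict(groups)

if __name__ == "__main__":
    # "example.com" is a placeholder zone; substitute one of interest.
    for prefix, hosts in ns_prefixes("example.com").items():
        print(f"{prefix}: {', '.join(sorted(set(hosts)))}")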

Failures due to political interference can be avoided by deploying nameservers in complementary regions of governance.

Failures or vulnerabilities in individual DNS implementations can be mitigated by ensuring that not all nameservers in the NS set run the same DNS software (or similar software, developed from a common code base).
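
One quick (if unreliable) way to gauge software diversity across an NS set
is the old CHAOS-class version.bind query; many operators sensibly hide or
falsify the answer, so treat it as a hint only. A minimal sketch, assuming
dnspython:

# Sketch: ask each nameserver for a zone what software it claims to run,
# via a CHAOS-class version.bind TXT query. Answers are often hidden or
# falsified, so this is only a hint about software diversity.
# Assumes the third-party dnspython package.
import dns.message
import dns.query
import dns.rdataclass
import dns.rdatatype
import dns.resolver

def reported_versions(zone: str, timeout: float = 2.0) -> dict:
    versions = {}
    for rr in dns.resolver.resolve(zone, "NS"):
        host = str(rr.target).rstrip(".")
        try:
            addr = dns.resolver.resolve(host, "A")[0].address
            q = dns.message.make_query("version.bind",
                                       dns.rdatatype.TXT,
                                       dns.rdataclass.CH)
            resp = dns.query.udp(q, addr, timeout=timeout)
            txts = [b"".join(r.strings).decode(errors="replace")
                    for rrset in resp.answer for r in rrset]
            versions[host] = txts or ["(no answer)"]
        except Exception as exc:
            versions[host] = [f"(query failed: {exc})"]
    return versions

if __name__ == "__main__":
    # "example.com" is a placeholder zone; substitute one of interest.
    for host, vers in reported_versions("example.com").items():
        print(f"{host}: {'; '.join(vers)}")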

Failures or vulnerabilities in ancillary software (routers, switches, operating systems, etc) can be mitigated by ensuring that different nameservers rely on different brands of routers, switches and operating systems.

Failures in master servers can be mitigated by having several of them; simultaneous failure of all master servers can be managed to some degree using appropriate SOA timers, so that slave servers provide coverage while master servers are brought back into service.
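
For the SOA point, the relevant knob is the EXPIRE field: it bounds how
long a slave will keep answering authoritatively after it last managed to
reach a master, with REFRESH and RETRY controlling how often it keeps
trying in the meantime. A small sketch (dnspython again) that prints a
zone's SOA timers and the implied coverage window:

# Sketch: fetch a zone's SOA record and report the timers that govern
# how long slaves keep serving if every master becomes unreachable.
# Assumes the third-party dnspython package.
import dns.resolver

def soa_timers(zone: str) -> None:
    soa = dns.resolver.resolve(zone, "SOA")[0]
    print(f"zone:    {zone}")
    print(f"mname:   {soa.mname}")
    print(f"serial:  {soa.serial}")
    print(f"refresh: {soa.refresh}s  retry: {soa.retry}s")
    print(f"expire:  {soa.expire}s "
          f"(~{soa.expire / 86400:.1f} days of slave coverage)")

if __name__ == "__main__":
    # "example.com" is a placeholder zone; substitute one of interest.
    soa_timers("example.com")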

Different styles of attack can be mitigated by different DNS hosting strategies. A robustly-hosted zone will have an NS set that exhibits several or all of these approaches (and others too).

The hosting of the root zone provides guidance here.
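
For a concrete look at that guidance, it is easy to inspect the root's own
NS set, which spans multiple operators (and, for several of the servers,
anycast instances). A minimal sketch, assuming dnspython:

# Sketch: list the root zone's NS set and addresses, as a concrete
# example of a widely dispersed, multi-operator NS set.
# Assumes the third-party dnspython package.
import dns.resolver

def root_servers() -> None:
    for rr in sorted(dns.resolver.resolve(".", "NS"),
                     key=lambda r: str(r.target)):
        host = str(rr.target).rstrip(".")
        addrs = [a.address for a in dns.resolver.resolve(host, "A")]
        print(f"{host}: {', '.join(addrs)}")

if __name__ == "__main__":
    root_servers()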

Joe

But you don't say how to avoid failures caused by massive confusion when
maintaining an excessively complicated system....

Mark

By isolating the complexity into small pockets, each of which is largely invisible to the rest of the system, and by reducing the coordination required between the different autonomous operators involved to manageable levels, much as the high complexity of the global routing system is managed.

This isn't just handwaving -- the root zone has been served with enormous reliability for a long time, accommodating all of the precautions on that list. That reliability is a feature of prudence and simplicity, not needless complexity and confusion.

Joe

Mark Radabaugh wrote:

But you don't say how to avoid failures caused by massive confusion when
maintaining an excessively complicated system....

I don't have much to offer for the "excessively complicated" case
(which I think the instant case is an example of), but there are cases
in my history that were just as complex and complicated, with some
justification for it.

For those, the best solutions involved concepts like "canned, tested,
documented procedures", "quality control", and "change management"
(which included "staging", "testing and verification", and so on).

In the "production" and "system test" environments, we were not fond of people who made ad hoc changes of any kind.

Many years ago, I hand-carried a patch through the approvals process. The
group leader reviewed the purpose, urgency, test methods, and test results,
and signed the sheet. The district manager looked it over and asked "What
are the chances that this patch could fail?" I flippantly replied
"One in a million!"

He handed the documents back unsigned with the words "Seven times in
the Metro (Los Angeles, California) office tonight."

Bill,

The point still holds: when too much high-value content shares anything
(algorithm, infrastructure, etc.), you get vulnerability. The problem I
was highlighting was excessive sharing, not AkaDNS magic.

(Of course, everything shares the general DNS infrastructure, but the
numerous roots (some of which are anycasted), plus the distributed nature,
make that tougher to take out completely.)

It looks like this was an attack on the Akamai DNS redirection
infrastructure rather than the Akamai hosting infrastructure. Their DNS
servers present far fewer points to attack. It would be interesting to hear
a detailed analysis of the attack at some point. Maybe a good topic for the
next NANOG? (Patrick? :-)

Part of the difficulty of discussing this is that bringing up points of
potential vulnerability in a public forum provides hints for those who
would wreak havoc. I'm sure many of us can come up with other bits of
vulnerable shared infrastructure, but it seems inappropriate to discuss
them on such an open forum. I can only wonder whether the more private
forums being hosted by government organizations are effective, or simply
boondoggles designed to provide political cover.

- Dan