I got some messages from people who weren't exactly clear on how anycast works and fails. So let me try to explain...
In IPv6, there are three ways to address a packet: one-to-one (unicast), one-to-many (multicast), or one-to-any (anycast). Like multicast addresses, anycast addresses are shared by a group of systems, but a packet addressed to the group address is only delivered to a single member of the group. IPv6 has "round robin" neighbor discovery functionality (its replacement for ARP) that allows anycast to work on local subnets.
Anycast DNS is a very different beast. Unlike IPv6, IPv4 has no specific support for anycast, and the point here is to distribute the group address very widely rather than over a single subnet. So what happens is that a BGP announcement covering the service address is sourced in different locations, and each location is basically configured to think it's the "owner" of the address.
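In router-configuration terms, each site simply originates the same prefix. A sketch in Cisco-like syntax; the AS number is made up and the prefix is from the documentation range, so treat this as an illustration rather than a working config:

```
! Site A and site B each originate the same covering prefix.
! AS number (64500) and prefix (198.51.100.0/24) are examples.

! -- router at site A --
router bgp 64500
 network 198.51.100.0 mask 255.255.255.0

! -- router at site B --
router bgp 64500
 network 198.51.100.0 mask 255.255.255.0
```

From the outside, BGP can't tell the two origins apart; each remote network simply routes toward whichever instance is "closest" by its own path selection.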
The idea is that BGP will see the different paths towards the different anycast instances, and select the best one. Now note that the only real benefit of doing this is reducing the network distance between the users and the service. (Some people cite DoS benefits but DoSsers play the distribution game too, and they're much better at it.)
Anycast is now deployed for a significant number of root and gTLD servers. Before anycast, most of those servers were located in the US, and most of the rest of the world suffered significant latency in querying them. Due to limitations in the DNS protocol (traditionally, responses over UDP can't be larger than 512 bytes), it's not possible to increase the number of authoritative DNS servers for a zone beyond around 13. With anycast, a much larger part of the world now has regional access to the root and com and net zones, and probably many more that I don't know about.
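The "around 13" figure falls out of that 512-byte limit. Here's a rough back-of-envelope for a root priming response, assuming maximal name compression, so real packets differ somewhat in the details:

```python
# Rough size estimate for a root "priming" response under the
# traditional 512-byte UDP limit. Back-of-envelope numbers assuming
# maximal DNS name compression; actual packets differ slightly.

HEADER = 12                  # fixed DNS header
QUESTION = 1 + 2 + 2         # root name "." (1 byte) + QTYPE + QCLASS

def ns_record(first):
    # name (2) + type/class/TTL/rdlength (10) + rdata:
    # "a.root-servers.net" spelled out once (20 bytes), then each
    # later name compresses to a 2-byte label plus a 2-byte pointer
    rdata = 20 if first else 4
    return 2 + 10 + rdata

def a_record():
    # compressed name (2) + fixed fields (10) + IPv4 address (4)
    return 2 + 10 + 4

def priming_size(n):
    answer = ns_record(True) + (n - 1) * ns_record(False)
    additional = n * a_record()
    return HEADER + QUESTION + answer + additional

print(priming_size(13))  # 449 bytes: fits in 512
print(priming_size(14))  # 481 bytes: one more would still squeeze in
print(priming_size(15))  # 513 bytes: but this no longer fits
```

By this count the limit is soft, hence "around 13": a fourteenth server would barely fit under ideal compression, a fifteenth wouldn't.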
However, there are some issues. The first one is that different packets can end up at different anycast instances. This can happen when BGP reconverges after some network event (or after an anycast instance goes offline and stops announcing the anycast prefix), but under some very specific circumstances it can also happen with per packet load balancing. Most DNS traffic consists of single packets, but the DNS also uses TCP for queries sometimes, and when intermediate MTUs are small there may be fragmentation. If the packets of a TCP session, or the fragments of a fragmented query, arrive at different anycast instances, the exchange fails, because no single instance sees all of them.
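A toy model of the per-packet load balancing case (instance names and addresses are invented): a flow hash keeps all packets of one connection on the same path, and therefore at the same anycast instance, while per-packet round robin splits them across instances that don't share state:

```python
# Toy model of why per-packet load balancing breaks anycast TCP:
# two equal-cost paths lead to *different* anycast instances.
# Instance names and addresses are made up for illustration.

import itertools

INSTANCES = ["anycast-site-1", "anycast-site-2"]

def flow_hash_lb(packets):
    # All packets of a flow share src/dst addresses, so they all
    # hash to the same next hop -> the same anycast instance.
    flow = (packets[0]["src"], packets[0]["dst"])
    instance = INSTANCES[hash(flow) % len(INSTANCES)]
    return [instance for _ in packets]

def per_packet_lb(packets):
    # Round robin: packet i goes out path i mod 2, regardless of flow.
    rr = itertools.cycle(INSTANCES)
    return [next(rr) for _ in packets]

# three packets belonging to one TCP query: SYN, the query, FIN
conn = [{"src": "192.0.2.1", "dst": "198.51.100.53"}] * 3

print(flow_hash_lb(conn))   # one instance sees the whole connection
print(per_packet_lb(conn))  # the connection is split between instances
```

In the split case, the instance that never saw the SYN answers mid-connection packets with a reset, so the query fails.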
Another issue is the increased risk of fate sharing. In the old root setup, it was very unlikely for a multihomed network to see all the root DNS servers behind the same next hop address. With anycast, this is much more likely to happen. The pathological case is a small network that connects to one or more transit networks, has local/regional peering, and sees an anycast instance for all root servers over that peering. If something bad then happens to the peering connection (peering router melts down, a peer pulls an AS7007, peering fabric goes down, or worse, starts flapping), all the anycasted addresses become unreachable at the same time.
Obviously full unreachability won't happen in practice (well, unless a certain TLD has only two addresses and both are anycast, in which case your mileage may vary), but even if 5 or 8 or 12 addresses become unreachable, the timeouts get bad enough for users to notice.
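To put numbers on "bad enough to notice": assuming a hypothetical 3-second per-server retry timeout, a resolver working down a list of dead servers accumulates delay linearly before it reaches a live one:

```python
# Back-of-envelope for resolver delay when some anycast addresses
# are dead. The 3-second per-server timeout is an assumed value;
# real resolver retry behavior varies.

TIMEOUT_S = 3  # assumed per-server timeout before trying the next one

def worst_case_delay(dead_servers):
    # A resolver that happens to try all the dead servers first
    # waits out one full timeout per dead server.
    return dead_servers * TIMEOUT_S

for dead in (5, 8, 12):
    print(dead, "unreachable ->", worst_case_delay(dead), "s before an answer")
```

Even a handful of dead addresses pushes the worst case well past what users experience as "the internet is broken".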
The 64000 ms timeout query is: at what point do the downsides listed above (along with troubleshooting hell) start to overtake the benefit of better latency? I think the answer lies in the answers to these three questions:
- How good is BGP in selecting the lowest latency path?
- How fast is BGP convergence?
- What percentage of queries go to the first or fastest server in the list?
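The third question can be made concrete with a toy model (the RTT values are invented): a resolver that usually picks its fastest-measured server gets most of anycast's latency benefit, while one that picks blindly gets the plain average, no matter how good BGP's path selection is:

```python
# Toy model for the server-selection question: how much does mean
# query latency depend on how often the resolver hits the fastest
# server? RTT values in ms are invented for illustration.

import random

RTTS = [30, 90, 120, 150, 200]   # hypothetical per-server RTTs

def mean_latency(p_fastest, trials=100_000, seed=42):
    # With probability p_fastest, query the lowest-RTT server;
    # otherwise pick a server uniformly at random.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        if rng.random() < p_fastest:
            total += min(RTTS)
        else:
            total += rng.choice(RTTS)
    return total / trials

print(round(mean_latency(0.9)))  # mostly-fastest: close to min(RTTS)
print(round(mean_latency(0.0)))  # blind selection: near the 118 ms average
```

So the latency win from anycast only materializes to the degree that resolvers actually concentrate their queries on the nearby instance.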