Google DNS intermittent ServFail for Disney subdomain

Hi Nanog,

I am the principal network engineer for a sister studio to Disney Studios. They
have been struggling with DNS issues since Thursday 12th October.

By all accounts it appears as though *some* of the Google DNS resolvers
cannot reach the authoritative nameservers for "studio.disney.com".

This is causing ~20-30% of all DNS requests against Google Public DNS
8.8.8.8 / 8.8.4.4 to fail for requests in this subdomain.
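
A rough way to put a number on a failure rate like this is to repeat the same query many times and tally the response codes. The sketch below is only an illustration: the helper names dig_status and servfail_pct are hypothetical (not from this thread), and it assumes dig's usual header line containing "status: <RCODE>".

```shell
# Hypothetical helpers for illustration; not part of the original report.

# Pull the response code out of dig's header line, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1234".
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

# Read one response code per line and print the integer percentage
# (rounded down) of lines that were SERVFAIL.
servfail_pct() {
  awk '$1 == "SERVFAIL" { fail++ }
       { total++ }
       END { if (total) printf "%d\n", fail * 100 / total; else print 0 }'
}
```

Usage might look like the following. Queries that time out print no header line, so they are not counted either way:

    for i in $(seq 1 100); do dig @8.8.8.8 studio.disney.com SOA; done | dig_status | servfail_pct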

The name servers reside in 153.7.233.0/24.

Might someone be able to *connect me* with someone at Google to assist my
poor colleagues who are banging their heads against a brick wall here?

Thank you,
David

Looks like some Disney services are/have been down.
http://downdetector.com/status/disneyworld

Thanks,
Donald

Well well, it looks like a Direct Connect circuit to Google was leaking the
route to this DMZ 153.7.233.0/24 back to Google via BGP.

Return traffic from Google (for only some fraction of DNS queries) was
passing back across this leaked route, and being dropped on this Direct
Connect peering point at Disney.
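
One quick sanity check when chasing a leak like this is to confirm which addresses actually fall inside the affected prefix. The following is a minimal sketch in plain shell arithmetic. The names ip_to_int and in_prefix are made up for illustration, the sample address in the usage line is hypothetical (the individual name server IPs are not listed in this thread), and it assumes well-formed IPv4 input without leading zeros.

```shell
# Hypothetical helpers for illustration; not part of the original thread.

# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit status 0) if address $1 lies inside CIDR prefix $2.
in_prefix() {
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")   # part before the slash
  len=${2#*/}                  # prefix length after the slash
  mask=$(( (4294967295 << (32 - $len)) & 4294967295 ))
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}
```

For example, with a made-up address:

    in_prefix 153.7.233.53 153.7.233.0/24 && echo "inside the leaked prefix"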

Gotta love it when a problem is solved, by the OP, within an hour of
resorting to mailing the NANOG community.

Thanks all, nothing to see here!

-David

David Sotnick <sotnickd-nanog@ddv.com> writes:

Gotta love it when a problem is solved, by the OP, within an hour of
resorting to mailing the NANOG community.

That's the way it is. Posting to a public forum always makes you think
about the issue a second time, and that's what it takes.

The weird thing is that I've tried to cheat the system by thinking
without posting, and it doesn't work! Don't know why, but there appears
to be a difference between thinking and thinking :-)

Thanks a lot for posting the solution.

Bjørn

Worthy .sig fodder indeed. :-)

I know it doesn't help your problem, but friends don't let friends use public DNS resolvers (Google, L3, OpenDNS, etc.). ;-)

Would be great if makers of home routers would implement full recursive DNS resolvers
instead of just forwarders in their gear.

a message of 49 lines which said:

Would be great if makers of home routers would implement full recursive DNS
resolvers

The good ones do, e.g. the Turris Omnia.

Well well, it looks like a Direct Connect circuit to Google was leaking the
route to this DMZ 153.7.233.0/24 back to Google via BGP.

Return traffic from Google (for only some fraction of DNS queries) was
passing back across this leaked route, and being dropped on this Direct
Connect peering point at Disney.

Gotta love it when a problem is solved, by the OP, within an hour of
resorting to mailing the NANOG community.

This shows some issues as well, I think?
http://dnsviz.net/d/studio.disney.com/servers/

$ dig NS disney.com

;; ANSWER SECTION:
disney.com. 4676 IN NS huey11.disney.com.
disney.com. 4676 IN NS huey.disney.com.
disney.com. 4676 IN NS Orns02.dig.com.
disney.com. 4676 IN NS Orns01.dig.com.
disney.com. 4676 IN NS Sens02.dig.com.
disney.com. 4676 IN NS Sens01.dig.com.

$ dig NS studio.disney.com @huey11.disney.com.
;; AUTHORITY SECTION:
studio.disney.com. 600 IN NS wallyb.pixar.com.
studio.disney.com. 600 IN NS andre.pixar.com.
studio.disney.com. 600 IN NS cliff.studio.disney.com.
studio.disney.com. 600 IN NS norm.studio.disney.com.

$ for d in $(dig +short NS disney.com); do dig +short SOA disney.com @$d; done
huey.disney.com. root.huey.disney.com. 2017102000 3600 900 3600000 3600
huey.disney.com. root.huey.disney.com. 2017102000 3600 900 3600000 3600
huey.disney.com. root.huey.disney.com. 2017102000 3600 900 3600000 3600
huey.disney.com. root.huey.disney.com. 2017102000 3600 900 3600000 3600
huey.disney.com. root.huey.disney.com. 2017102000 3600 900 3600000 3600
huey.disney.com. root.huey.disney.com. 2017102000 3600 900 3600000 3600

$ for d in $(dig +short NS studio.disney.com); do dig +short SOA studio.disney.com @$d; done
cliff.studio.disney.com. admin.studio.disney.com. 2017101904 10800 3600 604800 86400
cliff.studio.disney.com. admin.studio.disney.com. 2017101904 10800 3600 604800 86400
cliff.studio.disney.com. admin.studio.disney.com. 2017101904 10800 3600 604800 86400
cliff.studio.disney.com. admin.studio.disney.com. 2017101904 10800 3600 604800 86400
cliff.studio.disney.com. admin.studio.disney.com. 2017101904 10800 3600 604800 86400
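
When the list of servers gets longer, eyeballing the serials gets tedious. A small filter along these lines reduces the output to the distinct serial numbers. This is only a sketch: soa_serials is a hypothetical name, and it assumes each SOA record arrives on a single line with seven fields, the way "dig +short SOA" prints it.

```shell
# Hypothetical helper for illustration; not part of the original thread.

# Read "dig +short SOA" lines on stdin and print each distinct serial once.
# The serial is the third field: mname rname serial refresh retry expire minimum.
soa_serials() {
  awk 'NF == 7 { print $3 }' | LC_ALL=C sort -u
}
```

Piping either loop above through it should print a single line if all servers are in sync; more than one line means at least one server is serving a different version of the zone:

    for d in $(dig +short NS studio.disney.com); do dig +short SOA studio.disney.com @$d; done | soa_serials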

it looks like the second level (disney.com) and the third level
(studio.disney.com) don't agree with each other on who should be the NS
for the third level?

that shouldn't be fatal, but is something to clean up.
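
To make that kind of disagreement explicit, one can diff the NS set in the parent's referral against the NS set the child zone serves for itself. A sketch follows, under a few assumptions: normalize_ns and ns_mismatch are hypothetical helper names, mktemp and comm are available, and each list is passed in as a newline-separated string.

```shell
# Hypothetical helpers for illustration; not part of the original thread.

# Normalize a list of NS names: lowercase, drop the trailing dot, sort, dedupe.
normalize_ns() {
  tr 'A-Z' 'a-z' | sed 's/\.$//' | LC_ALL=C sort -u
}

# Print names that appear in only one of the two lists.
# Names only in the parent list ($1) are printed flush left;
# names only in the child list ($2) are printed with a leading tab.
ns_mismatch() {
  p=$(mktemp) || return 1
  c=$(mktemp) || { rm -f "$p"; return 1; }
  printf '%s\n' "$1" | normalize_ns > "$p"
  printf '%s\n' "$2" | normalize_ns > "$c"
  # comm -3 suppresses lines common to both files, leaving the disagreement.
  LC_ALL=C comm -3 "$p" "$c"
  rm -f "$p" "$c"
}
```

Usage might look like this, taking the parent's view from the authority section of a non-recursive query and the child's view from one of its own servers. Empty output means the two sets agree:

    parent=$(dig +norecurse +noall +authority NS studio.disney.com @huey11.disney.com | awk '$4 == "NS" { print $5 }')
    child=$(dig +short NS studio.disney.com @cliff.studio.disney.com)
    ns_mismatch "$parent" "$child"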

Thanks all, nothing to see here!

None of the NS records/delegations are in agreement. The com delegations
don't agree with the authoritative records in disney.com, and disney.com's
delegations don't agree with studio.disney.com's NSen.

Ignoring the latency impact of your proposal, I wonder what would happen to
the world's authoritative servers if all users hit them directly rather
than going through large recursive resolvers that do caching? I'm guessing
it wouldn't be pretty.

Damian

Damian,

Pragmatically speaking, I strongly suspect the increase in valid queries to authoritative servers, even if all “large recursive resolvers” went away, would be lost in the noise of the overcapacity necessary to deal with even a lower-end DDoS attack.

Perhaps more interestingly, if said recursive resolvers on home routers would implement DNSSEC with RFC 8198 (and the owners of the authoritative zones would sign those zones), an entire class of DDoS attack would be mitigated. Further, if said recursive resolvers also implemented RFC 7706, latency to the root would be reduced and the risk to the network behind that recursive resolver from a DDoS against the root of the DNS would be removed.

Regards,
-drc

I know it doesn't help your problem, but friends don't let friends use public DNS resolvers (Google, L3, OpenDNS, etc.). ;-)

I've been experimenting with using Google's DNS resolvers for Google's
assorted domains. At some point, I keep meaning to add Google's address
space as in-addr.arpa domains, but just haven't gotten there yet.

Why? Just curious, that's all. Thus far, I haven't really noted any
major differences, but wasn't sure what to expect. Maybe something
would be notably faster/slower, maybe different results/ads/whatever,
I dunno. It just seemed reasonable to punt Google DNS to Google DNS
and see how things work. YMMV, void where prohibited.

~Mike

A 10x increase in baseline queries is still a 10x increase (for whatever
value of "10" the real world would actually throw at us). Although small
by comparison, that still has to be made up in an increase in the overhead
for DDoS.

I'm also led to wonder how much worse it would be if all those CPE were
open recursives instead of open forwarders. I'd like to see CPE
manufacturers' decision making and processes improved BEFORE we start
encouraging them to go around ISPs' DNS servers or the large public
recursive clouds.

A while back, the Québec government, wanting to protect its gambling
monopoly, decided to force ISPs to block a list of gambling sites (list
drawn up by the gambling monopoly to block outside competitors).

Recently, Bell Canada went to the government suggesting that it set up
an Internet website block list to prevent Canadians from accessing
piracy websites.

And of course, in the USA, the upcoming decision to drop Title II for
ISPs may result in large ISPs quickly starting to play tricks on DNS
(redirecting traffic to their own properties etc).

While all this is in its infancy and may not happen, this could have
serious impact on the architecture of DNS with large swaths of customers
bypassing their ISP's DNS services.

But it is more likely that everyone would be going to 8.8.8.8 instead of
running their own recursive server. But if the "free" DNS servers also
start to play games or charge money, then CPE may start
including a full BIND recursive server and bypass everything.

This is why it is important for network folks to educate politicians to
not play with the internet.

And is it believed that end-user devices wouldn't simply be
required to implement this blacklist themselves? This reminds me
of the xkcd comic about encryption and the wrench.