At 6:30 p.m. Tuesday (PST), a Microsoft technician made a
change to the routers on the edge of Microsoft's Domain Name Server
network. The DNS servers are used to connect domain names with numeric
IP addresses (e.g. 18.104.22.168) of the various servers and networks
that make up Microsoft's Web presence.
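As an illustration of the mapping the article describes, DNS is at heart a distributed lookup from names to numeric addresses. A toy sketch in Python (the zone table below is invented for illustration; a real resolver queries authoritative DNS servers rather than a local dict):

```python
# Toy illustration of what a DNS server does: map a domain name to a
# numeric IP address. The entries here are made up for the example.
ZONE = {
    "www.example.com": "18.104.22.168",
    "mail.example.com": "203.0.113.25",
}

def resolve(name: str) -> str:
    """Return the IP for a name, or raise if the name is unknown."""
    try:
        return ZONE[name]
    except KeyError:
        # Roughly what DNS reports as NXDOMAIN for a missing name.
        raise LookupError(f"NXDOMAIN: {name}")

print(resolve("www.example.com"))  # -> 18.104.22.168
```

When the servers holding (or fronting) that mapping become unreachable, as happened here, the IP networks behind them can be perfectly healthy and still invisible to the rest of the Internet.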
Connect 22.5 hours worth of dots between these two events.
At approximately 5 p.m. Wednesday (PST), Microsoft removed the changes
to the router configuration and immediately saw a massive
improvement in the DNS network.
Their management should be really embarrassed to take so long to back out the change.
Somebody botched a router config, and it took 22.5 hours to figure it out?
That's the sort of goof you might expect from a mom and pop ISP with a
hundred customers and virtually no IP clue. I'll be shocked if multiple
people (at multiple levels) aren't fired over this. Screwing up happens.
Taking this long to figure out what you (or even for others to figure out
what someone else) screwed up is just absolutely unbelievable.
Is the brain cell in their networking division on vacation this week?
Umm.. let's think more carefully here.
A major *MAJOR* player is changing a config *during prime time*?
Hell, we're not that big, and we get 3AM-7AM local. Anything else is
out of the question.
So we'll assume that the *real* timeline was:
5PM something *else* melts
6:30PM change a config to stop THAT emergency
6:45PM notice you've scrogged it up
<next 19 hours> try to decide which is worse, the DNS being screwed
but your *local* operations are back online using local private
secondaries, or DNS being OK but whatever was loose trashing the
corporate backbone? Meanwhile, your OTHER set of network monkeys is
busy fighting whatever fire melted stuff to start with...
<META MODE="so totally hypothetical we won't even GO there...">
They'd not be the first organization this week that had to make an
emergency router config change because Ramen multicasting was melting their
routers, or the first to not get it right on the first try.
They'd merely be the ones thinking hardest how to put the right spin on it...
I have *NO* evidence that Ramen was the actual cause other than it's this
week's problem. However, I'm pretty sure that *whatever* happened,
the poor router tech was *already* having a Very Bad Day before he ever
GOT to the part where he changed the config.....
Operating Systems Analyst
Valdis, et al;
Of course, we know (well, some of us know) that there have been multicast
problems from the MSDP storms from the Ramen worm since Saturday before
last. If there have been wider network problems caused by these MSDP
storms, I would like to hear of them, on or off list. I would like to
give a report on this in Atlanta.
For those MSDPers out there, we have had good luck with rate limits to limit
the damage. I will be glad to share (off list?) the configs used.
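For the curious, a rate limit of the sort mentioned might take this shape on Cisco IOS; this is a hypothetical sketch, not the poster's actual config, and the peer address and SA cap are invented:

```
! Hypothetical sketch (not the poster's config): cap the number of
! MSDP Source-Active entries accepted from a peer, so a worm-driven
! SA storm cannot exhaust the router's memory or CPU.
ip msdp peer 192.0.2.1
ip msdp sa-limit 192.0.2.1 5000
```

The idea is damage limitation: the worm's random scanning of multicast address space generates a flood of bogus SA announcements, and capping what each peer may inject keeps the storm from propagating at full force.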
Also, FWIW, there does not seem to have been an MSDP storm at 5:00 PM