Who broke .org?

I guess I'll ask first...

maybe they had some whacky attack like Akamai got? I'll have to troll the
nanog archives for the info on finding which/where/what .org TLD box you
are querying when there are problems. Rodney had noted that such info is
helpful for them to figure out what parts of their pods are having
problems.

-Chris

dig @tld1.ultradns.net. whoareyou.ultradns.net.

Will tell you the IP you're talking to.

dig @tld1.ultradns.net. whoami.ultradns.net.

Will tell you who it thinks you are.

Unfortunately, neither of those is very helpful if the query is timing
out. I think Rodney has mentioned that traceroutes are helpful in that
case, though.

> maybe they had some whacky attack like Akamai got? I'll have to troll the
> nanog archives for the info on finding which/where/what .org TLD box you
> are querying when there are problems. Rodney had noted that such info is
> helpful for them to figure out what parts of their pods are having
> problems.

> dig @tld1.ultradns.net. whoareyou.ultradns.net.
>
> Will tell you the IP you're talking to.
>
> dig @tld1.ultradns.net. whoami.ultradns.net.
>
> Will tell you who it thinks you are.

that's the stuff I was searching for :)

> Unfortunately, neither of those is very helpful if the query is timing
> out. I think Rodney has mentioned that traceroutes are helpful in that
> case, though.

Yeah, I suppose I was striving for: "Don't tell me the internet is
busted, tell me how you think it's busted." Thanks!

Richard A Steenbergen wrote:

> I guess I'll ask first...

There was a gentleman a while back who posited that having only two anycast NS records was broken by design. He suggested that while serving the whole TLD from two NS records that are really a little army of anycast clusters all around out there is very 'l33t', it would not hurt much if, say, two or up to eleven of those clusters were also identified and reachable via good old-fashioned unicast addresses in the zone's NS records.

Seems to work for "."

Something about "eggs all in one basket", the basket being the anycast topology. Even if the topology is bulletproof overall, his point was that a partial failure, if it failed "closed", could leave his resolver stuck on non-responsive servers while perfectly good ones were still out there.

Come to think of it, there was a thread here a while back about this very thing: root-server robustness and all that.

How many reported .org hiccups does this one make, and over what timeframe?

Is it just this anycast deployment? Has F-root's anycast ever caused a stray problem or outage for somebody somewhere who was relying on F and only F?

Does nobody else think this?

What about the algorithms in recursive resolvers? How about first trying an arbitrary "close" or "optimal" NS; if there is no response, try TWO of the remaining best candidates, then FOUR of the remaining, and so on until the list is exhausted, then restart the loop at one (to be nice) until the request times out or a server answers and becomes "optimal", with periodic probing or looping across the list for freshness. After all the irrelevant junk the roots and near-roots get, some enthusiastic legitimate retries might not even be unwelcome.
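For the curious, here's a minimal sketch of that retry schedule in Python. This is just my reading of the idea above, not any real resolver's logic; the function names and the `send_query` callback are made up for illustration:

```python
def query_with_fanout(servers, send_query, max_rounds=10):
    """Try one "best" server first; on silence, fan out to TWO of the
    remaining candidates, then FOUR, doubling each round.  When the
    list is exhausted, restart at one (to be nice).  send_query(srv)
    is assumed to return an answer, or None on timeout."""
    ordered = list(servers)
    remaining = list(ordered)
    batch = 1
    for _ in range(max_rounds):
        if not remaining:             # exhausted: restart the loop at 1
            remaining = list(ordered)
            batch = 1
        tried, remaining = remaining[:batch], remaining[batch:]
        for srv in tried:
            answer = send_query(srv)  # None models a timed-out query
            if answer is not None:
                return srv, answer    # this server becomes "optimal"
        batch *= 2                    # double the fan-out next round
    return None, None                 # request timed out overall
```

A resolver doing something like this would hammer a dead "closest" anycast node far less than retrying it serially.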

I know I shouldn't hit send, but... please be gentle.

> There was a gentleman a while back that posited that having only two anycast NS records was broken by design.

It's the mother of SPOFs. (When your anti-SPOF solution has a SPOF...)

> Something about "eggs all in one basket". The basket being the anycast topology.

Precisely.

It's a totally valid argument to say that domain.tld holders shouldn't be asked to add 13 nameservers for "robustness" but why not max out the payload of one UDP packet in the name of general robustness for a TLD?

Granted, there are plenty of ccTLDs that aren't as robust as they could be, but I think com/net/org/edu are held to a higher standard, and when you have the room, why not use it? UltraDNS could even list some unicast addresses from their anycast nodes without having to change anything (or much of anything; I don't know their infrastructure/backend)...
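As a back-of-envelope check that 13 NS records really do fit: the sketch below sums RFC 1035 wire-format sizes for a referral with 13 NS records plus IPv4 glue, assuming best-case name compression (shared suffix, pointer-compressed owners). The server names are made up; this is not UltraDNS's actual layout:

```python
HEADER = 12                      # fixed DNS message header (RFC 1035)

def name_len(name):
    """Uncompressed wire length of a dotted name ("org" -> 5 bytes)."""
    labels = name.rstrip(".").split(".")
    return sum(1 + len(l) for l in labels) + 1   # length octets + root

def estimate(ns_names, zone="org"):
    """Rough size of a 13-NS referral, best-case compression."""
    size = HEADER + name_len(zone) + 4           # question: name + type + class
    suffix_seen = False
    for n in ns_names:
        rr_fixed = 2 + 2 + 2 + 4 + 2             # ptr owner, type, class, TTL, rdlen
        if suffix_seen:
            # later names compress to "<first label> + pointer"
            rdata = 1 + len(n.split(".")[0]) + 2
        else:
            rdata = name_len(n)                  # first name spelled in full
            suffix_seen = True
        size += rr_fixed + rdata
        size += 2 + 2 + 2 + 4 + 2 + 4            # glue A record, ptr owner
    return size
```

With 13 hypothetical names like `a0.example-nst.info` this lands comfortably under the classic 512-byte UDP limit, which is the same trick the root zone relies on.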

-davidu

It's at least the 2nd. Last big one was 10/16/2003.

I lost mail as a result of this, so I'm not happy (nothing looks
worse than a prospective employer trying to mail you and getting
bounces due to the domain disappearing from the internet).

I don't think I'm happy having .org run by folks with this as
their motto: "Technology so advanced, even we don't understand
it!"(R).

Can't we just go back to non-anycast, please?

-j

You mean like the roots.... Er, wait a second....

Now, if you suggest a combination, that might be reasonable. (I don't run .org, I just think a blanket statement "anycast is bad" is, well, bad.)

I'd be totally happy to see a combination, too. It's just pretty
obvious that the current solution isn't reliable over the long-haul.

-j

Uh, how much additional down-time did you want? Rolling the clock back a
decade isn't going to make things _better_.

                                -Bill

Why do you say that?

.com and .net seem to work just fine without the extreme reliance
on 2 anycasted servers (i.e. they are serving up 13 different NS records).
I realize .com/.net may be using anycast as well, but they've
managed to engineer a solution that is stable.
.org was pretty reliable when it was being run by the same folks that are
still running .com/.net.

.org broke one month after it was moved to UltraDNS, and has
since broken at least 4 times (based on reports to NANOG). How
many times have there been significant outages in .com/.net in
the past 10-11 months?

Wouldn't there be a huge uproar (a-la sitefinder) if .com/.net
were as unreliable as .org has been?

-j (wishing his domain wasn't in .org anymore)

Date: Sat, 3 Jul 2004 11:22:34 -0400
From: Jeff Wasilko

>
> Uh, how much additional down-time did you want? Rolling
> the clock back a decade isn't going to make things
> _better_.

> Why do you say that?
>
> .com and .net seem to work just fine without the extreme
> reliance on 2 anycasted servers (i.e. they are serving up 13
> different NS records).

"One anycast implementation is having trouble, therefore anycast
must be inherently bad" is hardly good logic.

Something I forgot to add to my anycast ramblings the other
evening:

Say one has ns1.domain.tld and ns2.domain.tld both anycasted.
Assuming pods have two machines, set your MEDs[*] such that ns1
prefers server "A" and ns2 prefers server "B". This helps
queries destined for different NSes hit different machines.

[*] Or whatever knob you use.
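To make the knob concrete, here is a hedged sketch in Cisco-style configuration; the route-map, prefix-list names, and metric values are invented, and other platforms have equivalent knobs:

```
! On server "A": announce ns1's covering prefix with the lower MED
! (lower MED is preferred), ns2's with a higher one.
route-map ANYCAST-OUT permit 10
 match ip address prefix-list NS1-PREFIX
 set metric 10
route-map ANYCAST-OUT permit 20
 match ip address prefix-list NS2-PREFIX
 set metric 100
! Server "B" mirrors this with the two metrics swapped, so queries to
! ns1 and ns2 from the same pod land on different machines.
```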

> I realize .com/.net may be using anycast as well, but they've
> managed to engineer a solution that is stable.

I don't think gtld-servers.net uses anycast; someone correct me
if I'm wrong. F-root != gtld-servers.net.

Eddy

> I don't think gtld-servers.net uses anycast; someone correct me
> if I'm wrong. F-root != gtld-servers.net.

  perhaps on two counts...

  ) the gtld-servers.net machines are anycast.
  ) F is not unique, they are just a whole lot more vocal
    about their anycasting.

Eddy

--bill

Date: Sun, 4 Jul 2004 13:40:56 +0000
From: bmanning@vac...

>   perhaps on two counts...
>
>   ) the gtld-servers.net machines are anycast.
>   ) F is not unique, they are just a whole lot more vocal
>     about their anycasting.

You're not the only one to correct me and say gtld _is_ anycast.

How many of the roots are? I thought there was one besides F,
but didn't think it was that many...

Eddy

At least C, I, J, K, M (http://www.root-servers.org/)

         -rob

there are others as well...

--bill

> > How many of the roots are? I thought there was one besides F,
> > but didn't think it was that many...
>
> At least C, I, J, K, M (http://www.root-servers.org/)

And G, I believe. That's at least eight of the thirteen.

                                -Bill