RIPE "Golden Networks" Document ID - 229/210/178

Hello folks,

This is actually NANOG applicable, despite referring to RIPE... :wink:

How many of you who manage BGP speaking networks implement the RIPE "best practices" regarding dampening parameters for so-called "golden networks"?

See: http://www.ripe.net/ripe/docs/routeflap-damping.html
and
http://www.qorbit.net/documents/golden-networks (thanks, Steve!)

If you do, what parameters do you use, or do you not dampen the "golden networks" at all?

If you don't implement ripe-229, why not?

If there is enough interest/response (i.e., if anyone besides me feels this is a real operational issue currently and wants to deal with it), I'll work on compiling the responses and producing a report.

Note: A *significant* number of networks appear to *not* follow ripe-229 guidelines at all.

Thanks,

Rodney Joffe
CenterGate Research Group, LLC
http://www.centergate.com
"Technology so advanced, even WE don't understand it"(R)

well....

RIPE is the RIR for Europe. RIPE-229 is, from my viewpoint, arbitrary and capricious.
the root servers are -ONE- set of interesting servers. what about the web sites that point
to these "important" documents? or the time servers, or my NOC & monitoring machines?

The idea of an Internet Registry stepping into giving routing advice is a leap of faith.
An RIR can tell you what was delegated - but presuming to give advice on what is important
for everyone that uses IP protocols is over the top.

so no, i don't use this document as a guideline for "golden networks". the advice on
dampening is important tho and is worthwhile.

No. RIPE != RIPE NCC (RIR). This document is a product of the RIPE
Routing-WG [1]. Read the reference.

Fred

[1] http://www.ripe.net/ripe/about/index.html

Bill,

  I agree with your general line of reasoning, but would likely characterize
  RIPE as an RIR *and* operator forum... formulating and reviewing
  recommendations on operational matters make some sense as a result.

  As to the particular set of prefixes, there's a great question as to what
  criteria make a particular network "important"... one could easily come
  up with a list of extremely popular commercial sites (CNN, Amazon, etc.)
  which might be more noticeable if route damped for an hour.

/John

I think the real question is:

        Based on the increased performance of routers these days..
most people running BGP aren't using a 2500 or AGS+ anymore, or at least
not getting a full routing table on them.

        Is bgp dampening really necessary anymore? Obviously we should
dampen people that flap a high number of times in an hour, but the vast
majority of the internet operates in a state where dampening causes more
pain than benefit, imho.

  - jared

        Is bgp dampening really necessary anymore? Obviously we should
dampen people that flap a high number of times in an hour, but the vast
majority of the internet operates in a state where dampening causes more
pain than benefit, imho.

I agree with your line of reasoning. However, if you follow the RIPE document's guidelines [ included below for reference ]...

I don't fundamentally have a problem with any of it. 4 flaps before you start dampening in a time window is a lot of flapping. Which means you are flapping that prefix throughout your internal network views as well, times the number of distributed forwarding line cards you have, etc, etc. It's not necessarily a good thing to leave unmanaged, no matter how slight.

I don't know if everything needs to be stable for an hour when it takes 4 flaps to bring the wrath of dampening on it in the first place though.

Maybe 15-20 minutes of stability on the high end (/24 and longer prefixes). If someone flapped every 30 minutes or so, while not ideal, it's certainly not causing wide-spread network failures and it's keeping you from blackholing a good chunk of their traffic.
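
To put rough numbers on the paragraph above, here is a minimal sketch of the usual exponential-decay penalty model behind flap dampening. The constants (1000 per flap, suppress above 2000, reuse at 750, 15-minute half-life) are assumed, vendor-default style values and are not taken from this thread or from ripe-229; the function names are made up for illustration.

```python
import math

# Assumed illustrative parameters; not quoted from this thread or ripe-229.
PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000
REUSE_LIMIT = 750
HALF_LIFE_MIN = 15.0


def decay(penalty, minutes, half_life=HALF_LIFE_MIN):
    """Exponentially decay a penalty over the given number of minutes."""
    return penalty * 0.5 ** (minutes / half_life)


def minutes_until_reuse(penalty, reuse=REUSE_LIMIT, half_life=HALF_LIFE_MIN):
    """Minutes of stability needed before the penalty decays to the reuse limit."""
    if penalty <= reuse:
        return 0.0
    return half_life * math.log2(penalty / reuse)


def simulate(flap_times_min):
    """Return (penalty right after the last flap, whether the suppress limit was ever exceeded)."""
    penalty = 0.0
    last = None
    ever_suppressed = False
    for t in flap_times_min:
        if last is not None:
            # Decay the accumulated penalty for the quiet time since the previous flap.
            penalty = decay(penalty, t - last)
        penalty += PENALTY_PER_FLAP
        if penalty > SUPPRESS_LIMIT:
            ever_suppressed = True
        last = t
    return penalty, ever_suppressed


if __name__ == "__main__":
    print(simulate([0, 30, 60, 90]))  # one flap every 30 minutes
    print(simulate([0, 1, 2, 3]))     # four flaps a minute apart
```

Under these assumed values, a prefix flapping once every 30 minutes never crosses the suppress limit, while four flaps a minute apart push it well over and need a bit over half an hour of stability to get back to the reuse limit.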

I think the idea harkens back to a day when coming up with 100% of your sessions & recalcs could bring your router down as traffic started to flow. So dampening helped you and everyone else stabilize before significant amounts of traffic started flowing through the 2500, 3600, AGS or whathaveyou. Clearly this isn't really the case anymore. If your router needs to protect itself from the big-bad-bgp sessions of its more powerful upstream routers, it can dampen more aggressively.

Just my opinion,

Deepak Jain
AiNET

If you don't implement ripe-229, why not?

because the golden address space stuff is stupid

I don't fundamentally have a problem with any of it. 4 flaps before you
start dampening in a time window is a lot of flapping.

you may want to look at

http://rip.psg.com/~randy/030226.apnic-flap.pdf

randy

Maybe so, but the logic seems rather irrefutable:

- as a rule, shorter prefixes are more important and/or more stable than long ones
- so we dampen long prefixes more aggressively
- the root DNS servers tend to live in long prefixes
- so we exclude the root DNS prefixes
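
The logic in that list can be sketched as a small classifier. This is only an illustration: the tier boundaries (/24 and longer, /22 to /23, everything shorter) and the tier names are assumptions for the example and should be checked against the actual document, and the exemption list holds a documentation prefix (192.0.2.0/24) as a placeholder rather than any real root server prefix.

```python
import ipaddress

# Hypothetical exemption list; 192.0.2.0/24 is a documentation prefix used
# purely as a placeholder for a "golden" network.
GOLDEN_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def damping_class(prefix, golden=GOLDEN_NETWORKS):
    """Pick a dampening tier from prefix length, exempting golden networks.

    Tier boundaries are assumed for illustration only.
    """
    net = ipaddress.ip_network(prefix)
    for g in golden:
        # subnet_of() needs both networks to be the same IP version.
        if net.version == g.version and net.subnet_of(g):
            return "exempt"
    if net.prefixlen >= 24:
        return "aggressive"   # long prefixes: dampen hardest
    if net.prefixlen >= 22:
        return "moderate"
    return "lenient"          # short prefixes: dampen least


if __name__ == "__main__":
    for p in ["192.0.2.0/24", "203.0.113.0/24", "198.51.100.0/22", "10.0.0.0/8"]:
        print(p, damping_class(p))
```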

But then again, dampening really doesn't buy you much as it only applies to routes that are flapping beyond the link to the next AS. So if you have an unstable link somewhere, you can't dampen that instability away yourself.

Hi Randy,

If you don't implement ripe-229, why not?

because the golden address space stuff is stupid

OK. I'll bite...

Given Network A, which has "golden network" content behind it as described by the RIPE paper (root and tld data), if the network has some combination of events that result in all of their announcements to you being dampened by you, your users can't get "there". For grins, let's say we're talking about .foo, one of the larger gtlds.

You are absolutely right in suggesting that .foo has to get its act together. You may even tell your users that. But you'll be telling every single one of them, because every single one of them is going to attempt to resolve .foo domain names during the hour you have them dampened. And your cost in dealing with those support calls will probably outweigh the benefits of dampening .foo.

I am polling networks so that I can get an idea of who handles their network this way, and who doesn't. I don't know if it is stupid or not, because I don't know enough about the subject yet. What I do know is that dampening these special networks with long prefixes already causes real-world problems. In many cases, the pain is felt by networks who may have a policy of not dampening, but are downstream of a major network that *does* dampen aggressively. Unless they're looking at the routing announcement and withdrawal data and analyzing it, they may never realize why their support infrastructure was overwhelmed. And Jared has a good point - modern BFRs *can* handle lots of flaps without breaking a sweat, so dampening aggressively, or even at all, may be an artifact whose time has gone.

Notwithstanding the normal response of "If what is on that network is broken, let them fix it" which is tantamount to cutting off your nose to spite your face, saying it is stupid is more of a generalization and opinion, but doesn't really give reasons as to why it is stupid, so it really has no real value. What are the reasons you think (or know) it is *stupid*? And what is the solution technically, not to include "let them fix it - I'm in the right, so I'm not going to do anything".

Thanks
/rlj

because the golden address space stuff is stupid

Given Network A, which has "golden network" content behind it as
described by the RIPE paper

i don't care. if i had spare time on my hands, i would damp them
more quickly for stupidity and greed. again, golden network space
is a stupid idea. check out the dns for name to address mapping.

randy

And this is the point: dampening can actually lead to decreased network stability and non-deterministic behavior. Granted, this behavior is exacerbated by not deploying a common dampening policy across all ASes (which is why RIPE-229 was written).

This would not be as problematic if dampening could be applied to a path rather than a prefix, since an alternate could then be selected. But since this would require modifications to core aspects of BGP (and additional memory and processor requirements) it does not seem a likely solution.

I've been wondering what the net results would be if one
dampened aggressively but only for a max of 7-15 mins. Might
that allow for the networks to be properly penalized yet provide the
users a minimum amount of time to recover once the prefix is stable?
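
One wrinkle with a short maximum suppress time can be sketched with the penalty-ceiling relation from the standard flap damping model (ceiling = reuse limit * 2^(max suppress time / half-life)). The numbers used below (suppress limit 2000, reuse limit 750, half-lives of 15 and 5 minutes) are assumptions for illustration, not anyone's recommended configuration, and the function names are hypothetical.

```python
def penalty_ceiling(reuse, half_life_min, max_suppress_min):
    """Highest penalty a route can carry so it is reusable within max_suppress_min."""
    return reuse * 2 ** (max_suppress_min / half_life_min)


def can_ever_suppress(suppress, reuse, half_life_min, max_suppress_min):
    """True if the penalty ceiling is high enough for the suppress limit to be crossed."""
    return penalty_ceiling(reuse, half_life_min, max_suppress_min) > suppress


if __name__ == "__main__":
    # Assumed values: suppress limit 2000, reuse limit 750.
    print(penalty_ceiling(750, 15, 60), can_ever_suppress(2000, 750, 15, 60))
    print(penalty_ceiling(750, 15, 15), can_ever_suppress(2000, 750, 15, 15))
    print(penalty_ceiling(750, 5, 15), can_ever_suppress(2000, 750, 5, 15))
```

With these assumed limits, capping suppression at 15 minutes while keeping a 15-minute half-life puts the ceiling below the suppress limit, so nothing would ever be dampened; a shorter half-life would have to go along with the shorter cap.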

  - jared

But .foo is announced from 13 IPs globally, allowing for anycast, probably 40
nodes. If gtld-A has an incident it may be a good thing to dampen it from the
internet, as it may not be reachable; the other 12 gtlds will be able to serve
responses in a stable manner.

Unless you're suggesting *all* the gtlds are flapping at once?

Steve

Sorry. I thought I made that clear, in that "if the network has some combination of events that result in all of their announcements to you being dampened by you". I am not talking about events that happen all of the time, where one of 13 hiccups. .foo may have 13 IPs but they have two upstream providers, and the event causes all of their routes to flap.

Rodney Joffe
CenterGate Research Group, LLC
http://www.centergate.com
"Technology so advanced, even WE don't understand it"(R)

While I'm not going to encourage anybody to avoid doing something to make
their network stable because it should be somebody else's problem (just as
I wouldn't suggest that somebody cross the street in front of a speeding
truck just because pedestrians have the right of way at California
crosswalks), this whole discussion strikes me as something that needs to
be looked at in the context of DNS diversity.

In the case of the root servers, there are 13 IP addresses, announced from
different ASes, most of them by different organizations. Some of them are
anycasted; I believe some of them still aren't. As long as a network
still has reachability to one of them, things should work. Anything that
causes a network to see all 13 of them flapping simultaneously is probably
a local problem, and probably leaves much of the rest of the Internet
inaccessible from that network.

The same really can't be said for some of the TLDs, either on the
qorbit.net Golden Networks list or off (it omits all the ccTLDs, which
include some of the most important TLDs in some parts of the world). I
suspect many of the TLDs that have only two or three listed name servers
are anycasted, and anycast does add a lot of reliability. For most forms
of network or server failure, a good anycast implementation can force
fail-over to another server, and users not doing traceroutes to the name
servers will never notice. But one thing anycast doesn't do is protect
against route flapping. If a domain is served from two anycast addresses,
and two announced routes, all it takes to make it completely unreachable
from some part of the Internet is for the two local servers to start
flapping at the same time. If reliability of the individual components is
equal, that should be a lot less robust than the root server architecture.
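
As a back-of-the-envelope illustration of that comparison, assume each announced route is, independently, dampened or otherwise unreachable with the same probability at any given moment. Independence is a strong assumption (a shared upstream problem would break it), and the 1% figure is made up purely for the example.

```python
def all_unreachable(p_single, n_routes):
    """Probability that all n independently failing routes are down at once."""
    return p_single ** n_routes


if __name__ == "__main__":
    p = 0.01  # made-up per-route probability, for illustration only
    two = all_unreachable(p, 2)        # a domain served from two announced routes
    thirteen = all_unreachable(p, 13)  # thirteen separately announced addresses
    print(f"2 routes:  {two:.3e}")
    print(f"13 routes: {thirteen:.3e}")
    print(f"ratio:     {two / thirteen:.3e}")
```

The exact numbers mean little; the point is only how quickly the chance of total unreachability shrinks as independently announced addresses are added.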

So, it seems to me that there are three questions here:

What is critical infrastructure? DNS for which domains? What about other
services? Google? Hotmail or Yahoo? The answer to this presumably
varies considerably from place to place.

What should the providers of critical infrastructure be doing to make sure
their critical infrastructure remains available?

What should network operators be doing to make sure their networks can
access critical infrastructure?

-Steve

Hi Steve,

Steve Gibbard wrote:

<snip>

So, it seems to me that there are three questions here:

What is critical infrastructure? DNS for which domains? What about other
services? Google? Hotmail or Yahoo? The answer to this presumably
varies considerably from place to place.

What should the providers of critical infrastructure be doing to make sure
their critical infrastructure remains available?

What should network operators be doing to make sure their networks can
access critical infrastructure?

The main question I was asking was actually:

"How many of you who manage BGP speaking networks implement the RIPE "best practices" regarding dampening parameters for so-called "golden networks"?"

So, while I know your analysis and suggestions are important, this list suffers sufficiently from the "rathole" syndrome for me to respectfully move it back on subject. My primary question was quite simple, and most of the responses I have had have been just as simple and concise.

For those who care, based on responses and some analysis, it appears that very few networks do follow the ripe-229 recommendations regarding "golden networks", including, oddly enough, parts of RIPE itself.

Thanks to those who did respond.

/rlj

In article <41393B29.6010804@centergate.com>, Rodney Joffe <rjoffe@centergate.com> writes

For those who care, based on responses and some analysis, it appears that very few networks do follow the ripe-229 recommendations regarding "golden networks", including, oddly enough, parts of RIPE itself.

Did you mean "parts of RIPE-NCC"?

Sorry to be so pedantic, but this thread started off with a mild diversion caused by confusion between RIPE and RIPE-NCC.

Roland Perry wrote:

Did you mean "parts of RIPE-NCC"?

Sorry to be so pedantic, but this thread started off with a mild diversion caused by confusion between RIPE and RIPE-NCC.

You're right - it is a little confusing. According to their joint "about" pages, RIPE-NCC provides the administrative support for RIPE. So I guess it is a part of RIPE. But to answer you properly, I meant parts of RIPE, specifically including RIPE-NCC, do not follow the RIPE-NCC recommendations. :wink:

/rlj

ok so as someone else mentioned this would be a local problem. in a network such
as this, you should be concerned for the possibility of having large numbers of
prefixes dampened and soften your dampening parameters accordingly. there is
nothing special in this scenario about 'golden networks'

Steve