BGP Filtering

Hi,

Considering:

http://thyme.apnic.net

Total number of prefixes smaller than registry allocations: 113220
!!!!!

/20:17046 /21:16106 /22:20178 /23:21229 /24:126450

That is saying to me that a significant number of these smaller prefixes
are due to de-aggregation of PA and not PI announcements.

My question is - how can I construct a filter / route map that will
filter out any more specific prefixes where a less specific one exists
in the BGP table.
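
To make the question concrete, here is a minimal offline sketch of the classification being asked for. It is not router configuration. It assumes the table is available as a plain list of IPv4 prefix strings, and the function name `covered_more_specifics` is made up for this illustration. It looks only at prefix containment and ignores origin AS and path.

```python
import ipaddress

def covered_more_specifics(prefixes):
    """Return prefixes that have a less specific covering prefix in the same list.

    Assumption: IPv4 prefixes given as strings; origin AS and AS path are ignored.
    """
    # Shortest masks first, so any covering prefix is seen before what it covers.
    nets = sorted({ipaddress.ip_network(p) for p in prefixes},
                  key=lambda n: n.prefixlen)
    seen = []
    covered = []
    for net in nets:
        if any(net != s and net.subnet_of(s) for s in seen):
            covered.append(str(net))
        seen.append(net)
    return covered

table = ["204.91.0.0/16", "204.91.0.0/17", "204.91.128.0/17", "198.51.100.0/24"]
print(sorted(covered_more_specifics(table)))
```

The scan is quadratic, which is fine for illustration but not for a full table; a radix trie would be the usual choice there.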

If my above conclusion is correct a significant portion ~47% of the
number of the prefixes in the table could be argued to be very
unnecessary at one level or another.

Is such a filter easily possible, or would it have to be explicitly
declared? Is there any chance of a process that automatically tracks and
publishes a list of offending specifics, similar to Team Cymru's Bogon
BGP feed?

As a transit consumer - why would I want to carry all this cr*p in my
routing table, I would still be getting a BGP route to the larger prefix
anyway - let my transit feeds sort out which route they use & traffic
engineering.

Thoughts anyone?

Kind Regards

Ben

Well, you could always just take "Customer" routes from
each of your providers (since you're running BGP I presume
you're actually multihomed and not adding to the pollution)
and point default at one/both providers for the other networks
(or take default from one or both of them).

  - jared

Hi,

Default won't work - I do care about my transit providers network
becoming partitioned or IXPs having problems or fiber cuts etc etc

So I need my router to see all the reachability of a prefix in BGP so
that my router knows which transit to send it to.

Defaults won't work because a routing decision has to be made, my transit
originating a default or me pointing a default at them does not
guarantee the reachability of all prefixes..

But if I can see the /19 in the table, do I care about a load of /24s
because the whole of the /19 should be reachable as the origin AS is
announcing it somewhere in their network and it is being received by a
transit so should be reachable.

Ok, I can dream up a few emergencies where it might be helpful to pin a
/24 as well as the /19 - but I am sure there aren't 100K+ emergencies
happening continuously in the route table and it is on the whole general
whatever because there is no incentive to stop de-aggregating once you
have started.

If they are only announcing the de-aggregated /24s and no summary /19
then my question doesn't apply as I only want to drop the more specifics
where a less specific exists.

I am struggling to see a defensible position for why just shy of 50% of
all routes appears to be mostly comprised of de-aggregated routes when
aggregation is one of the aims RIRs make the LIRs strive to achieve. If
we can't clean the mess up because there is no incentive, then can't I
simply ignore the duplicates?

Regards

Ben

Hi Jason,

Fantastic news, it is possible. We are using Cisco - would you be so
kind as to give me a clue into which bit of Cisco's website you would
like me to read as I have already read the bits I suspected might tell
me how to do this but have guessed wrong / the documentation hasn't
helped - so a handy pointer would be appreciated.

Kind Regards

Ben

Ben,
Look here. They show an example of prefix filtering on the 128.0.0.0/8
network. I would assume you could extrapolate and come up with your own
rule.

http://www.cisco.com/en/US/docs/ios/12_0/np1/configuration/guide/1cbgp.html#wp7487

Mike Walter, MCP
Systems Administrator
3z.net a PCD Company
http://www.3z.net

Defaults won't work because a routing decision has to be made, my transit
originating a default or me pointing a default at them does not
guarantee the reachability of all prefixes..

Taking a table that won't fit in RAM similarly won't guarantee reachability of anything :-)

Filter on assignment boundaries and supplement with a default. That ought to mean that you have a reasonable shot at surviving de-peering/partitioning events, and the defaults will pick up the slack in the event that you don't.

For extra credit, supplement with a bunch of null routes for bogons so packets with bogon destination addresses don't leave your network, and maybe make exceptions for "golden prefixes".

I am struggling to see a defensible position for why just shy of 50% of
all routes appears to be mostly comprised of de-aggregated routes when
aggregation is one of the aims RIRs make the LIRs strive to achieve. If
we can't clean the mess up because there is no incentive, then can't I
simply ignore the duplicates?

You can search the archives I'm sure for more detailed discussion of this. However, you can't necessarily always attribute the presence of covered prefixes to incompetence.

Joe

Hi,

I might be being slow, or you might not understand my question - I am
not sure; it has been a long day.

I want a filter that will automatically match the longer (more specific)
prefixes that are covered by any shorter prefix; once I can match them I
can drop them.
I don't want to manually configure a static prefix list for lots and
lots and lots of reasons.
If the shorter covering prefix disappears from the route table I want to
stop filtering the longer prefixes - automatically.

Hi,

Agreed that is why I have lots of RAM - doesn't mean I should carry on
upgrading my tower of babble though to make it ever higher and higher if
there is a better way of doing things.

I still don't see how a default route to a partitioned PoP is going to
help in the slightest - you are saved by getting the prefixes from an
alternate transit and the default doesn't get used. Where it does help
is to capture anything which has been filtered out completely and then
there is no prefix from the alternate transit provider anyway - so
whichever default gets used and takes its chances.

Bogons - obviously.

My question was if what I was asking was possible.

Kind Regards

Ben

Ben,

I think I understand what you want, and you don’t want it. If you receive a route for, say, 204.91.0.0/16, 204.91.0.0/17, and 204.91.128.0/17, you want to drop the /17s and just care about the /16. But a change in topology does not generally result in a complete update of the BGP table. Route changes result in route adds and draws, not a flood event. So if you forgot about the /17s and just kept the /16, and the /16 was subsequently withdrawn, your router would not magically remember that it had /17s to route to as well. You’d drop traffic, unless you had a default, in which case you’d just route it suboptimally.

-Dave
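
Dave's point can be illustrated with a toy model. This is a sketch under stated assumptions, not how any real BGP implementation is structured: the class `FilteringRouter` and its policy ("reject a route if a less specific one is already installed") are invented for the example, it handles IPv4 only, and it assumes the /16 arrives before the /17s.

```python
import ipaddress

class FilteringRouter:
    """Toy router that forgets any route it rejects (no soft reconfiguration)."""

    def __init__(self):
        self.table = set()  # installed prefixes only

    def announce(self, prefix):
        net = ipaddress.ip_network(prefix)
        # Hypothetical policy: drop a route covered by an installed, shorter prefix.
        if any(net != t and net.subnet_of(t) for t in self.table):
            return False  # rejected and not remembered
        self.table.add(net)
        return True

    def withdraw(self, prefix):
        # A withdrawal removes one route; nothing else is re-sent by the neighbor.
        self.table.discard(ipaddress.ip_network(prefix))

    def has_route(self, address):
        addr = ipaddress.ip_address(address)
        return any(addr in t for t in self.table)

router = FilteringRouter()
router.announce("204.91.0.0/16")
router.announce("204.91.0.0/17")     # rejected: covered by the /16
router.announce("204.91.128.0/17")   # rejected: covered by the /16
router.withdraw("204.91.0.0/16")
print(router.has_route("204.91.200.1"))  # the /17s were never stored
```

After the withdrawal the table is empty, so traffic to the /17s is dropped unless a default route exists.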

Ben Butler wrote:

Hi Dave,

Yes, that is what I was thinking I want to do. So, I am guessing here: I think what we are saying is that the /17s never get re-added when the /16 is withdrawn, because the withdrawal of a prefix does not (for very good reasons, when I think about it) cause the filter to be re-evaluated; a prefix only gets checked when it is newly announced, or maybe on the odd table scan in the code? So basically the /17s just sit there and continue to be filtered. Is that approximately correct?

so umm, yes a default would be needed, ummm.

Is it even technically possible to easily achieve though?

Ben

The /17 isn’t sitting there still being filtered; it was never there to begin with. Your router heard the /17, saw that it didn’t want it because of your filter settings, and promptly forgot it. You can tell your router to remember routes it doesn’t install; it’s called soft reconfiguration on a Cisco and is the normal mode of operation for a Juniper. But if you do that, you’re not saving memory; an inactive route does not take less RAM than an active one.

I am pretty sure that there isn’t a way to match a route on whether a larger aggregate exists using the current route map/policy statement verbiage on the routers I have worked with. Doing so would be a reasonably simple code tweak, but without a purpose it isn’t a tweak you’re going to see any time soon.

-Dave

Ben Butler wrote:

This was talked about / requested several months ago on cisco-nsp. IIRC,
the thread ended along the lines of don't hold your breath.
Implementation of this sort of feature is very icky (lots of details you
may not be considering) and why should cisco spend time writing this code
when they can sell you a bigger router instead?

If the filter has to remember routes that are filtered so they can be automatically unfiltered if their covering prefix is withdrawn, then where's your savings? You can't have tea and no tea simultaneously. You want to filter routes, but keep them around (and extra pointers connecting their covering prefixes to them) in case they're needed in the future...sort of like partial soft-reconfig. On a platform like the 6500 where you may have surplus RAM but limited TCAM, that could work...on the software routers where RAM is the limiting factor it's not going to help.
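
The "partial soft-reconfig" trade-off can be sketched as follows. This is a toy model with invented names (`RememberingRouter`, `held`), IPv4 only, and it is not a description of any vendor's implementation. It shows that automatic unfiltering works only because every filtered route is still stored.

```python
import ipaddress

class RememberingRouter:
    """Toy router that holds filtered more-specifics so they can be reinstated."""

    def __init__(self):
        self.active = set()  # routes in use
        self.held = set()    # filtered routes, kept in memory anyway

    def _covered(self, net):
        return any(net != a and net.subnet_of(a) for a in self.active)

    def announce(self, prefix):
        net = ipaddress.ip_network(prefix)
        if self._covered(net):
            self.held.add(net)
            return
        self.active.add(net)
        # A new aggregate may cover routes that were active until now.
        for a in [a for a in self.active if a != net and a.subnet_of(net)]:
            self.active.discard(a)
            self.held.add(a)

    def withdraw(self, prefix):
        net = ipaddress.ip_network(prefix)
        self.held.discard(net)
        if net in self.active:
            self.active.discard(net)
            # Reinstate held routes that lost their cover, shortest masks first.
            for h in sorted(self.held, key=lambda n: n.prefixlen):
                if not self._covered(h):
                    self.held.discard(h)
                    self.active.add(h)

    def stored_routes(self):
        # Memory cost is active plus held, so filtering saves nothing here.
        return len(self.active) + len(self.held)
```

With a /16 and its two /17s announced, `stored_routes()` is 3 even though only one route is active, which is the point being made: the saving would have to come from the forwarding table, not from route memory.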

Dave,

That's half-true.

The "routing table" is comprised of two components: the Routing
Information Base (RIB) and the Forwarding Information Base (FIB). The
RIB sits in slow, cheap memory and contains routes and metrics for
every route as announced by every neighbor. The FIB sits in fast,
expensive memory and contains the currently "best" route for each
destination. The FIB is built by choosing the best routes from the
RIB. Packet-forwarding decisions are made by consulting the FIB.

Opportunistically filtering routes from the RIB would have exactly the
problem you point out: routing updates are incremental. The knowledge
that the /16 has been withdrawn may not accompany the knowledge that
the /17s are available.

Opportunistically filtering more-specific routes from the FIB,
however, could be very valuable at the edge of the DFZ. If Cisco
supported such filtering, those Sup2's could have another few years of
life left in them. With 512 MB of RAM in a two-transit-provider scenario a
Sup2 could handle upwards of 1M routes in the RIB. Unfortunately, they
can only handle 244k routes in the FIB.

Ben, coming back to your question: I don't think there is a way to
make the software filter the routes inserted into the FIB. I don't see
a reason why it couldn't be programmed to do that. But the fine folks
at Cisco didn't see fit to write that software. It's a pity 'cause it
would be very useful.
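
As a rough illustration of what FIB-only suppression might look like, here is a sketch. The function `build_fib`, the dictionary input, and the next-hop labels are all assumptions made for the example. It also adds one condition that is not stated above: a more specific route is skipped only when its nearest installed covering route has the same next hop, so forwarding behaviour is unchanged. IPv4 only.

```python
import ipaddress

def build_fib(rib_best):
    """Build a reduced FIB from the RIB's best routes.

    rib_best: dict mapping prefix string -> next hop of the best path (assumed input).
    """
    routes = sorted(((ipaddress.ip_network(p), nh) for p, nh in rib_best.items()),
                    key=lambda r: r[0].prefixlen)
    fib = {}
    for net, next_hop in routes:
        covering = [c for c in fib if c != net and net.subnet_of(c)]
        if covering:
            # Nearest installed ancestor, i.e. what a lookup would hit without this route.
            parent = max(covering, key=lambda c: c.prefixlen)
            if fib[parent] == next_hop:
                continue  # same forwarding result via the parent, so skip it
        fib[net] = next_hop
    return {str(n): nh for n, nh in fib.items()}

rib = {"204.91.0.0/16": "transit-A",
       "204.91.0.0/17": "transit-A",
       "204.91.128.0/17": "transit-B"}
print(build_fib(rib))
```

Because the RIB still holds every route, rebuilding the FIB after the /16 is withdrawn would install the /17s again, which avoids the incremental-update problem described above.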

The next best thing you can do is statically filter /8's from distant
regions. You're posting to NANOG, so I assume that the RIPE and APNIC
regions are distant for you. Go to IANA's web site and download the
list of /8's assigned exclusively to each of those registries. For
each, create a set of /8 static routes towards each of your transit
providers with a route target address picked from an address block
that will disappear or become distant if your link to that transit
provider is severed. Then use prefix lists to filter more specific
routes within those /8's.

That should give you a result that's almost as good as if you carried
all the routes while cutting a bunch of routes from your table.
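
The static /8 approach could be scripted along these lines. This is only a sketch: the function name, the prefix-list name, the example /8 and the next-hop addresses are placeholders rather than real registry data, and the emitted lines follow the usual IOS `ip route` and `ip prefix-list` forms, which should be checked against your platform's documentation before use.

```python
import ipaddress

def distant_region_config(slash8s, next_hops, list_name="DISTANT-SPECIFICS"):
    """Emit static /8 routes plus a prefix list denying more specifics inside them.

    slash8s:   IPv4 /8 blocks taken from the IANA list (caller supplies them).
    next_hops: one address per transit, chosen so that it becomes unreachable
               if the link to that transit fails (assumption from the text above).
    """
    lines = []
    seq = 5
    for block in slash8s:
        net = ipaddress.ip_network(block)
        if net.version != 4 or net.prefixlen != 8:
            raise ValueError(f"expected an IPv4 /8, got {block}")
        for hop in next_hops:
            lines.append(f"ip route {net.network_address} {net.netmask} {hop}")
        # Deny anything longer than /8 inside this block.
        lines.append(f"ip prefix-list {list_name} seq {seq} deny {net} ge 9")
        seq += 5
    # Everything outside the listed /8s is still accepted.
    lines.append(f"ip prefix-list {list_name} seq {seq} permit 0.0.0.0/0 le 32")
    return lines

# Placeholder inputs only: 10.0.0.0/8 stands in for a real registry /8.
for line in distant_region_config(["10.0.0.0/8"], ["192.0.2.1", "198.51.100.1"]):
    print(line)
```

The prefix list would then be applied inbound on each transit session.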

Regards,
Bill Herrin

William Herrin wrote:

  
   I think I understand what you want, and you don't want it.  If you
receive a route for, say, 204.91.0.0/16,  204.91.0.0/17, and
204.91.128.0/17, you want to drop the /17s and just care about the /16.  But
a change in topology does not generally result in a complete update of the
BGP table.  Route changes result in route adds and draws, not a flood event.
So if you forgot about the /17s and just kept the /16, and the /16 was
subsequently withdrawn, your router would not magically remember that it had
/17s to route to as well.
    

Dave,

That's half-true.
  

[discussion of FIB vs RIB deleted]

But, as you said yourself:

Ben, coming back to your question: I don't think there is a way to
make the software filter the routes inserted into the FIB. I don't see
a reason why it couldn't be programmed to do that. But the fine folks
at Cisco didn't see fit to write that software. It's a pity 'cause it
would be very useful.
  

Ergo, why I didn’t discuss the FIB in my email. If you want to filter routes, you generally have to filter them at the RIB.

How you move data from the RIB to the FIB is one of those questions that keep router engineers up all night. The transfer must be fast, reliable, and cheap on the CPU. Often, this means keeping logic out of it. A paradigm is decided upon early, and if it takes ten years to actually come back to haunt them, they haven’t done too badly. Fixing something that far down in the nuts and bolts isn’t easy. (I am not saying the presence of a revenue-generating hardware fix doesn’t influence the decision not to make a risky change to the software; I’m just saying there’s a lot of grey area to play in.)

-Dave

But if I can see the /19 in the table, do I care about a load of /24s
because the whole of the /19 should be reachable as the origin AS is
announcing it somewhere in their network and it is being received by a
transit so should be reachable.

The "presumption" in cases like this is that the /24 may take a different path than the /19 in some or all cases. If you have only a single provider you can safely dump more specifics -- but then, you could just point default. If you *are* multihomed and the /19 and /24 both have the same primacy (first choice in a routing decision and same path) you can safely drop the more specific.

The "presumption" is that in some cases the /24 would take a different path than the /19 in a routing fight.

How much cost you want to incur for these is your choice. If enough people drop the more specifics, they will go away as well -- if they provided no benefit, fewer would exist.

Some of this originates from the peering-contests where folks have "x number of prefixes" which makes them bigger than "y number of prefixes".

I'd be interested to see any metrics on rate of growth of allocations longer than RIR limits since Verio instituted then dropped mandatory prefix filters. (vs the rate of growth of prefixes overall). I would guess that they accelerated.

Deepak

Hi,

It is late and am just checking email. But...

The /24 is more specific than the /19, therefore the /24 takes priority.
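
For readers less familiar with longest-prefix matching, a minimal sketch follows. The table contents and next-hop labels are invented for illustration, and a real router uses a trie or TCAM rather than a linear scan.

```python
import ipaddress

def longest_match(table, address):
    """Return (prefix, next_hop) for the most specific route covering address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [(ipaddress.ip_network(p), nh) for p, nh in table.items()
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    net, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), next_hop

# Hypothetical table: a /19 via one transit and a /24 inside it via another.
table = {"198.51.96.0/19": "transit-A", "198.51.100.0/24": "transit-B"}
print(longest_match(table, "198.51.100.7"))  # falls inside the /24
print(longest_match(table, "198.51.97.1"))   # only the /19 covers this
```

Dropping the /24 would send the first lookup to transit-A instead, which is exactly the behaviour change being debated.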

In my opinion AS path length became somewhat redundant with the rise of
confederations and BGP doesn't understand bandwidth, latency and
congestion. But I didn't write it, I am not that clever and it works
and is what we have today.

But... I don't care about the remote de-aggregating AS's local traffic
engineering. I care about the reachability of the IP my customer has
requested, and the /19 is a valid route in the route table: the origin AS
put it there and it is in my local transit feed. Why should I pay in my
router for the de-aggregating AS's traffic engineering, which doesn't
benefit me? I care about my transit and peers as long as the /19 is
reachable. Personally, I think it is the de-aggregating AS's problem if
they have poor transit and peering, not mine. Maybe if they took
ownership of their problem rather than trying to make it everyone
else's problem, we would not find ourselves in the mess we are
currently in, with no sign of the problem diminishing or fixing itself.

This is not about my router or processor - it is fine thank you with
plenty of capacity transits and peers - but that doesn't excuse the
generation of dross in the table - I refuse to believe there are
justifiable reasons for anywhere near the majority of those 100K+
suspect routes. As a wide general rough rule of thumb, more specifics
(if any) for peering should only be getting announced to peers +
customers, not back up into transit providers. The RIPE RIR rule is: don't
de-aggregate, period. These ASes should not expect others to carry their
extra x prefixes just because they want to stretch the size of their
table in a router-waving contest.

I know I can dump them, for identical origination ASes, and things will
continue to work for me - the trick and my question is how to
dynamically classify them so that it is possible to think about dropping
them. The question was how? The answer, it seems, is that it can't be done.
The main/best workaround I have heard seems to be RIR minimum
allocation PA space filtering, plus defaults to capture the very small
number of unique PA prefixes smaller than the minimum allocation size
that would get filtered. That is as I understand it; it is top of the
reading list on my desk tomorrow.

The idea as much as possible is to go with what is in the routing table
not to pin default routes all over the place and to simply try and
"easily with minimum maintenance" drop a slice of the dross without
impacting customer experience.

Thank you to all who suggested solutions.

Ben

Jon, didn't you start:

http://www.wibble.co.uk/archives/nanog/2007/msg05265.html

and Ben, is this sort of what you are looking for? Or would it
accomplish the same thing for you?

-Chris

Jon, didn't you start:

http://www.wibble.co.uk/archives/nanog/2007/msg05265.html

Yep.

and Ben, is this sort of what you are looking for? Or would it
accomplish the same thing for you?

I don't think it's at all what Ben "wants", but I think it's the closest thing to it that's actually available, relatively simple to configure, and accomplishes the desired savings.

For anyone in need of such savings, I recommend you start with one RIR at a time and carry a default route, because you're going to lose some networks. If you want to be somewhat charitable, bump the limits up 1-bit and filter on RIR minimums + 1.
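
The "RIR minimums + 1" rule amounts to a simple length check. The sketch below assumes a caller-supplied mapping from address block to minimum allocation length; the block and the value 21 are placeholders, not real registry policy, and the function name `accept` is made up. Real minimums would come from each RIR's published allocation sizes.

```python
import ipaddress

def accept(prefix, minimums, slack_bits=0):
    """Accept a prefix unless it is longer than its block's minimum allocation.

    minimums:   dict of block -> minimum allocation prefix length (assumed input).
    slack_bits: 1 gives the more charitable "minimum + 1" filter.
    """
    net = ipaddress.ip_network(prefix)
    for block, min_len in minimums.items():
        parent = ipaddress.ip_network(block)
        if net.version == parent.version and net.subnet_of(parent):
            return net.prefixlen <= min_len + slack_bits
    return True  # outside the listed blocks, this filter has no opinion

# Placeholder policy: pretend 10.0.0.0/8 has a /21 minimum allocation.
policy = {"10.0.0.0/8": 21}
print(accept("10.1.0.0/22", policy))                # longer than the minimum
print(accept("10.1.0.0/22", policy, slack_bits=1))  # allowed with one bit of slack
```

Anything rejected here is reachable only through a covering route or the default, which is why carrying a default is part of the recommendation.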

Hi Folks,

  1. UK: UKNOF; http://www.uknof.org.uk/ I just attended the last meeting Monday. Free and a good lunch included!
    Please do not confuse UKNOF with the United Kingdom Nitric Oxide Forum. Nitric Oxide keeps your arteries relaxed and your blood pressure under control

  2. Europe: RIPE; http://www.ripe.net/ The Big Meeting is in Berlin in early May.

  3. France: FRnOG; http://www.frnog.org/ Has several meetings each year. Has interesting discussions in French on its mailing list. Moderator makes Stalin look easy going.

  4. UK: LINX; https://www.linx.net/ Has four meetings each year. Not difficult to get invited if you are not a member.

  5. LAMBDANET hosts several German ISP meetings; http://www.lambdanet.net/index.php?p=92&l=1&sid=ee8bc11d266a13bffdcd59ceb45c329d. Language is German.
    Please do not confuse with the Intranet for the Brothers of Lambda Theta Phi, Latin Fraternity Inc.

  6. I am not aware of any Dutch per se ISP conferences although that market is certainly quite vibrant. I am also disappointed to see the Canadians and Irish have next to nothing despite Ireland being the European base of operations for Google, Microsoft, Amazon, and Yahoo. And Canada has over 30 million people. Where is the National Pride?

  7. It is worth mentioning that DE-CIX has started the practice of hosting carrier meetings a la Telx. These are not conferences with lectures, but networking events where each provider has a booth where they can push their products and services. Tends to be more carrier than ISP, but as you know the intersection of these two sets is not the null set. Quite a bit of overlap.

  8. Both DE-CIX and AMS-IX have member meetings each year. Not clear how difficult it is to get invited if you are not a member.

  9. I believe there are some Northern England ISP meetings. Probably MANAP.

Roderick S. Beck
Director of European Sales
Hibernia Atlantic
1, Passage du Chantier, 75012 Paris
http://www.hiberniaatlantic.com
Wireless: 1-212-444-8829.
Landline: 33-1-4346-3209.
French Wireless: 33-6-14-33-48-97.
AOL Messenger: GlobalBandwidth
rod.beck@hiberniaatlantic.com
rodbeck@erols.com
"Unthinking respect for authority is the greatest enemy of truth." Albert Einstein.

[...]

APRICOT - http://www.apricot2008.net next month in Taipei.
SANOG - www.sanog.org - going on right now in Dhaka, Bangladesh