Partial vs Full tables

I have been doing a lot of research recently on operating networks with partial tables and a default to the rest of the world. Seems like an easy enough approach for regional networks where you have maybe only 1 upstream transit and some peering.

I come to NANOG to get feedback from others who may be doing this. We have 3 upstream transit providers and PNI and public peers in 2 locations. It’d obviously be easy to transition to doing partial routes for just the peers, etc, but I’m not sure where to draw the line on the transit providers. I’ve thought of straight preferencing one over another. I’ve thought of using BGP filtering and community magic to basically allow Transit AS + 1 additional AS (Transit direct customer) as specific routes, with summarization to default for the rest. I’m sure there are other thoughts that I haven’t had about this as well…

And before I get asked why not just run full tables, I’m looking at regional approaches to being able to use smaller, less powerful routers (or even layer3 switches) to run some areas of the network where we can benefit from summarization and full tables are really overkill.

> I have been doing a lot of research recently on operating networks with partial tables and a default to the rest of the world. Seems like an easy enough approach for regional networks where you have maybe only 1 upstream transit and some peering.
>
> I come to NANOG to get feedback from others who may be doing this. We have 3 upstream transit providers and PNI and public peers in 2 locations. It’d obviously be easy to transition to doing partial routes for just the peers, etc, but I’m not sure where to draw the line on the transit providers.

Why draw a line? Just take their directly connected routes + default. If you don’t like traffic mix, filter or play with local pref until you are happy.
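For illustration, an import policy along those lines in Junos-style syntax might look roughly like the below. AS 64500 stands in for the transit's ASN and the names are placeholders; the second as-path also admits the transit's direct customers (one AS deep), per the "Transit AS + 1" idea:

    policy-options {
        /* the transit itself (prepends allowed), plus one customer AS */
        as-path-group transit-direct {
            as-path transit-only "^64500+$";
            as-path transit-customer "^64500+ .$";
        }
        policy-statement transit-partial-in {
            term direct {
                from as-path-group transit-direct;
                then accept;
            }
            term default {
                from route-filter 0.0.0.0/0 exact;
                then accept;
            }
            then reject;
        }
    }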

> I’ve thought of straight preferencing one over another. I’ve thought of using BGP filtering and community magic to basically allow Transit AS + 1 additional AS (Transit direct customer) as specific routes, with summarization to default for the rest. I’m sure there are other thoughts that I haven’t had about this as well…
>
> And before I get asked why not just run full tables, I’m looking at regional approaches to being able to use smaller, less powerful routers (or even layer3 switches) to run some areas of the network where we can benefit from summarization and full tables are really overkill.

It is a smart approach, used by many. I would just make sure your ACL / policing needs are met too.

* James Breeden

> I come to NANOG to get feedback from others who may be doing this. We
> have 3 upstream transit providers and PNI and public peers in 2
> locations. It'd obviously be easy to transition to doing partial
> routes for just the peers, etc, but I'm not sure where to draw the
> line on the transit providers. I've thought of straight preferencing
> one over another. I've thought of using BGP filtering and community
> magic to basically allow Transit AS + 1 additional AS (Transit direct
> customer) as specific routes, with summarization to default for the
> rest. I'm sure there are other thoughts that I haven't had about this
> as well...

We started taking defaults from our transits and filtering most of the
DFZ over three years ago. No regrets, it's one of the best decisions we
ever made. Vastly reduced both convergence time and CapEx.

Transit providers worth their salt typically include BGP communities
you can use to selectively accept more-specific routes that you are
interested in. You could, for example, accept routes learned by your
transits from IXes in your geographic vicinity.

Here's a PoC where we used communities to filter out all routes except
for any routes learned by our primary transit provider anywhere in
Scandinavia, while using defaults for everything else:

(Note that we went away from the RIB->FIB filtering approach described
in the post, what we have in production is traditional filtering on the
BGP sessions.)

Tore

Is this verbatim? I don't think there is ever a use case for carrying a
default route in dynamic routing.

In eBGP, the default should be tied to some reliable indicator that the
operator's network is up, like their own aggregate route; they have an
incentive to originate that correctly, as it affects their own services
and products. So recurse a static default to that route. Otherwise you
cannot know how the operator originates default: they may just blindly
generate it at the edge, and if the edge becomes disconnected from the
core, you'll blackhole. With the static-route solution, the aggregate
would not be generated by edge routers by any sane operator, out of
self-preservation instinct, so you'd be able to converge instead of
blackholing.

In the internal network, instead of having a default route in iBGP or
the IGP, you should have the same loopback address on every full-DFZ
router and advertise that loopback in the IGP. Then non-full-DFZ
routers should static-route default to that loopback, always reaching
the IGP-closest full-DFZ router.
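A minimal Junos-style sketch of that design; 192.0.2.1 is a placeholder for the shared anycast loopback, configured in addition to each router's unique loopback address:

    /* on every full-DFZ router: the shared loopback, exported
       into the IGP as a /32 like any other loopback */
    interfaces {
        lo0 {
            unit 0 {
                family inet {
                    address 192.0.2.1/32;
                }
            }
        }
    }

    /* on every non-full-DFZ router: static default recursing to the
       loopback, so traffic follows the IGP to the closest full-DFZ box */
    routing-options {
        static {
            route 0.0.0.0/0 {
                next-hop 192.0.2.1;
                resolve;
            }
        }
    }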

* Saku Ytti

> We started taking defaults from our transits and filtering most of the
> DFZ over three years ago. No regrets, it's one of the best decisions we
> ever made. Vastly reduced both convergence time and CapEx.

> Is this verbatim?

I do not understand this question, sorry.

> you cannot know how the operator originates default

Sure you can, you just ask them. (We did.)

Tore

And is it the same now? Some Ytti didn't 'fix' the config last night?
Or NOS change which doesn't do conditional routes? Or they
misunderstood their implementation and it doesn't actually work like
they think it does. I personally always design my reliance to other
people's clue to be as little as operationally feasible.

* Saku Ytti

> Sure you can, you just ask them. (We did.)

> And is it the same now? Some Ytti didn't 'fix' the config last night?
> Or NOS change which doesn't do conditional routes? Or they
> misunderstood their implementation and it doesn't actually work like
> they think it does. I personally always design my reliance to other
> people's clue to be as little as operationally feasible.

The way they answered the question showed that they had already
considered this particular failure case and engineered their
implementation accordingly. That is good enough for us.

Incorrect origination of a default route is, after all, just one of the
essentially infinite ways our transit providers can screw up our
services. Therefore it would make no sense to me to entrust the
delivery of our business critical packets to a transit provider, yet at
the same time not trust them to originate a default route reliably.

If we did not feel we could trust our transit provider, we would simply
find another one. There are plenty to choose from.

Tore

Maybe instead of transit + 1, you use communities to just allow all customer prefixes, regardless of how deep they are. Obviously that community would need to be supported by that provider.
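Assuming the provider tags customer-learned routes with a documented community (64500:100 below is purely a placeholder; the real values come from the provider's community documentation), the import side might be sketched Junos-style as:

    policy-options {
        community transit-customers members 64500:100;
        policy-statement transit-in {
            /* keep specifics for the transit's customer cone */
            term customers {
                from community transit-customers;
                then accept;
            }
            /* a default covers everything else */
            term default {
                from route-filter 0.0.0.0/0 exact;
                then accept;
            }
            then reject;
        }
    }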

I’ve been wondering a similar thing: how to take advantage of the 150k-250k hardware routes the CRS317 now has in the v7 beta. That many routes should cover the peering tables for most operators, maybe even the transits’ customer routes.

Hi James,

When I was at the DNC in 2007, we considered APNIC-region /8s lower priority than the ARIN region (for obvious reasons), so I got some extra life out of our router by pinning most APNIC /8s to a few stable announcements, preferring one transit over the other with a fallback static route. This worked in the short term, but I wouldn’t want to do it as a long-term solution.

As a more generic approach: filter distant (long AS path) routes, because there’s a higher probability that they’re reachable from any of your transits with about the same efficiency.
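In Junos-style syntax that could be sketched as follows; the threshold of 8 ASes is arbitrary and worth tuning against your own table before trusting it:

    policy-options {
        /* matches any AS path of 8 or more ASes */
        as-path too-long ".{8,}";
        policy-statement filter-distant {
            term distant {
                from as-path too-long;
                then reject;
            }
        }
    }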

Any time you summarize routes, you WILL lose connectivity during network partitions. Which defeats part of the purpose of having BGP with multiple transits. Partitions are rare but they can persist for days (cough cogent cough). So that’s a risk you should plan for.

Regards,
Bill Herrin

Agree with Mike on looking at communities first. Depending on the provider, that could be a very nice tool, or completely worthless.

For your planned idea of smaller “regional” nodes, you could do something like: “default || ( customer && specific cities/states/regions/countries )”.

I would definitely make sure you consider what your fallback options are in case of partitions, as Bill mentioned in another reply. The fewer routes you have to start with, though, the harder that gets.

Saku-

> In internal network, instead of having a default route in iBGP or IGP,
> you should have the same loopback address in every full DFZ router and
> advertise that loopback in IGP. Then non fullDFZ routers should static
> route default to that loopback, always reaching IGP closest full DFZ
> router.

Just because a DFZ-role device can advertise its loopback unconditionally in the IGP doesn't mean it actually has a valid eBGP or iBGP session to the rest of the DFZ. It may be contrived, but could this not be a possible way to blackhole nearby PEs..?

We currently take a full RIB and install a full FIB. I'm choosing to create a default aggregate for downstream default-only connectors based on something like

    from {
        protocol bgp;
        as-path-group transit-providers;
        route-filter 0.0.0.0/0 prefix-length-range /8-/10;
        route-type external;
    }

Of course there is something functionally equivalent for v6. I have time-series data on the count of routes contributing to the aggregate, which helps a bit with peace of mind about the default being pulled when it shouldn't be. Like all tricks of this type, I recognize this is susceptible to the default being synthesized when it shouldn't be.
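For context, a term like the above hangs off a generated route, roughly as below (the policy name is illustrative); the 0/0 then exists only while at least one matching transit route contributes to it:

    routing-options {
        generate {
            route 0.0.0.0/0 policy default-contributors;
        }
    }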

I'm considering an approach similar to Tore's blog where at some point I keep the full RIB but selectively populate the FIB. Tore, care to comment on why you decided to filter the RIB as well?

-Michael

Hey Michael,

It's a little more nuanced than that. You probably don't want to
accept a default from your transit but you may want to pin defaults
(or a set of broad routes as I did) to "representative" routes you do
accept from your transit. By "pin" I mean tell BGP that 0.0.0.0/0 is
reachable by some address inside a representative route you've picked
that is NOT the next hop. That way the default goes away if your
transit loses the representative route and the default pinned to one
of your other transits takes over.
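One way to get that effect is a static default with a recursively resolved next hop; a Junos-style sketch, where 198.51.100.10 stands in for an address inside the representative route, not the BGP next hop:

    routing-options {
        static {
            route 0.0.0.0/0 {
                /* resolves via whatever route covers this address;
                   if the representative route vanishes and nothing
                   else covers it, the default goes away too */
                next-hop 198.51.100.10;
                resolve;
            }
        }
    }

Whether the default actually disappears depends on nothing less specific covering the pinned address, so the address needs picking with care.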

You can craft and tune an effective solution here, but there has to be
an awful lot of money at stake before the manpower is cheaper than
just buying a better router.

Regards,
Bill Herrin

That is a great idea. Get all the utility of default with fewer risks.

* Michael Hare

> I'm considering an approach similar to Tore's blog where at some
> point I keep the full RIB but selectively populate the FIB. Tore,
> care to comment on why you decided to filter the RIB as well?

Not «as well», «instead».

In the end I felt that running in production with the RIB and the FIB
perpetually out of sync was too much of a hack, something that I would
likely come to regret at a later point in time. That approach never
made it out of the lab.

For example, simple RIB lookups like «show route $dest» would not have
given truthful answers, which would likely have confused colleagues.

Even though we filter on the BGP sessions towards our transits, we
still get all the routes in our RIB and can look them up explicitly if
we need to (e.g., in JunOS: «show route hidden $dest»).

Tore

Speaking of which, did anyone ever implement FIB compression? I seem to
remember the calculations looked really favorable for the leaf node
use case (like James') where the router sits at the edge with a small
number of more or less equivalent upstream transits. The FIB is the
expensive memory. The RIB sits in the cheap part of the hardware.

Regards,
Bill Herrin

I do the above using routes to *.root-servers.net to contribute to the
aggregate 0/0.

We started filtering certain mixes of long and specific routes on transit, at least while some upgrades to our edge capability are in progress. We have a mix of transit providers and public/private peering at our edge.

Shortly after filtering, we started occasionally finding destinations that were unreachable over the Internet (generally /24) due to:
- We filtered them on transit, probably due to long paths
- They were filtered from all of our transits, so their /24 was not in our table
- We did not receive their /24 on peering
- However, we did receive a covering prefix on peering
- Lastly, the actual destination network with the /24 was no longer connected to the network we received the covering prefix from, like a datacenter network that used to host them and SWIPed them their /24 to make it portable.

A 3rd-party SaaS netflow platform’s BGP/netflow/SNMP collectors were impacted by this, which was one of the first instances of this problem we encountered.

We now have some convoluted scripting and routing policy in place, trying to proactively discover prefixes that may be impacted by this and then explicitly accepting those prefixes or ASNs on transit. It is not a desirable solution, but this seems likely to become more common over time with v4 prefix sales/swaps/deaggregation (with covering prefixes left in place), as well as increased TE where parties announce aggregates and specifics from disjoint locations.

Our long term solution will be taking full tables again.

Ryan

On Fri, 5 Jun 2020 at 20:12, Ryan Rawdon <ryan@u13.net> wrote:

> Shortly after filtering, we started occasionally finding destinations that were unreachable over the Internet (generally /24) due to:

I have observed this too.

I know of no router that can do this, but what you would need is for the router to automatically accept any prefix on your transit link that is covered by a prefix received from your peers.

fib optimize => using LPM table for LEM
https://www.arista.com/en/um-eos/eos-section-28-11-ipv4-commands#ww1173031

FIB compression => install only 1 entry into FIB for compressable
routes with shared nexthop
https://eos.arista.com/eos-4-21-3f/fib-compression/

The feature itself works as intended. Version/platform/config compatibility needs some consideration.