MPLS switches

While following that Arista chat... it reminded me of a little
afternoon project from years ago.

So I decided to find new hamsters, fire up that VM, refresh the DBs, and
look at things from the viewpoint of a tiny 7206VXR/G1 with 2 T3 peers...

    The number of superfluous subnet advertisements dropped to ~120k,
from ~166k in the previous snapshot.

    And this is the distribution by country.

      country | superfluous
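
For anyone who wants to reproduce the check at home, here is a minimal
sketch of the idea in Python (the prefixes, next-hop names and data layout
are invented for illustration, not taken from the actual tooling): a more
specific is "superfluous" when its best next-hop matches that of the
longest prefix covering it.

    import ipaddress

    # Best-path RIB reduced to one next-hop per prefix (sample data only).
    rib = {
        ipaddress.ip_network("203.0.113.0/24"):   "peer-A",
        ipaddress.ip_network("203.0.113.128/25"): "peer-A",  # same exit as the /24
        ipaddress.ip_network("203.0.113.64/26"):  "peer-B",  # different exit, must stay
    }

    def covering(prefix):
        """Longest less-specific prefix in the RIB that covers 'prefix', if any."""
        for plen in range(prefix.prefixlen - 1, -1, -1):
            parent = prefix.supernet(new_prefix=plen)
            if parent in rib:
                return parent
        return None

    superfluous = [p for p in rib
                   if (c := covering(p)) is not None and rib[c] == rib[p]]
    print(superfluous)   # only 203.0.113.128/25 can be dropped from the FIB

Dropping such prefixes never changes a longest-prefix-match result, because
any address that used to match one of them falls back to a covering entry
with the same next-hop.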

Alain Hebert wrote:

    PS: "Superfluous" is a nice way to say that the best path of a
subnet is the same as his supernet.

... from the point of view of the paths that you see, which is to say
two egress paths. Someone else on the internet may have a different set
of bgp views which will give a different set of results for the bgp
decision process. The more paths you receive from different sources,
the more likely it is that this list of 120k "superfluous" prefixes will
converge towards zero.

You're right that it's often not necessary to accept all paths, and your
fib view can be optimised in a way that your rib shouldn't be. All these
things can be used to drop the forwarding lookup engine resource
requirements, although it is important to understand that there is no
such thing as a free lunch and if you do this, there might well be edge
cases which could cause your optimisation to fail and things to blow up
horribly in your face. Still, it's an interesting thing to examine.

Nick

What Nick said is basically what I was asking about in the Arista thread. Are there new edge cases and new failure modes that are introduced by this strategy? It seems like you'd have to recompute the minimal set of forwarding rules each time a prefix is added or removed, and a single update may cause you to have to do many adds/removes to bring your compressed rules into sync, like when a hole is punched in an aggregated prefix.
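
To make that churn concrete, here is a toy sketch (prefixes and next-hop
names invented) of what a single best-path change can do once more
specifics have been suppressed because they shared a next-hop with their
covering prefix:

    import ipaddress

    # More-specifics dropped from the FIB because their best next-hop matched
    # the covering 10.0.0.0/16 via peer-A (sample data only).
    suppressed = [ipaddress.ip_network(p) for p in
                  ("10.0.1.0/24", "10.0.2.0/23", "10.0.16.0/20", "10.0.128.0/17")]

    # One BGP update moves the /16's best path to peer-B.  The suppressed
    # prefixes still prefer peer-A, so each of them has to go back into the FIB.
    fib_ops = ["replace 10.0.0.0/16 -> peer-B"]
    fib_ops += [f"install {p} -> peer-A" for p in suppressed]
    print(len(fib_ops), "FIB operations from one BGP update")

In a real table the fan-out can of course be much larger than four.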

I'm curious about specific failure modes that can result from this, if anyone can share examples/experience with it.

Thanks,
Laszlo

Laszlo Hanyecz wrote:

I'm curious about specific failure modes that can result from this, if
anyone can share examples/experience with it.

The canonical pathological case is where the deaggregated prefixes are
affected by upstream topology changes, and suddenly the optimisations
which saved you N% of forwarding lookup table capacity are wiped out to
zero, leaving you with no ability to look up next-hops.
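
To put rough numbers on that (the FIB capacity below is a made-up figure;
the other two numbers appear elsewhere in this thread):

    # Back-of-the-envelope only.
    full_table   = 593_320   # DFZ size, per the weekly routing report quoted below
    superfluous  = 120_000   # redundant more-specifics in the two-peer view above
    fib_capacity = 500_000   # hypothetical hardware limit

    print(full_table - superfluous <= fib_capacity)  # True:  fits while compressed
    print(full_table <= fib_capacity)                # False: overflows if the
                                                     # savings evaporate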

Nick

Just to be clear, this isn't (to my knowledge) something that Arista is
doing and so the risk described doesn't affect the products that were
discussed on that thread.

With two uplinks that is highly unlikely to the point of being impossible.
There is no topology change upstream that can cause a situation where it is
not possible to do a high degree of aggregation of the full default free
routing table before loading it in the FIB.

Regards

Baldur

Baldur Norddahl wrote:

With two uplinks that is highly unlikely to the point of being impossible.
There is no topology change upstream that can cause a situation where it is
not possible to do a high degree of aggregation of the full default free
routing table before loading it in the FIB.

which is why I qualified this in a previous posting:

The more paths you receive from different sources, the more likely it
is that this list of 120k "superfluous" prefixes will converge
towards zero.

Agreed that small numbers of paths are most unlikely to create the
conditions for this problem to occur.

Nick

If these compression schemes are implemented and our compressed count
is near the hardware limit, it creates an interesting new attack vector:
pump carefully crafted updates into the global table and watch networks
melt.

I think compression makes more sense in controlled environments, but
large-scale controlled environments are likely to be doing exact matches
(i.e. a bunch of host routes) rather than LPM anyhow. I'm not optimistic
about the technology.

I agree that a larger number of peers makes the situation more complicated.
It might warrant more study.

Your thesis is that there might be a problem; mine is that there likely is
not. Let me argue why.

We can consider networks of various sizes:

1) the dual-homed network with full tables
2) the lightly peered ISP with more than two full tables
3) the well-peered ISP
4) the tier 1 backbone provider

Each of those might see a different gain from the proposal, and indeed
it is likely that the backbone provider would not be interested in the
solution no matter what. Even so, the proposal could help deliver
considerably cheaper hardware solutions to, say, the #1 and #2 class
providers.

We already agree that the #1 class provider will not see an external event
that can explode the number of needed FIB entries after compression.

The #2 class provider is not much different. The number of routes he takes
in as peering routes, as opposed to transit, is small. If he runs his
network with proper max-routes limits on every BGP session, there is
nothing a free peer can do to wreak havoc. An entity limited to, say,
max routes 50 can break up at most 50 of your optimized FIB entries, and
while each of those can cascade, such that a /16 breaks into a series of
/17, /18, /19, ..., /24, that will never add up to anything that is a
problem (a quick bound on this is worked out below). In any case the real
problem here would be a rogue peer injecting fake routes into your network.
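
For what it is worth, that worst case is easy to bound (the prefixes below
are made up):

    import ipaddress

    # A single /24 punched out of a /16 leaves this many covering prefixes:
    remainder = list(ipaddress.ip_network("10.0.0.0/16")
                     .address_exclude(ipaddress.ip_network("10.0.5.0/24")))
    print(len(remainder))        # 8 (one each of /17 ... /24)

    # With max routes 50 on the peering session, the damage is bounded by:
    print(50 * len(remainder))   # 400 extra FIB entries, worst case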

Can having more than two transit providers with full tables become a
problem? No, not really. These guys are all sending you mostly the same
routes, and anything large that happens will be reflected on all your
transits.

There is also the point about the weekly routing report:

BGP routing table entries examined: 593320
    Prefixes after maximum aggregation (per Origin AS): 217357
    Deaggregation factor: 2.73
    Unique aggregates announced (without unneeded subnets): 290159
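
Spelling the report's own arithmetic out, since it is what the savings
claim below rests on:

    table_entries  = 593_320
    max_aggregated = 217_357

    print(round(table_entries / max_aggregated, 2))      # 2.73, the deaggregation factor
    print(round(1 - max_aggregated / table_entries, 2))  # 0.63, i.e. well over half of
                                                         # the FIB entries are avoidable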

Now can you really say any one entity has the power to magically make all
that aggregation disappear just so he can crash your network? I will put
that in the "impossible" and "the net already crashed long before that"
categories.

There is a trend of some networks deaggregating their prefixes. Why not
use software to aggregate that right back to what it ought to be before
loading the routes into the FIB? According to the stats above, that would
save at least half the FIB memory and allow some routers to handle full
tables for much longer (possibly forever).

Regards,

Baldur