AS-Path - ORF Draft

Mike_Hammett · October 22, 2017, 6:35pm

https://tools.ietf.org/html/draft-ietf-idr-aspath-orf-13

Not knowing anything about the draft/RFC process (and not really wanting to go beyond a 30k foot view), is this something with movement? Traction?

This would have solved a situation I encountered a week ago.

Job_Snijders3 · October 22, 2017, 10:29pm

Hi Mike,

Mike_Hammett · October 22, 2017, 10:37pm

Network A was sending more routes into the route server than Network B could handle. Network B would like Network A's routes filtered before they even got to their router.

Googling a bit I saw pages talking about saving CPU or what have you, but the main thing was Network B has a limited FIB. They have a prefix limit specified to protect that. Their device goes through prefix limit before prefix filter, so their filters wouldn't even see the advertisements as the prefix limit already killed the session. Raise the prefix limit so that the filters can get to work and now you're vulnerable to someone else injecting a ton of routes and melting their router.

If that draft were supported by Network B's router and the route servers, I believe that Network B could tell the route servers to filter Network A's prefixes before sending them, thus saving their FIB.
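A rough sketch of the idea, not the draft's wire format: the route server keeps a per-client list of AS-path patterns that the client asked it to suppress, and applies that list before advertising anything. All names here (`Route`, `orf_filters`, `routes_to_send`) and the AS numbers are made up for illustration, and Python's `re` syntax stands in for whatever regex dialect a real implementation would use.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple  # leftmost ASN is the neighbor, e.g. (64500, 64510)


def path_str(route):
    """Render the AS path as a space-separated string for regex matching."""
    return " ".join(str(asn) for asn in route.as_path)


def routes_to_send(routes, deny_patterns):
    """Drop every route whose AS path matches one of the client's deny patterns."""
    compiled = [re.compile(p) for p in deny_patterns]
    return [r for r in routes
            if not any(c.search(path_str(r)) for c in compiled)]


# Hypothetical: Network A is AS64500, and Network B asked the route
# server not to send anything whose path starts with that ASN.
routes = [
    Route("192.0.2.0/24", (64500,)),
    Route("198.51.100.0/24", (64500, 64510)),
    Route("203.0.113.0/24", (64501,)),
]
orf_filters = {"network_b": [r"^64500\b"]}

sent = routes_to_send(routes, orf_filters["network_b"])
```

Only the route from AS64501 is left in `sent`, so Network A's routes never count against Network B's prefix limit.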

Obviously the most correct answer is for Network A to get routers with big enough FIBs, but that's not always possible or practical.

Baldur_Norddahl · October 22, 2017, 10:53pm

I do not get why every BGP implementation kills the session at the prefix
limit. It appears that is making a bad situation worse. Routing flaps
creating lots of visible disturbance for end users. When the BGP session
restarts, it will just happen again and again until operator intervention.

Instead an implementation could ignore any additional prefixes or it could
compare each additional prefix received to already learned prefixes and
decide to drop one to make room for the new one. For example you could drop
the most specific routes before less specific routes.
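A minimal sketch of that second idea, assuming a table modeled as a plain set of prefixes and a hypothetical `admit` helper; it is not how any particular BGP implementation behaves. When the table is full, the longest (most specific) prefix among the existing routes and the newcomer is the one that gets dropped.

```python
import ipaddress


def admit(table, new_prefix, limit):
    """Return the table after offering new_prefix, holding at most `limit` routes.

    If admitting the prefix would exceed the limit, the most specific
    prefix (longest prefix length) is evicted, which may be the new one.
    """
    new = ipaddress.ip_network(new_prefix)
    candidates = table | {new}
    if len(candidates) <= limit:
        return candidates
    # Tie-break on the network address so the choice is deterministic.
    victim = max(candidates,
                 key=lambda n: (n.prefixlen, int(n.network_address)))
    return candidates - {victim}


table = set()
for p in ["10.0.0.0/8", "10.1.0.0/16", "10.1.1.0/24", "172.16.0.0/12"]:
    table = admit(table, p, limit=3)

kept = sorted(str(n) for n in table)
```

Here the /24 is evicted to make room for the /12. The sketch does not check whether a covering less specific route exists for the evicted prefix; without one, traffic to that prefix would have no route at all.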

Regards

Baldur

Mike_Hammett · October 22, 2017, 10:57pm

In my situation, if it applied the filter before the limit, everything would work fine.

Maybe the thought is the other peer has some runaway issue that you don't want to spend resources dealing with instead of grooming an otherwise normal condition? *shrugs*

_Job_Snijders · October 23, 2017, 5:36am

> Network A was sending more routes into the route server than Network B
> could handle. Network B would like Network A's routes filtered before
> they even got to their router.

> Googling a bit I saw pages talking about saving CPU or what have you,
> but the main thing was Network B has a limited FIB. They have a prefix
> limit specified to protect that. Their device goes through prefix
> limit before prefix filter, so their filters wouldn't even see the
> advertisements as the prefix limit already killed the session. Raise
> the prefix limit so that the filters can get to work and now you're
> vulnerable to someone else injecting a ton of routes and melting their
> router.

> If that draft were supported by Network B's router and the route
> servers, I believe that Network B could tell the route servers to
> filter Network A's prefixes before sending them, thus saving their
> FIB.

Your interpretation of the functionality described in the draft is
correct. Work on this draft started in December 2000 as can be read
here: draft-keyur-bgp-aspath-orf-00. I am not
aware of any implementations, and having read the draft and observing
there are no IANA codepoint assignments yet, it is very unlikely there
are any implementations available for production use.

Generally speaking it is safe to say that 17 year old Internet-Drafts
(without known implementations) may be lacking the required traction to
become an RFC.

So alternatively, network B can tell the route server operator via email
"do not send me these prefixes", and the route server operator in the
middle honors that request and doesn't send those prefixes to network B.
Some IXPs offer a web portal for this type of functionality, other IXPs
allow signaling via RPSL in the IRR or as mentioned before, email.

> Obviously the most correct answer is for Network A to get routers with
> big enough FIBs, but that's not always possible or practical.

s/Network A/Network B/ - Yes, this can be a challenge. I fear that
bgp-aspath-orf won't be of any help in the short term.

Kind regards,

Job

_Job_Snijders · October 23, 2017, 6:35am

Dear Baldur,

> I do not get why every BGP implementation kills the session at the
> prefix limit. It appears that is making a bad situation worse. Routing
> flaps creating lots of visible disturbance for end users. When the BGP
> session restarts, it will just happen again and again until operator
> intervention.

Maximum prefix limits are used as a naive last resort to attempt to
protect against catastrophic failures such as memory/FIB overflow and
full table route leaks. The moment a maximum prefix limit kicks in,
something somewhere went wrong and indeed an operator has to intervene.
That is the beauty and essence of the maxpfx feature.

> Instead an implementation could ignore any additional prefixes

This may work in some specific cases, but can be disastrous in other
cases. In my opinion, in context of Internet routing, the potential for
disaster outweighs any benefits I can see for "ignoring additional
prefixes" (in L3VPN context different considerations may apply).

You offered "killing a session may make a bad situation worse", but
there are also scenarios where keeping the session up can turn a bad
situation into a disaster.

I'll elaborate on the above with an example to hopefully clarify myself.
Let's take this event and hypothetically assume 'soft maximum prefix
limits' are a commonly deployed thing.
https://bgpmon.net/bgp-leak-causing-internet-outages-in-japan-and-beyond/

According to PeeringDB, AS15169 recommends configuring 15,000 as the
maximum prefix limit for IPv4. (AS15169 - Google LLC - PeeringDB)
Let's assume that Verizon had configured "a maximum of 15,000 but keep
the BGP session up"-style of soft limit. I currently see roughly 419
prefixes via AS15169 in the DFZ. 15000 - 419 = 14581, so this leaves
room for 14581 invalid announcements before the soft limit kicks in.
At that point I'd argue that it is better to just tear down the BGP
session rather than create a situation where 14581 invalid announcements
(which are part of a 160,000 prefix route leak) can continue to exist.
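The arithmetic above can be written out as a small sketch. The function name `invalid_routes_kept` is hypothetical, and it assumes the simplest case: the legitimate prefixes are already counted when the leak arrives, a hard limit tears the session down (so nothing from that peer is kept), and a soft limit keeps whatever fits in the remaining headroom.

```python
def invalid_routes_kept(limit, legit, leaked, soft):
    """Number of leaked routes that remain installed from this peer."""
    headroom = max(limit - legit, 0)
    if leaked <= headroom:
        return leaked          # the limit is never reached
    if soft:
        return headroom        # session stays up, excess is ignored
    return 0                   # hard limit: session torn down


recommended_limit = 15_000     # PeeringDB figure quoted above
legit_prefixes = 419           # prefixes seen via AS15169 in the DFZ
leak_size = 160_000            # size of the leak in the example

soft_result = invalid_routes_kept(recommended_limit, legit_prefixes,
                                  leak_size, soft=True)
hard_result = invalid_routes_kept(recommended_limit, legit_prefixes,
                                  leak_size, soft=False)
```

Under these assumptions `soft_result` is 14581 and `hard_result` is 0. The hard limit also drops the 419 legitimate routes, which is the cost being argued about.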

We could go back and forth a bit on how high or low that '15,000' number
should be and how things would look if it was closer to 500. But in the
end actual operator intervention was needed, and soft maxprefix limits
would have the potential to hide that.

> or it could compare each additional prefix received to already learned
> prefixes and decide to drop one to make room for the new one. For
> example you could drop the most specific routes before less specific
> routes.

The moment a BGP implementation can do such RIB compression, it may
indeed make sense to offer two types of limits: a 'pre-policy maximum
prefix limit' and a 'post-policy maximum prefix limit'. The former type
of limit would be useful in context of route leaks, the latter in
context of protecting against overflow of the FIB capability.
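To make the distinction concrete, here is a toy model of the two attachment points. It is a sketch with hypothetical names (`process_updates`, the `"A"`/`"C"` tags) and does not reflect any vendor's implementation: a pre-policy limit counts every prefix received, while a post-policy limit counts only prefixes that survive the import filter.

```python
def process_updates(prefixes, accept, limit, mode):
    """Apply an import filter and a maximum prefix limit to received prefixes.

    mode == "pre-policy":  the limit counts every received prefix.
    mode == "post-policy": the limit counts only accepted prefixes.
    Returns (session_state, accepted_prefixes).
    """
    received = 0
    accepted = []
    for p in prefixes:
        received += 1
        if mode == "pre-policy" and received > limit:
            return "session torn down", []
        if not accept(p):
            continue
        if mode == "post-policy" and len(accepted) + 1 > limit:
            return "session torn down", []
        accepted.append(p)
    return "established", accepted


# Hypothetical feed: 150 routes from Network A, 50 from Network C.
feed = [("A", i) for i in range(150)] + [("C", i) for i in range(50)]


def reject_network_a(route):
    return route[0] != "A"


state_pre, kept_pre = process_updates(feed, reject_network_a, 100, "pre-policy")
state_post, kept_post = process_updates(feed, reject_network_a, 100, "post-policy")
```

With a limit of 100, the pre-policy run tears the session down at the 101st received prefix even though the filter would have rejected it, while the post-policy run stays established with 50 accepted routes. That is the situation described earlier in the thread, where the filter never gets a chance to run.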

Kind regards,

Job

ps. RPKI Origin Validation and BGPSEC do have the potential to change
the way we look at big hammers like maximum prefix limits, but we're not
there yet.

_Job_Snijders · October 23, 2017, 10:37am

> or it could compare each additional prefix received to already learned
> prefixes and decide to drop one to make room for the new one. For
> example you could drop the most specific routes before less specific
> routes.

> The moment a BGP implementation can do such RIB compression, it may
> indeed make sense to offer two types of limits: a 'pre-policy maximum
> prefix limit' and a 'post-policy maximum prefix limit'. The former type
> of limit would be useful in context of route leaks, the latter in
> context of protecting against overflow of the FIB capability.

Apparently this already exists and is widely available; Saku Ytti gave
me some additional information. There are various keywords available,
and they operate at different attachment points in the conceptual model.

ghankins_237a87 · October 23, 2017, 10:57am

Nokia SR OS defaults to pre-policy but can be configured to post-policy
by adding "post-import".

prefix-limit ipv4 100 // pre-policy
prefix-limit ipv6 100 post-import // post-policy

Greg

Mike_Hammett · October 23, 2017, 12:53pm

Should I assume that invigorating traction for a 17 year old draft is rather difficult?

It is my understanding that Network B does wish to accept Network A's prefixes elsewhere, just not here. I believe that specifying the block via IRR would be universal and probably not wanted.

Some of my fellow IX operators have advised me to avoid doing manual filtering for a variety of reasons.

Which IXes have a web portal for that? Offlist is fine. I'd like to see that and talk to them about their implementation.

_Job_Snijders · October 23, 2017, 1:24pm

> Should I assume that invigorating traction for a 17 year old draft is
> rather difficult?

John Heasley told me that a fundamental difficulty here is that not
every implementation uses the same style/type of regular expressions.
Unifying this behaviour across vendors will require a lot of pull.

> It is my understanding that Network B does wish to accept Network A's
> prefixes elsewhere, just not here. I believe that specifying the block
> via IRR would be universal and probably not wanted.

You can make it IX specific by using an old proposal called 'RPSL VIA'.
Look for "The script supports most of the IETF snijders-rpsl-via draft
extensions": AMS-IX Amsterdam

> Some of my fellow IX operators have advised me to avoid doing manual
> filtering for a variety of reasons.

Yes, they are right. The moment the route server operator introduces
hacks like that, the affected participants may, over time, forget that
those hacks exist.

On the flip side, if Network B can't filter out the announcements, or
insists on using a pre-policy maximum prefix limit - and Network A
refuses to add a suppression community to their announcements to the
route server (maybe because they want to cookie stamp all those
configs), what can you (as person in the middle) do?

If both network A and network B refuse to cooperate / coordinate, it
somewhat dilutes the value of the route server to participants C/D/E
because network B keeps flapping.

> Which IXes have a web portal for that? Offlist is fine. I'd like to
> see that and talk to them about their implementation.

I believe NL-IX (https://nl-ix.net/) and VIX (https://www.vix.at) are
example IXPs that have this. There are probably a bunch more that offer
this type of feature.

Kind regards,

Job

Mike_Hammett · October 23, 2017, 2:56pm

I was looking at using arouteserver to automate my prefix filter generation. I'll do a feature request over there.

_Job_Snijders · October 23, 2017, 3:11pm

> I was looking at using arouteserver to automate my prefix filter
> generation.

Excellent choice. I would happily recommend arouteserver to any internet
exchange operator looking to modernize their route servers.

> I'll do a feature request over there.

Great news! You can already do that in arouteserver:
http://arouteserver.readthedocs.io/en/latest/CONFIG.html#bird-hooks

Kind regards,

Job

Mike_Hammett · October 23, 2017, 3:13pm

> Great news! You can already do that in arouteserver:
> http://arouteserver.readthedocs.io/en/latest/CONFIG.html#bird-hooks

If you're using Bird. We're using OpenBGPd.

_Job_Snijders · October 23, 2017, 4:08pm

I enjoy using both BIRD and OpenBGPD. Please look more closely. Look for
the string 'openbgpd' on that page. The attachment points for BIRD and
OpenBGPD are different, but arouteserver supports hooking in manual
config for both BIRD and OpenBGPD.

YYCIX, of course based on OpenBGPD :-), is successfully documenting
manual overrides in a 'pre-filters' file. You'll want to do the same.

Kind regards,

Job