BGP Path Attribute Filtering, YES or NO?

I would like to gather the current views of the wider community on BGP path attribute filtering (in particular discarding selected attributes, not treat-as-withdraw) as an addition to the long list of standard conditioning tools, such as a maximum AS-path length, a limit on the number of communities, all the way to running an iBGP infrastructure for Internet prefixes that is separate from the one carrying customers’ L3/L2VPN prefixes.

And I appreciate the topic is somewhat contentious and there’s no simple yes or no answer either.

My view is that in a stub AS there should be no harm in discarding unused BGP path attributes.

On transit AS-es I’d expect two opposing views:

One might be: “I have a business to run and don’t care about some university experiments, so unless any of my customers specifically asks for some attribute I’ll drop all reserved, unassigned and deprecated ones and might even drop some not widely used ones just to be on the well-trodden bug free path”

The other might be: “This experimental work is of great value to the community and there’s a process now to announce and manage these experiments, what about net neutrality, and besides, modern BGP implementations should handle well-formatted attributes, and if that’s not the case it’s good that these flaws are being exposed and fixed.”
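To make "discarding selected attributes" concrete, here is a minimal Python sketch of what an attribute-discard filter does to the Path Attributes field of an UPDATE. It assumes the attribute encoding from RFC 4271 section 4.3 (flags, type code, one- or two-byte length). The `KEEP` set and the function names are hypothetical, chosen only for illustration; a real keep-list would also need whatever the network actually uses (for example MP_REACH_NLRI/MP_UNREACH_NLRI, codes 14 and 15, for IPv6).

```python
import struct

# "Extended Length" bit in the attribute flags octet (RFC 4271, section 4.3).
EXTENDED_LENGTH = 0x10

# Hypothetical keep-list for a stub AS: ORIGIN, AS_PATH, NEXT_HOP, MED,
# LOCAL_PREF, ATOMIC_AGGREGATE, AGGREGATOR, COMMUNITIES.
KEEP = {1, 2, 3, 4, 5, 6, 7, 8}


def split_attributes(blob):
    """Yield (type_code, raw_bytes) for each attribute in a Path Attributes field."""
    pos = 0
    while pos < len(blob):
        if len(blob) - pos < 3:
            raise ValueError("truncated attribute header")
        flags, type_code = blob[pos], blob[pos + 1]
        if flags & EXTENDED_LENGTH:
            if len(blob) - pos < 4:
                raise ValueError("truncated extended-length header")
            (length,) = struct.unpack_from("!H", blob, pos + 2)
            header_len = 4
        else:
            length = blob[pos + 2]
            header_len = 3
        end = pos + header_len + length
        if end > len(blob):
            raise ValueError("attribute value runs past end of field")
        yield type_code, blob[pos:end]
        pos = end


def discard_unlisted(blob, keep=KEEP):
    """Return the field with every attribute whose type code is not in `keep` removed."""
    return b"".join(raw for code, raw in split_attributes(blob) if code in keep)


# ORIGIN = IGP (well-known, transitive) followed by an optional transitive
# attribute with type code 254 carrying two bytes of payload.
origin = bytes([0x40, 1, 1, 0])
unknown = bytes([0xC0, 254, 2, 0xAA, 0xBB])
filtered = discard_unlisted(origin + unknown)
```

In this sketch the route itself survives and only the unlisted attribute is dropped, which is the difference from treat-as-withdraw.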

Please let me know your thoughts.

adam

From our side, on peering links, we re-write all MEDs to 0, scrub all communities and replace them with our own. On customer links, we re-write MED to 0. While we don’t scrub our customers’ own communities, we do ensure they cannot use our internal communities beyond the ones we’ve authorized for them.

Mark.
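The import policy described above can be sketched roughly as follows. This is only an illustration in Python, not any router's configuration: the ASN 64500 (from the documentation range), the community values and the function names are all hypothetical.

```python
from dataclasses import dataclass, replace

OUR_ASN = 64500  # placeholder ASN from the documentation range

# Hypothetical community we attach to everything learned from peers.
LEARNED_FROM_PEER = (OUR_ASN, 3000)

# Hypothetical subset of our own communities that customers may send us.
CUSTOMER_ALLOWED = {(OUR_ASN, 100), (OUR_ASN, 200)}


@dataclass(frozen=True)
class Route:
    prefix: str
    med: int = 0
    communities: frozenset = frozenset()  # set of (asn, value) pairs


def import_from_peer(route):
    """Peering link: reset MED, drop every community, tag with our own."""
    return replace(route, med=0, communities=frozenset({LEARNED_FROM_PEER}))


def import_from_customer(route):
    """Customer link: reset MED, keep the customer's communities, and keep
    communities in our namespace only if we have authorized them."""
    kept = frozenset(
        c for c in route.communities
        if c[0] != OUR_ASN or c in CUSTOMER_ALLOWED
    )
    return replace(route, med=0, communities=kept)


received = Route(
    "192.0.2.0/24",
    med=50,
    communities=frozenset({(64501, 10), (OUR_ASN, 100), (OUR_ASN, 9999)}),
)
via_peer = import_from_peer(received)
via_customer = import_from_customer(received)
```

Here `via_customer` keeps `(64501, 10)` and the authorized `(64500, 100)` but loses `(64500, 9999)`, while `via_peer` carries only the tag we set ourselves.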

This is my position. Unfortunately it's a pipe dream, as you only need
very few to think filtering is needed to ruin the utility.

Some specific examples

- don't clean up communities which don't belong to you (
- don't clean up TOS byte (I may want to communicate QoS over internet
between my islands)
- don't clean up BGP attributes (128 would have utility if it transited,
but due to old issues, it often does not)
- don't drop ICMP (ICMP TS would be high utility if not filtered)

I think we need specific good reason to mangle/filter and if you
cannot come up with one, don't do it. If you can come up with one,
consider if it's persistent or workaround to deal with specific active
defect.

If you rewrite MED, you SHOULD rewrite origin (which RFC prohibits,
incorrectly). I can understand rationale for rewriting MED, you don't
want to cold potato, which is fair and certainly cannot be argued to
be objectively wrong, since there may be a market where people will
abuse your cold potato to save on their own infrastructure costs.

If you rewrite MED but not origin, then you're not really
accomplishing anything.

Hmmh, now I'm curious... please explain why rewriting MED but not ORIGIN
doesn't help.

Mark.

If you reset MED in effort to stop me from transferring my
infrastructure costs to your network, I can still set origin and force
cold potato in your network.
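The reason is the order of the tie-breakers in best-path selection: ORIGIN is compared before MED (RFC 4271, section 9.1.2.2). A minimal Python sketch, assuming a simplified decision process that stops after MED and two hypothetical exit points "A" and "B" towards the same neighbor AS:

```python
from dataclasses import dataclass

# ORIGIN codes; lower is preferred.
IGP, EGP, INCOMPLETE = 0, 1, 2


@dataclass(frozen=True)
class Path:
    exit_point: str
    local_pref: int
    as_path_len: int
    origin: int
    med: int


def preference_key(path):
    """Simplified ordering: highest LOCAL_PREF, then shortest AS_PATH,
    then lowest ORIGIN, then lowest MED. Later tie-breakers are omitted."""
    return (-path.local_pref, path.as_path_len, path.origin, path.med)


def best(paths):
    return min(paths, key=preference_key)


# The neighbor wants traffic handed over at B. We have already reset MED
# to 0 on both sessions, but the neighbor sends ORIGIN = INCOMPLETE at A
# and ORIGIN = IGP at B.
paths = [
    Path("A", local_pref=100, as_path_len=2, origin=INCOMPLETE, med=0),
    Path("B", local_pref=100, as_path_len=2, origin=IGP, med=0),
]
chosen = best(paths).exit_point
```

With MED equal on both sessions, `chosen` is still "B", because the comparison is decided at the ORIGIN step and never reaches MED or the hot-potato IGP-cost step. Rewriting ORIGIN to the same value on both sessions would remove that lever.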

[ snip ]

I get that you'd want to reset MED on peering sessions, but any particular
rationale on why you'd rewrite MED to 0 on customer sessions?

I would argue that providing the ability for customers to transfer backhaul
costs onto their transit provider is one of the compelling commercial reasons
*for* IP transit vs. other modes of IP interconnection.

Conversely speaking, I would also argue that a transit provider *should* forward
meaningful MED values on its route advertisements to customers. If a customer
wants to cold potato his outbound traffic on his own network, that's entirely
his call; he has the option of rewriting MED to 0 if he wants closest exit
to his transit instead.

Most transit providers (at least in US, I can't imagine it's much different
in EU) will permit downstream customers to cold potato traffic through their
network.

James

Okay, I see how this could be abused in a scenario where you have
multiple peering locations with a single network.

Looking at our own network specifically, I'm not immediately seeing a
risk of this (will look a little deeper in the next couple of hours) as
the only location where we may be peering in multiple locations with the
same network across a vast geographic scope is Europe. Specifically,
AMS-IX, DE-CIX, ECIX, France-IX, LINX and NL-IX. Considering the scope
of our European backbone, I don't see any obvious benefit to a
multi-homed peer given the rather contained latency between these
cities, and the average cost of capacity on the continent.

We specifically refuse to have multiple transit locations with
upstreams, partly because of this and also because we already have a
transit complement that works well for us. So for every transit provider
we have, that would be in a single location.

The other location where we peer with the same provider in multiple
locations is South Africa (Johannesburg, Cape Town, Durban). Considering
that we have a Selective peering policy, we can manage any issues here
that may crop up, and they can come up very quickly and noticeably
considering South Africa has 2 major exit points, west via Cape Town and
east via KwaZulu-Natal. I could see where a peer may decide to
cold-potato us between the 3 major cities, on-land, but there are
"social" reasons why this is not likely (buy me a beer, hehe), apart
from being quickly noticeable by our NOC and customers (including their
NOC and customers).

I can see how one transit provider could use the ORIGIN attribute to
force my network to send more traffic toward them vs. my other transit
providers. However, that would require that at least one or more of the
other transit providers set their ORIGIN code to EGP or Incomplete, so
that the default of IGP works toward their objective. Otherwise, setting
anything other than IGP, when the rest leave it default, actually
increases their potential to lose my traffic.

For customers trying to do this, I'm not sure I really care since they
are paying us for any and all ports they have with us, if multi-homed to us.

I guess I'm just battling with my mind as to whether going back to
retrofit the network with "set origin igp" explicitly is worth it. For
the moment, not yet, in our specific case (might be a different case if
we had a large North American network), but I'll keep chewing on it.

Mark.

We provide customers with a ton of LOCAL_PREF options they can activate
in our network via communities:

http://as37100.net/?bgp

As I mentioned to Saku re: the ORIGIN attribute, I don't mind customers
using this on us since we have sufficient backbone capacity in all
markets, and they pay us to provide them with a port in each market. So
if customers want to change our LOCAL_PREF values in order to push
traffic some way or another, we are okay with this, since it's $$.
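As a rough illustration of this kind of community-driven LOCAL_PREF (RFC 1998 style), here is a small Python sketch. The ASN 64500, the community values and the default are hypothetical and are not the values published at the URL above; picking the lowest requested value when several action communities are present is also just an assumption of this sketch.

```python
# Hypothetical default LOCAL_PREF applied to customer routes.
DEFAULT_CUSTOMER_LOCAL_PREF = 100

# Hypothetical action communities: (asn, value) -> LOCAL_PREF to set.
LOCAL_PREF_ACTIONS = {
    (64500, 80): 80,    # e.g. "use this path as a backup"
    (64500, 120): 120,  # e.g. "prefer this path"
}


def local_pref_for(communities):
    """Return the LOCAL_PREF a customer route gets, based on its communities."""
    requested = [
        LOCAL_PREF_ACTIONS[c] for c in communities if c in LOCAL_PREF_ACTIONS
    ]
    # Assumption: if several action communities are present, the most
    # conservative (lowest) request wins.
    return min(requested) if requested else DEFAULT_CUSTOMER_LOCAL_PREF


untagged = local_pref_for({(64501, 10)})
backup = local_pref_for({(64501, 10), (64500, 80)})
```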

Mark.

I see. LOCAL_PREF and RFC 1998 style of community attributes however are
not the right tool for signalling exit locations -- it does not scale.
Sure, it's a useful hammer to hard enforce a baseline mode of preference
on given route (e.g. route of last resort, backup or equalize to same
baseline level as peer-learned routes, etc), but for signalling optimal
exit locations at scale, MED is exactly the right tool for that job (and
networks would typically derive MED values using IGP metrics).
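A minimal Python sketch of that idea, assuming the advertising network sets MED on each session to its IGP cost from that interconnection point to the router originating the prefix. The PoP names, prefixes and costs are hypothetical.

```python
# Hypothetical IGP costs inside the advertising network, from each
# interconnection PoP to the router that originates each prefix.
IGP_COST = {
    "192.0.2.0/24": {"pop-east": 10, "pop-west": 250},
    "198.51.100.0/24": {"pop-east": 300, "pop-west": 20},
}


def advertised_med(prefix, pop):
    """MED sent on the session at `pop`: simply the IGP cost to the prefix."""
    return IGP_COST[prefix][pop]


def preferred_exit(prefix):
    """Exit the receiving network picks if it honors MED (lowest wins)."""
    pops = IGP_COST[prefix]
    return min(pops, key=lambda pop: advertised_med(prefix, pop))


east_prefix_exit = preferred_exit("192.0.2.0/24")
west_prefix_exit = preferred_exit("198.51.100.0/24")
```

The per-prefix values fall out of the IGP automatically, so nothing has to be tagged by hand as prefixes or PoPs are added, which is the scaling argument made above.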

I'm not concerned about the ORIGIN attribute, as that's abuse of interconnection,
so a slightly different situation. But denying the ability for customers
who have ports at multiple locations to use MED isn't very ideal.

James

From: Saku Ytti <saku@ytti.fi>
Sent: Wednesday, January 8, 2020 1:09 PM

> Other might be: “This experimental work is of great value to the
> community and there’s a process now to announce and manage these
> experiments, what about net neutrality, and besides modern BGP
> implementations should handle well formatted attributes and if it’s not the
> case it’s good that these flaws are being exposed and fixed.”
>
> This is my position. Unfortunately it's a pipe dream, as you only need very
> few to think filtering is needed to ruin the utility.

In an ideal world that would be my position too, but I suppose it depends on the context.
Imagine:
CTO: Could you have prevented the major financial and market loss and the damage to our reputation resulting from this major network outage? Is this something that never happened before and couldn't be foreseen?
Me: Nah, it has happened before, and sure, I could have simply dropped the offending BGP attribute 254 in this case, since it's not used anyway.
CTO: What the ..., why haven’t you done so then?!?!?
Me: Well because "this experimental work is of great value to the community and there’s a process now to announce and manage these experiments, what about net neutrality, and besides modern BGP implementations should handle well formatted attributes and if it’s not the case it's good that these flaws are being exposed and fixed".
CTO: You mean exposed like this? Like breaking my network?!?!? Get the ... out of here, you're fired!!!!

> Some specific examples
>
> - don't clean up communities which don't belong to you (

Agreed, whatever you do only condition communities with your AS# on them.

> - don't clean up TOS byte (I may want to communicate QoS over internet
> between my islands)

Agreed, will dump it all into best-effort or scavenger class in my MPLS backbone anyways, along with all the SD-WAN super high priority stuff...
Falls into do not touch transit traffic unless under DoS.

> - don't clean up BGP attributes (128 would have utility if it transited, but due to
> old issues, it often does not)

Looking at https://www.iana.org/assignments/bgp-parameters/bgp-parameters.xhtml (don't know how well maintained it is actually), I see 128 as assigned to ATTR_SET [RFC6368], so if I filtered only Unassigned, Deprecated and Reserved from that list that shouldn't do any harm right?

> - don't drop ICMP (ICMP TS would be high utility if not filtered)

So obvious that you shouldn't even need to mention it, but again it falls into "do not touch transit traffic unless under DoS".
Traffic (ICMP included) destined to your own infrastructure, well, that's subject to the iACL policies.

> I think we need specific good reason to mangle/filter and if you cannot come
> up with one, don't do it. If you can come up with one, consider if it's
> persistent or workaround to deal with specific active defect.

Well, the BGP attribute induced outage has a precedent, and it had quite a positive fallout in terms of BGP enhanced error handling etc...
My pipedream is to have the time to shoot random stuff at BGP to see what happens and then report back to vendors about my findings,...
But no, this is not part of our software certification test suite.

adam

Two solutions, two methods, same result, IMHO.

It's been scaling very well for us, and offers customers explicit
control that comes with a flip-switch cover over the, well, switch :-).

If you know of any reason why LOCAL_PREF doesn't scale, I'd like to hear
it, since I'd imagine that if closely maintaining exit paths is
important to you, you don't want to leave it to chance anyway.

Mark.