Route optimization using GPUs?

True, I didn’t even think of all of the upstreams of those networks being responsible for accepting bad routes.

They are responsible for writing the correct import/export policies for their network, just as the carriers are responsible for writing sane policies for customer circuits.

Nobody is asserting that operators are still not responsible for doing this. Stop it.

What I think some of you guys fail to understand is that you can have perfectly appropriate policy protections in place that the prefixes generated by the optimizer can still bypass, because these devices don’t follow standard BGP behaviors. This has happened before.

Tom,

What does “these devices don’t follow standard BGP behaviors” have to do with adding a NO_EXPORT or specific community on the import policy when a route is accepted, and being belt & suspenders with matching those communities to drop those routes on export to carriers/IX/PNI sessions?
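A minimal sketch of the belt-and-suspenders policy described above, written in Python rather than any vendor's policy language. The `OPTIMIZER_TAG` value, the `Route` class and both function names are hypothetical; only the NO_EXPORT value comes from the thread.

```python
from dataclasses import dataclass, field

NO_EXPORT = (65535, 65281)    # well-known community (RFC 1997)
OPTIMIZER_TAG = (64512, 666)  # hypothetical local community, pick your own


@dataclass
class Route:
    prefix: str
    communities: set = field(default_factory=set)


def import_from_optimizer(route):
    """Import policy: tag everything accepted from the optimizer session."""
    tagged = Route(route.prefix, set(route.communities))
    tagged.communities.update({NO_EXPORT, OPTIMIZER_TAG})
    return tagged


def export_to_external(route):
    """Export policy: True only if the route carries neither marker."""
    return not ({NO_EXPORT, OPTIMIZER_TAG} & route.communities)


improvement = import_from_optimizer(Route("192.0.2.0/25"))
print(export_to_external(improvement))            # False
print(export_to_external(Route("192.0.2.0/24")))  # True
```

The tags are applied by the router on import, so the check on export does not depend on whatever communities the optimizer itself chose to send.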

Ryan Hamel

Hi Ryan,

Is there room or use for standards work here? Just spitballing, but
something along the lines of:

A synthetic BGP route is a BGP route produced by an AS other than the
original origin of the IP addresses contained. Examples include
default routes, black-hole routes and routes produced by a route
optimizer in a middle network.

The producer of a synthetic BGP route MUST mark the route with community XXX.
Routers MUST NOT remove the synthetic community XXX from a route.
Synthetic routes learned from EBGP sessions MUST be rejected by
default. Routers MAY accept synthetic routes learned from EBGP
sessions if explicitly configured to do so.
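The acceptance rule in that draft text could be sketched roughly as follows. This is only an illustration in Python: the community stays the placeholder `XXX`, the function and parameter names are made up, and accepting tagged routes over iBGP is an assumption, since the text above only covers EBGP.

```python
SYNTHETIC = "XXX"  # placeholder; the proposal does not assign a value


def accept(communities, session_type, allow_synthetic=False):
    """Return True if a received route should be accepted.

    communities:     set of community strings on the route
    session_type:    "ebgp" or "ibgp"
    allow_synthetic: the explicit per-session opt-in (off by default)
    """
    if SYNTHETIC not in communities:
        return True           # ordinary route, normal policy applies
    if session_type == "ibgp":
        return True           # assumption: the draft text only restricts EBGP
    return allow_synthetic    # EBGP: reject unless explicitly configured


print(accept({"XXX"}, "ebgp"))                        # False
print(accept({"XXX"}, "ebgp", allow_synthetic=True))  # True
```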

Regards,
Bill Herrin

Bill,

While that sounds plausible on paper, keep in mind it will also include IX route servers and MPLS route reflectors.

Noction can append communities to its announcements, meaning one can very easily set NO_EXPORT (65535:65281) and/or a community applicable to the AS. The latter goes back to what I said about matching a community on export policies to carriers/IX/PNIs, to drop them if the network vendor has a bug in NO_EXPORT, like https://quickview.cloudapps.cisco.com/quickview/bug/CSCuv94859.
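For reference, the `65535:65281` notation is just the two 16-bit halves of a 32-bit standard community. A small Python sketch (the function name is made up for illustration):

```python
def encode_community(text):
    """Pack an "high:low" standard community into its 32-bit value."""
    high, low = (int(part) for part in text.split(":"))
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each half must fit in 16 bits")
    return (high << 16) | low


print(hex(encode_community("65535:65281")))  # 0xffffff01
```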

Real life example of an optimizer issue I helped a friend sort out many years ago.

Prefix 1 : 10.0.0.0/16, Communities A B C
Prefix 2 : 10.0.0.0/20, Communities B D E

Optimizer created /23s inside the /20. What communities do you think it applied to the new routes?

  • A B C
  • B D E
  • Random assortment of 3-6 values

If you guessed random assortment, you’re a winner! The optimizer got confused with 2 source prefixes covering what it was trying to generate, and every time it re-optimized it did something different with the communities.
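To make the ambiguity concrete, here is a small Python sketch of the /23s in question and one deterministic way to pick a parent (longest covering prefix). It is an assumption for illustration only, not what that optimizer or any product actually does.

```python
import ipaddress

# The two source prefixes and their communities, as in the example above.
covering = {
    ipaddress.ip_network("10.0.0.0/16"): {"A", "B", "C"},
    ipaddress.ip_network("10.0.0.0/20"): {"B", "D", "E"},
}


def inherited_communities(generated):
    """Copy communities from the longest prefix that covers `generated`."""
    parents = [net for net in covering if generated.subnet_of(net)]
    if not parents:
        raise ValueError(f"no covering prefix for {generated}")
    best = max(parents, key=lambda net: net.prefixlen)
    return covering[best]


more_specifics = list(ipaddress.ip_network("10.0.0.0/20").subnets(new_prefix=23))
print(len(more_specifics))                               # 8
print(sorted(inherited_communities(more_specifics[0])))  # ['B', 'D', 'E']
```

Every /23 is covered by both source prefixes, which is exactly the situation the optimizer had to resolve and did not resolve consistently.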

This is what I am talking about, that you can do everything “right” with respect to your policies and protections, and the optimizer can do something completely off book and cause an issue.

Ryan,

BGP ensures loop-free interdomain path computation by inspecting the AS path of each NLRI. If a routing optimiser rewrites all the AS paths for all the NLRIs it receives, then it's just pooped all over the primary component of BGP that's designed to ensure that interdomain BGP actually works in the way that it's supposed to in the first place, which also acts as an intrinsic safety guard against DFZ hijacking.

Removing an intrinsic safety guard like this is an inherently risky thing to do. When you elevate the inherent risk of a system, you necessarily elevate either the likelihood of failure or the consequences of a failure, or both.

As an industry, we should be well beyond the point of having to tell people that this is a poor idea, in the same way that we don't need to tell people that bypassing electrical fuse boxes is a poor idea, or removing railings on stair-cases, or not wearing motorbike helmets, or anything else designed to mitigate the unfortunate consequences of entirely predictable accidents.

Nick

Hi Nick,

Have you ever filtered routes from the BGP table and replaced them
with a default route? Perhaps the TCAM was too full and you weren't
ready to upgrade yet?

There's nothing inherently wrong with filtering BGP routes and
replacing them with local routes of your own selection. Nor is there
anything wrong with using a complicated and detailed local selection
process. The error lies in allowing those local routes to accidentally
escape your AS.

Since people, being people, make mistakes, I thought a little
standards work in the area might head off some of those escapes.

Regards,
Bill Herrin

Nick,

I understand there are rules and unspoken guidelines/rules for the DFZ, but when it comes to each individual AS, that org/operator can run their AS internally however they please, and maybe they have considered the risks you have mentioned.

That said, I can argue that upstreams not filtering their customers properly removes a safety guard, upstreams not implementing RPKI removes a safety guard, not properly prepending communities on synthetic routes to drop them on export again removes a safety guard. I can go on…

  • As an industry, we should be well beyond the point of having to tell people that this is a poor idea, in the same way that we don’t need to tell people that bypassing electrical fuse boxes is a poor idea, or removing railings on stair-cases, or not wearing motorbike helmets, or anything else designed to mitigate the unfortunate consequences of entirely predictable accidents.

Where this statement falls short is, those are all regulated by building codes, laws, etc. No laws exist dictating how BGP, routing protocols in general, and topologies must be implemented, nor what safety guidelines must be adhered to.

William,

Exactly! An example below is where operators/orgs do not have the funds for a full table router deployment and gather top talkers from sFlow, which says what routes are to be installed in TCAM, instead of hitting a default route.

https://blog.sflow.com/2015/07/sdn-router-using-merchant-silicon-top.html
https://blog.sflow.com/2015/10/active-route-manager.html
https://blog.sflow.com/2016/07/internet-router-using-merchant-silicon.html
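The top-talker idea can be sketched in a few lines of Python. This is a simplification under assumed inputs: the `(prefix, bytes)` sample format and the function name are hypothetical and are not taken from the linked posts.

```python
from collections import Counter


def top_talker_prefixes(samples, table_size):
    """Pick the busiest prefixes to install in hardware.

    samples:    iterable of (prefix, byte_count) pairs from flow sampling
    table_size: how many routes the forwarding table can hold
    """
    totals = Counter()
    for prefix, nbytes in samples:
        totals[prefix] += nbytes
    return [prefix for prefix, _ in totals.most_common(table_size)]


samples = [
    ("198.51.100.0/24", 500),
    ("203.0.113.0/24", 100),
    ("198.51.100.0/24", 700),
    ("192.0.2.0/24", 300),
]
print(top_talker_prefixes(samples, 2))  # ['198.51.100.0/24', '192.0.2.0/24']
```

Everything that does not make the cut keeps following the default route.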

Maybe instead of sFlow, it’s FIB compression for switches that can only handle 512K IPv4 routes, or routers that are showing their age with 1M IPv4 route capacity.

* ryan@rkhtech.org (Ryan Hamel) [Fri 06 Dec 2024, 18:46 CET]:

William,

Exactly! An example below is where operators/orgs do not have the funds for a full table router deployment and gather top talkers from sFlow, which says what routes are to be installed in TCAM, instead of hitting a default route.

You realise that this is not just what the problematic "route optimizers" do, right? And also not the problematic bit?

  -- Niels.

not properly prepending communities on synthetic routes

Let’s not normalize ‘synthetic route’ as a term. It’s not a thing that exists.

Tom,

The automotive industry has normalized “synthetic”. It’s motor oil that is artificially created, vs pulled out of the ground and refined. It’s a perfect analogy for routes that were created by third-party software, vs organically created/redistributed from the proper AS.

That said, I can argue that upstreams not filtering their customers properly removes a safety guard, upstreams not implementing RPKI removes a safety guard, not properly prepending communities on synthetic routes to drop them on export again removes a safety guard. I can go on...

There's a fundamental difference.

Not filtering customers properly fails to implement a safety guard that should have been implemented. Not implementing RPKI fails to implement an additional safety guard. Not properly prepending communities fails to implement an additional safety guard.

Rewriting the AS path removes a core descriptive component of NLRIs inherent in the BGP protocol which is critical to implementing other safety guards.

Including - as an example of only one of the harmful effects of this practice - the ability for the upstream to automatically drop all routes which you just reflected back to it, having just rewritten the AS path to remove their ASN and rewrite the NHIP, because BGP loop-free routing requires this by default in the protocol.
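A minimal Python sketch of that check, using documentation ASNs; the paths are invented for illustration. The receiver drops any route whose AS path already contains its own ASN, and a rewritten path sails straight past that test.

```python
def accepts(local_asn, as_path):
    """Receiver-side loop check: reject if our own ASN is already in the path."""
    return local_asn not in as_path


UPSTREAM = 64500
CUSTOMER = 64510

# Route learned from the upstream, leaked back with the AS path left intact.
leaked_intact = [CUSTOMER, 64500, 64501, 64502]
# The same route after a hypothetical rewrite that removed the upstream's ASN.
leaked_rewritten = [CUSTOMER, 64502]

print(accepts(UPSTREAM, leaked_intact))     # False: dropped by loop detection
print(accepts(UPSTREAM, leaked_rewritten))  # True: nothing left to detect
```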

When you drop core safety components, accidents are more likely to happen.

Where this statement falls short is, those are all regulated by building codes, laws, etc. No laws exist dictating how BGP, routing protocols in general, and topologies must be implemented, nor what safety guidelines must be adhered to.

The normal progression of many technologies ends in regulation. We already have regulation which covers BGP inter-domain routing security in the EU, and I'd be surprised if it wasn't going to happen in other jurisdictions in due course.

In the US, warning shots have already been fired by the White House:

https://www.whitehouse.gov/wp-content/uploads/2024/09/Roadmap-to-Enhancing-Internet-Routing-Security.pdf

This style of document should be taken as notification that interdomain routing security is fresh on the table of regulatory bodies in the US.

Nick

Hi Nick,

There is a consequence to resisting standards work in an area of need
just because you're broadly opposed to the technology being used that
way in any capacity. One example is IPv6. Another is CGNAT.

If you'd rather not follow those examples, stop talking about why
route optimizers mustn't be done and move the conversation to what it
would take to _safely_ do it.

Regards,
Bill Herrin

Ryan-

Unfortunately it doesn’t appear that you have a solid understanding of core BGP fundamentals.

I suggest starting with a read of RFC4271.

Have a great weekend and holiday season.

Nick,

I appreciate the explanation and example, and agree with that as a very strong recommendation.

Reading Noction’s IRP Lite documentation (https://www.noction.com/wp-content/uploads/2016/09/irp-lite-documentation.pdf) - page 214, with bgpd.as_path set to “5 4 2 3” by default (table below), it makes a genuine effort to use the same AS-path when possible.

0 - Allow empty AS-PATH
2 - Use non-empty reconstructed AS-PATH (Announce AS-path reconstructed from traceroute)
3 - Reconstruct AS path with provider ASN and prefix origin ASN
4 - Use AS-Path from BMP
5 - Use AS-Path from BGP Alternative paths (RFC 7911)
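One way to read that default, assuming the value is an ordered preference list tried left to right (my reading of the table, not something confirmed here), is sketched below in Python. The function name and the `candidates` mapping are hypothetical and are not part of Noction's software.

```python
def choose_as_path(order, candidates):
    """Return (method, as_path) for the first usable method, or None.

    order:      the setting as a string, e.g. "5 4 2 3"
    candidates: {method number: AS path list} for the sources that
                produced a path for this prefix (hypothetical input)
    """
    for method in (int(m) for m in order.split()):
        if method == 0:
            return 0, []  # 0 = empty AS-PATH allowed
        path = candidates.get(method)
        if path:
            return method, path
    return None           # no usable path, and empty paths not allowed


candidates = {4: [64500, 64502], 3: [64500, 64502]}
print(choose_as_path("5 4 2 3", candidates))  # (4, [64500, 64502])
print(choose_as_path("5 4 2 3", {}))          # None
```

Under this reading, an empty AS path can only come out if someone adds 0 to the list.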

That means (at least for Noction) the operator has to go out of their way to disable safety, so those that claim it has bad defaults, may want to RTFM. Now, I have never had a need to change that value, nor have I advised others to do the same. I agree having an empty AS path is asking for trouble when it gets leaked.

While I appreciate various business drivers and motivations exist to
deploy software solutions to modify & optimize routing on the fly, I
think I disagree with you on this one point.

Operators *literally* have to go out of their way to configure Noction
to be safe to use. It is not safe to use out of the box. Page 29:

    """
    improvements should be stopped from propagating across routing
    domains. A route map is used to address this.
    [snip]
    Refer your router capabilities in order to produce the correct route
    map. The route map MUST be integrated into existing route maps. It
    is not sufficient to simply append them.
    """
    (ed.: Noction calls the synthetic unauthorized more-specific hijack
          route announcements "improvements")

From Noction's other documentation, "Do route optimizers cause fake routes?":

    """
    In order to further reduce the likelihood of these problems
    occurring in the future, we will be adding a feature within
    Noction IRP to give an option to tag all the more specific
    prefixes that it generates with the BGP NO_EXPORT community.
    -->>> This will not be enabled by default <<<---
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    """

Noction made their software UNSAFE BY DEFAULT. In my opinion this is a
very poor product design choice, and the very reason we keep coming back
to this specific topic.

Other route optimizer products never make the news. Guess what they
all have in common? They set NO_EXPORT by default! :-)

Efforts to define new extensions to the BGP protocol to make this type
of product safer in use (creating a new AFI/SAFI or something else) via
IETF are interesting, but it appears Noction is not even doing the bare
minimum within the existing standards.

Kind regards,

Job

Sure, alright, but given what you just said, doesn’t it seem odd that there is still a static BGP tiebreaker in 2024?

Yes, it really doesn’t have anything to do with one ASN’s particular outbound routing policy.

It only matters if that routing policy gets outside of that one ASN.