BGP Optimizers (Was: Validating possible BGP MITM attack)

Hi Job,

I believe your disclaimer makes a lot of sense. From our perspective, using more specifics is one of the options to make BGP follow the optimized path instead of the « natural » path. We used to announce more specifics because, with the same prefix being announced, the routers were simply not announcing a best route back to the optimizer. Since the adoption of BGP ADD-PATH, our solution does not need more specifics to maintain a full collection of the routers' BGP tables. (In addition, it has never actually been a strong requirement, thanks to other SNMP collection processes.)
Therefore LOCAL_PREF is the option we advise and implement.
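
To illustrate why LOCAL_PREF is enough to steer traffic without announcing a new prefix, here is a minimal Python sketch. It assumes a heavily simplified best-path selection (highest LOCAL_PREF, then shortest AS path); the Route class, the best_path function, and the next-hop names are hypothetical and are not taken from any product.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    local_pref: int = 100
    as_path_len: int = 1


def best_path(candidates):
    """Simplified selection: highest LOCAL_PREF wins, then shortest AS path."""
    return max(candidates, key=lambda r: (r.local_pref, -r.as_path_len))


# Two paths for the same prefix, as a router would hold them.
via_a = Route("203.0.113.0/24", "transit-a", local_pref=100, as_path_len=2)
via_b = Route("203.0.113.0/24", "transit-b", local_pref=100, as_path_len=3)

# The "natural" choice is the shorter AS path.
natural = best_path([via_a, via_b])

# An optimizer that prefers transit-b only raises LOCAL_PREF on the
# existing route; the prefix itself is unchanged.
optimized = best_path([via_a, replace(via_b, local_pref=200)])

print(natural.next_hop, "->", optimized.next_hop)
```

In this sketch the preferred exit changes while the set of prefixes in the table stays exactly the same.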

The examples you mention confirm the issues are mainly due to poorly configured networks where routes are leaked out although they shouldn’t be. Adequate routers are able to filter out prefixes based on attributes like communities, which we set by default.
We had an instance of such an issue with one of our customers a few years ago; it turned out to be caused by mistaken CLI commands that an engineer had entered on the router.
Our XCA software service and platform has had hundreds of ASes running for years, and none of them are making any noise.

Another point of discussion is the fact that transit and large content providers actually accept thousands of routes coming from anywhere; there is a lot of room for improvement there. And I know how much you personally contribute to improving this.

There actually is a reason for operating BGP optimizers. The BGP protocol, while robust and scalable, doesn't know anything about link capacity, doesn’t apply performance analytics and can easily drive links into saturation, introducing packet loss. Also, it is not aware of commercial agreements like CDR, generating costs that could be prevented. It also, of course, ignores the performance of available paths.
All of the above actually impacts customer traffic and business performance.
For a few years now, we have seen our customers take more care of quality and capacity management… and stop relying on BGP « blindly ».

Most transit providers like to explain that their service is premium and that this is why their prices are premium. But when you look at actual performance measurements, some premium providers actually lag behind the cheaper ones.

I’m at RIPE 76 tomorrow; I’ll be more than happy to discuss this topic further with you.

Kind regards,
François

(I’m a product engineer at Border 6 - Expereo, a BGP optimization software company.)

François DEVIENNE
Mobile: +33.651.937.927
E-mail: francois.devienne@expereo.com
BORDER 6 S.A.S. - EXPEREO

Dear Francois,

> The examples you mention confirm the issues are mainly due to poorly
> configured networks where routes are leaked out although they
> shouldn’t be. Adequate routers are able to filter out prefixes based
> on attributes like communities, which we set by default.

Question: is your implementation setting NO_EXPORT by default, or some
other communities? Not all BGP "Optimizers" set NO_EXPORT by default...
in that context I am not sure it is fair to say "the network is poorly
configured" - when an "optimizer" doesn't set NO_EXPORT it is the
"optimizer" that comes with poor defaults.

And there is another challenge: NO_EXPORT does not always work
correctly. Software defects happen, and are unpredictable. All major
routing platforms have seen bugs where for one reason or another
NO_EXPORT does not (or no longer) work correctly. Heavy reliance on
NO_EXPORT can be a weakness in network architecture.
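
As an illustration of not relying on NO_EXPORT alone, here is a minimal Python sketch of an eBGP export check that tests two independent signals. NO_EXPORT (65535:65281) is the well-known community from RFC 1997; the marker community 64512:999 and the function name are assumptions made for this example, not any vendor's actual default.

```python
NO_EXPORT = (65535, 65281)        # well-known community, RFC 1997
OPTIMIZER_MARKER = (64512, 999)   # hypothetical local tag set by the optimizer


def may_export_to_ebgp(communities):
    """Return True only if a route may be announced to an eBGP neighbor."""
    tags = set(communities)
    if NO_EXPORT in tags:
        return False
    # Second, independent check: even if NO_EXPORT was stripped or ignored,
    # the local marker still keeps optimizer-generated routes inside the AS.
    if OPTIMIZER_MARKER in tags:
        return False
    return True


print(may_export_to_ebgp([(64512, 100)]))                 # ordinary route
print(may_export_to_ebgp([NO_EXPORT, OPTIMIZER_MARKER]))  # optimizer route
print(may_export_to_ebgp([OPTIMIZER_MARKER]))             # NO_EXPORT lost
```

A second check of this kind reduces the exposure to a single failing mechanism, but it remains a policy on the same router and shares that router's software defects.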

> There actually is a reason for operating BGP optimizers. The BGP
> protocol, while robust and scalable, doesn't know anything about link
> capacity, doesn’t apply performance analytics and can easily drive
> links into saturation, introducing packet loss. Also, it is not aware
> of commercial agreements like CDR, generating costs that could be
> prevented. It also, of course, ignores the performance of available
> paths. All of the above actually impacts customer traffic and
> business performance. Since a few years we see our Customers take
> more care of quality and capacity management… and stop relying on BGP
> « blindly ».

You are correct that there is a need (and a market) for BGP optimizers.
BGP is terrible for routing around problematic parts of the topology if
we look at metrics such as latency or packet loss.
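
To make the gap concrete, here is a minimal Python sketch of the kind of decision plain BGP cannot make: choosing an exit from measured latency, packet loss, and link utilization. The scoring formula, the thresholds, and all names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitMeasurement:
    name: str
    rtt_ms: float       # measured round-trip time
    loss_pct: float     # measured packet loss, in percent
    utilization: float  # link load as a fraction of capacity, 0.0 to 1.0


def pick_exit(measurements, max_utilization=0.8, loss_penalty_ms=50.0):
    """Pick the exit with the lowest score among links that are not saturated.

    The score is RTT plus a penalty per percent of loss (an assumed weighting).
    If every link is above the utilization limit, fall back to all of them.
    """
    usable = [m for m in measurements if m.utilization < max_utilization]
    pool = usable or list(measurements)
    return min(pool, key=lambda m: m.rtt_ms + loss_penalty_ms * m.loss_pct)


samples = [
    ExitMeasurement("transit-a", rtt_ms=20.0, loss_pct=2.0, utilization=0.50),
    ExitMeasurement("transit-b", rtt_ms=35.0, loss_pct=0.0, utilization=0.60),
    ExitMeasurement("transit-c", rtt_ms=10.0, loss_pct=0.0, utilization=0.95),
]
choice = pick_exit(samples)
print(choice.name)
```

Here transit-c is skipped because it is nearly saturated, and transit-a loses to transit-b because of its packet loss, although it has the lower RTT.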

In my opinion, what the "optimizer" vendors _should_ have done (years
ago), is go to IETF to the IDR working group and help standardize a safe
and robust way to extend the BGP protocol to allow for better traffic
engineering. Think along the lines of what flowspec is for firewalls:
perhaps there should be a "traffic engineering spec (tespec)" for
routers. This should happen in a different Subsequent AFI to avoid the
extremely risky behaviour we now see in the Unicast SAFI.

I have no problem with software that automatically interacts with
routers to manipulate LOCAL_PREF, there is no issue if software suppresses
more-specifics, and prepending one's own ASN is no issue either, but what
_is_ an issue is any software that generates fake routes and distributes
those fake routes in the IPv4/IPv6 Unicast AFI/SAFI on boxes connected
to the BGP DFZ.

To phrase and summarize the issue, using the terminology you provided:

The "optimized" paths MUST NEVER be distributed in the same AFI/SAFI as
the "natural" paths. Overloading the "natural" AFI/SAFI with fake
generated more-specifics, which only exist for the purpose of outbound
traffic engineering (even with communities), is a very dangerous thing.
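
The danger follows from longest-prefix matching, which a minimal Python sketch can show. The lookup function, the table layout, and the origin labels are assumptions for illustration; the prefixes come from the documentation range 192.0.2.0/24.

```python
import ipaddress


def lookup(table, destination):
    """Return the route label for the longest prefix that covers destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), label)
        for prefix, label in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The most specific (longest) matching prefix always wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


table = {"192.0.2.0/24": "legitimate origin"}
before = lookup(table, "192.0.2.10")

# A generated more-specific escapes and lands in the same table.
table["192.0.2.0/25"] = "leaked optimizer route"
after = lookup(table, "192.0.2.10")
untouched = lookup(table, "192.0.2.200")

print(before, "|", after, "|", untouched)
```

Once the /25 is present, traffic for the lower half of the /24 follows the leaked route regardless of the attributes on the legitimate announcement, because prefix length is compared before any BGP attribute is considered.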

Yes, this will be a bit of work, but that is the cost of doing business.

> (I’m a product engineer at Border 6 - Expereo, a BGP optimization
> software company.)

I appreciate your outreach in this public forum and the disclosure.

Kind regards,

Job