Route optimization using GPUs?

So back in the… hell I don’t know like… early 2010s there was a push for ‘route optimization’ from products like RouteScience and the Avaya CNA and more recently whatever Noction is doing.

The big pain point for this technology at the time was that it could only optimize the top N egress routes due to how many probes it could send out and how many results it could process.

It seems like now with a modest GPU in a router you could pretty easily ‘optimize’ [to the extent that you believe this technology worked] pretty much the whole routing table.

We used these tools extensively back then and they actually worked pretty well in most cases. The biggest issue we ran into was people complaining that we pinged their IP addresses… which nowadays seems like a great worst problem to have.

Anyway is anyone doing any work on implementing GPUs into the BGP decision making process? Seems like a no-brainer.

-Drew

With merchant silicon getting faster and stronger every day, and capacity and transit in freefall, I’m not sure what GPU optimization would buy you, not to mention the ROI. The Internet routing table is not showing substantial signs of growth and in some cases has experienced a plateau. Also, the experience with ‘route optimization tools’ is that while they may bring you some priority in your traffic, they are also known for making horrible decisions resulting in widespread outages.

J~

It’s not even that.

GPUs are very good at parallelized vector computations. They are very very good at THAT, but ONLY that. This is no different conceptually than router ASICs. They are designed to do ONE thing very well.

BGP bestpath selection is a completely different computational process.
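
To make the contrast concrete, here is a minimal sketch of best-path selection as an ordered chain of tie-breakers, loosely following the decision process in RFC 4271. It is a simplification, not any vendor's implementation: the `Path` type and its field names are made up for illustration, MED is compared across all paths (real implementations normally compare it only between paths from the same neighbor AS), and the IGP-metric step is skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical candidate path for one prefix (fields are illustrative)."""
    next_hop: str
    local_pref: int
    as_path: tuple
    origin: int       # 0 = IGP, 1 = EGP, 2 = INCOMPLETE
    med: int
    is_ebgp: bool
    router_id: str


def preference_key(p: Path):
    # Tuples compare element by element, so each later field only
    # matters when every earlier field ties.
    return (
        -p.local_pref,                  # higher local-pref wins
        len(p.as_path),                 # shorter AS path wins
        p.origin,                       # lower origin code wins
        p.med,                          # lower MED wins (simplified)
        0 if p.is_ebgp else 1,          # eBGP preferred over iBGP
        tuple(int(o) for o in p.router_id.split(".")),  # lowest router ID
    )


def best_path(paths):
    """Return the preferred path among the candidates for a single prefix."""
    return min(paths, key=preference_key)


candidates = [
    Path("192.0.2.1", 100, (64500, 64501), 0, 0, True, "10.0.0.1"),
    Path("192.0.2.2", 100, (64502,), 0, 0, True, "10.0.0.2"),
    Path("192.0.2.3", 200, (64503, 64504, 64505), 0, 0, False, "10.0.0.3"),
]
winner = best_path(candidates)
```

Each prefix gets a handful of short, branchy comparisons like these rather than dense arithmetic over large vectors, which is the point being made above.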

+nanog

Greetings, Drew

We don’t use GPUs, but we have worked on a similar project focused on traffic optimization. I would say that the ping issue is one of the top pain points, even more significant than route recalculation. The more routes you try to monitor and optimize, the more significant the problem will become.

Regarding optimization itself, I don’t think you would want to optimize the entire routing table at the same time. From a calculation perspective, I don’t believe it’s a major problem at this point. Please correct me if I’m wrong.

Regards,

Andrey

Thu, 5 Dec 2024 at 18:32, Drew Weaver <drew.weaver@thenap.com>:

This is part of why the old systems that were mentioned only optimized the top n routes based on traffic flow (or other configurable metric), because the cost vs. benefit falls off greatly.

Shane
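
The cost vs. benefit point can be sketched in a few lines: rank prefixes by traffic volume and see how much of the total the top n cover. The function name and the byte counts below are invented for illustration; real inputs would come from flow data.

```python
def top_n_coverage(bytes_by_prefix, n):
    """Return the n busiest prefixes and the fraction of traffic they carry."""
    ranked = sorted(bytes_by_prefix.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(bytes_by_prefix.values())
    top = ranked[:n]
    covered = sum(volume for _, volume in top)
    fraction = covered / total if total else 0.0
    return [prefix for prefix, _ in top], fraction


# Made-up traffic counters, using documentation prefixes.
sample = {
    "192.0.2.0/24": 700,
    "198.51.100.0/24": 200,
    "203.0.113.0/24": 90,
    "2001:db8::/32": 10,
}
prefixes, fraction = top_n_coverage(sample, 2)
```

With a skewed distribution like this one, each additional prefix beyond the first few adds very little coverage while still costing a full set of probes.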

I’d imagine the GPUs would be used to help the control plane only, to bulk-process several prefix updates?

Actually, some kind of content addressable memory looks more promising
for increasing update throughput than GPUs.

Rubens

I don’t know that you need to spread BGP best path analysis onto a GPU, but running the tests those boxes do against the entire Internet, instead of just the top X destinations, would be quite parallel.

IIRC, the widespread outages are the result of exporting things that shouldn't be exported.
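
As a rough illustration of the export side, here is a sketch of a guard that refuses to announce optimizer-generated routes to eBGP peers. The `Route` type and the `from_optimizer` flag are hypothetical; the only real-world constant is the well-known NO_EXPORT community from RFC 1997. In practice this would be expressed as router export policy rather than Python.

```python
from dataclasses import dataclass, field

# Well-known NO_EXPORT community (RFC 1997), 0xFFFFFF01 as high:low halves.
NO_EXPORT = (65535, 65281)


@dataclass
class Route:
    """Hypothetical route record; from_optimizer is an illustrative tag."""
    prefix: str
    communities: set = field(default_factory=set)
    from_optimizer: bool = False


def may_export_to_ebgp(route: Route) -> bool:
    # Anything the optimizer injected stays inside the local AS.
    if route.from_optimizer:
        return False
    # Honor NO_EXPORT on anything else as well.
    if NO_EXPORT in route.communities:
        return False
    return True


learned = Route("198.51.100.0/24")
injected = Route("198.51.100.0/25", {NO_EXPORT}, from_optimizer=True)
```

The idea is to have two independent checks, so that a route missing its community tag is still caught by its source.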

“I strongly recommend to turn off those BGP optimizers, glue the ports shut, burn the hardware, and salt the grounds on which the BGP optimizer sales people walked.”

-Job Snijders

https://mailman.nanog.org/pipermail/nanog/2017-August/092131.html

Hi Drew,

It was and remains a data problem, not a compute problem. You have to
have places to probe who don't object to receiving a usefully high
quantity of probes, and the capacity to issue and receive those
probes. Processing the results is, by comparison, not a whole lot more
consumptive than the normal best-path selection algorithm.

Regards,
Bill Herrin
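
A back-of-the-envelope sketch of the probe volume involved. Every number here is an assumption picked for illustration (a round one million prefixes for a full table, three upstreams, five probes per path, one measurement round per minute), not a figure from any product.

```python
def probe_rate_pps(prefixes, upstreams, probes_per_path, interval_s):
    """Probes per second needed to measure every prefix via every upstream."""
    return prefixes * upstreams * probes_per_path / interval_s


# Assumed parameters: 3 upstreams, 5 probes per path, one round every 60 s.
top_n_rate = probe_rate_pps(1_000, 3, 5, 60)           # top 1,000 prefixes
full_table_rate = probe_rate_pps(1_000_000, 3, 5, 60)  # whole table
```

Under these assumptions the top-1,000 case needs 250 probes per second and the full table needs 250,000, all aimed at targets that may object to being probed. The arithmetic is trivial; sourcing willing targets and sending that much traffic is the hard part.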

* nanog@ics-il.net (Mike Hammett) [Thu 05 Dec 2024, 21:18 CET]:

Eh, different people have different opinions.

I think most of the hatred towards them is unwarranted,

You speak like a person whose prefixes were never hijacked and blackholed by one of those things.

  -- Niels.

Not by the box, but by the operator of the box.

* nanog@ics-il.net (Mike Hammett) [Thu 05 Dec 2024, 21:32 CET]:

Not by the box, but by the operator of the box.

We've been over this numerous times around the time when that Honest Networker page in my email was created. Noction ship their boxes with insanely dangerous defaults that are bound to lead to catastrophic route leaks. "Unsafe at any speed" applies to the whole category but especially to that implementation.

I don't know if they've improved or lost business since. Their major accomplishment as a company may turn out to be, unintended though it may be, hastening the global acceptance and rollout of RPKI OV.

  -- Niels.

I think most of the hatred towards them is unwarranted,

This is essentially saying “I’ve never had a problem, so I don’t think it’s a big deal.”

shrugs Incorrectly assigning the blame doesn’t really help anyone.

Sure, but the fact remains that there is blame to be assigned.

It doesn’t really matter to the affected network whether the fault lies with the box itself, the operator of the box, or the person who makes the morning coffee for the person who operates the box: the fact still remains that this class of devices has caused significant disruption for a whole bunch of external networks.

“Guns don’t kill people, people with guns kill people” may be a factually true statement - but if there were no guns, there would be less people being shot…

“Route optimizers don’t hijack routes, operators with route optimizers hijack routes” falls into the same category…

W

Warren,

  • “Guns don’t kill people, people with guns kill people” may be a factually true statement - but if there were no guns, there would be less people being shot…

While that is also a factually true statement, you are also painting broad strokes over those who are responsible with those weapons. Employment requirements, hunting season, target practice at a range, skeet shooting, are just a few reasons to have them. Let’s not dismiss those who follow the law, get qualified on a regular basis or have adequate training when/where/why/how to properly use them.

“One rotten apple spoils the whole bunch”, does not work here.

The same applies to operators of route optimizers. They are responsible for writing the correct import/export policies for their network, just as carriers are responsible for writing sane policies for customer circuits. For all the incidents those route optimizers have caused, the vendors, their customers, and upstream ISPs are still in business.

Ryan Hamel