Re: Destination Preference Attribute for BGP

“prepend as-path” has taken its place.

That pours water on my imaginary fire :-). I was imagining something sexier, especially given how pretty “useless” AS_PATH prepending is nowadays. Mark.

That’s true Robert.

However, communities and MED only work with neighbors.

Communities routinely get scrubbed because they cause increased memory usage and convergence time in routers.

Even new path attributes get scrubbed, because there have been bugs related to new ones in the past.

Here is a config snippet in XR:

router bgp 23456
 attribute-filter group testAF
  attribute unrecognized discard
 !
 neighbor-group testNG
  update in filtering
   attribute-filter group testAF

The only thing that has any chance of making it across multiple ASes is as-path.

You need to be careful with that too, because long ones get dropped.

route-policy testRP
  if as-path length ge 200 then
    drop
  endif
end-policy

Perhaps to you Robert.

I work on code and with customer issues that escalate to code.

We support platforms of various capacities.

While we would all like to sell the large ones, people buy the cheap ones too.

Fact remains, operators scrub communities and path-attributes for many reasons.

That’s why as-path length is used as a traffic engineering mechanism over multiple AS hops.

As limited as it is, it’s what we have.

Hi Jakob,

That’s true Robert.

However, communities and MED only work with neighbors.

Communities routinely get scrubbed because they cause increased memory usage and convergence time in routers.

Considering that we are talking about control plane memory, I think the cost/space associated with storing communities is less than negligible these days.

And honestly with the number of BGP update generation optimizations I would not say that they contribute to longer protocol convergences in any measurable way.

To me this is more about the no-trust and policy reasons why communities get dropped on EBGP peerings.

Cheers,
R.

Hi Robert,

Without naming any names, I will note that at some point in the not-too-distant past, I was part of a new-years-eve-holiday-escalation to $BACKBONE_ROUTER_PROVIDER when the global network I was involved with started seeing excessive convergence times (greater than one hour from BGP update message received to FIB being updated).
After tracking down a development engineer from $RTR_PROVIDER on the New Year's Eve holiday, it was determined that the problem lay in assumptions made about how communities were stored in memory. Think hashed buckets, with linked lists within each bucket. If the communities all happened to hash to the same bucket, the linked list in that bucket became extremely long; and if every prefix coming in, say from multiple sessions with a major transit provider, happened to be adding one more community to the very long linked list in that one hash bucket, well, it ended up slowing down the processing to the point where updates to the FIB were still trickling in an hour after the BGP neighbor had finished sending updates across.

A new hash function was developed on New Year’s day, and a new version of code was built for us to deploy under relatively painful circumstances.
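A toy model of that failure mode (hypothetical Python, not the vendor's code; the bucket count and community values are invented) shows how a degenerate hash function turns the bucket-of-chains structure into one enormous chain, so each insert scans an ever-longer list:

```python
# Toy model (hypothetical, not the vendor's code) of the failure mode
# described above: community sets stored in hashed buckets, with a chain
# (linked list) per bucket. A degenerate hash sends every set to the same
# bucket, so each insert scans an ever-longer chain: N inserts cost
# O(N^2) instead of O(N).

NUM_BUCKETS = 1024

def bad_hash(communities):
    # Stand-in for a weak hash: many realistic community clusters
    # collapse to the same bucket value.
    return sum(c % 2 for c in communities) % NUM_BUCKETS

def good_hash(communities):
    # Mixing all the bits spreads community sets across buckets.
    return hash(tuple(communities)) % NUM_BUCKETS

def intern_community_set(buckets, communities, hash_fn):
    # Return the canonical stored copy, walking the chain in one bucket.
    chain = buckets[hash_fn(communities)]
    for stored in chain:
        if stored == communities:
            return stored
    chain.append(communities)
    return communities

buckets = [[] for _ in range(NUM_BUCKETS)]
# Every incoming prefix carries a slightly different community set, so
# nothing deduplicates and one chain just keeps growing.
for prefix_id in range(5000):
    intern_community_set(buckets, [64512, 64513, prefix_id * 2], bad_hash)

print(max(len(chain) for chain in buckets))  # 5000: all sets in one bucket
```

Swapping in good_hash spreads the same 5000 sets across the buckets, and lookups stay short.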

It’s easy to say “Considering that we are talking about control plane memory, I think the cost/space associated with storing communities is less than negligible these days.”
The reality is very different, because it’s not just about efficiently storing communities, it’s really about efficiently parsing and updating communities, and the choices made there absolutely DO “contribute to longer protocol convergences in any measurable way.”

Matt
(the names have been obscured to increase my chances of being hireable in the industry again at some future date. :wink:)

Jakob,

With AS-PATH prepend you have no control on the choice of which ASN should do what action on your advertisements.

My understanding of DPA is that it would have been more directed than the "spray & pray" approach AS_PATH prepending provides.

However, the practice of publishing communities by (some) ASNs along with their remote actions could be treated as an alternative to the DPA attribute. It could result in remote PREPEND action too.
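As a hypothetical sketch of such a published action community (the community value, prepend count, and policy name here are invented, not any real ASN's scheme), the remote network might implement it with an XR route-policy along these lines:

```
route-policy CUSTOMER-ACTIONS
  ! 65000:3 is a made-up "prepend my AS three times" action community
  if community matches-any (65000:3) then
    prepend as-path most-recent 3
  endif
  pass
end-policy
```

The customer tags its advertisements with 65000:3, and the remote network performs the prepend on the customer's behalf toward its own peers.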

If only those communities would not be deleted by some transit networks ....

Even then, communities have best efficacy at the closest interconnect with the network that is using them. Since community values are generally not coordinated Internet-wide, it would take a laborious effort to construct the right set of communities to achieve the desired outcome for an end-to-end source or destination path, assuming those communities were not mangled in flight.

Mark.


Really? We only scrub a specific string of communities that would trigger undesired outcomes in our network if received from customers. Otherwise, we pass on what we receive and just add our own bits… matches will still occur? In 2023 control planes, I can’t think of an obvious reason to justify that communities are scrubbed to save memory. Mark.

Even a bare bones x86 platform of some sort with at least 8GB of RAM would make the cheapest routers still, well, cheap. Mark.

To be fair, you are talking about an arbitrary value of years back, on boxes you don’t name running code you won’t mention. This is really not saying much :-). Corner cases, while valid, do not speak to the majority. If this was a major issue, there would have been more noise about it by now. There has been quite some noise about lengthy AS_PATH updates that bring some routers down, which has usually been fixed with improved BGP code. But even those are not too common, if one considers a 365-day period. Mark.

[…]
To be fair, you are talking about an arbitrary value of years back, on boxes you don’t name running code you won’t mention.

This is really not saying much :-).

Hi Mark,

I know it’s annoying that I won’t mention specifics.
Unfortunately, the last time I mentioned $vendor-specific information on NANOG, it was picked up by the press, and turned into a multimillion dollar kerfuffle with me at the center of the cross-hairs:
https://www.google.com/search?q=petach+kablooie&sca_esv=558180114&nirf=petah+kablooie&filter=0&biw=1580&bih=1008&dpr=2

After that, I’ve learned it’s best to not name specific very-big-name vendors on NANOG posts.

What I can say is that this was one of the primary vendors in the Internet backbone space, running mainstream code.
The only reason it didn’t affect more networks was a function of the particular cluster of signalling communities being applied to all inbound prefixes, and how they interacted with the vendor’s hash algorithm.

Corner cases, while valid, do not speak to the majority. If this was a major issue, there would have been more noise about it by now.

I prefer to look at it the other way; the reason you didn’t hear more noise about it, is that we stubbed our toes on it early, and had relatively fast, direct access to the development engineers to get it fixed within two days. It’s precisely because people trip over corner cases and get them fixed that they don’t end up causing more widespread pain across the rest of the Internet.

There has been quite some noise about lengthy AS_PATH updates that bring some routers down, which has usually been fixed with improved BGP code. But even those are not too common, if one considers a 365-day period.

Oh, absolutely. Bugs in implementations that either crash the router or reset the BGP session are much more immediately visible than “that’s odd, it’s taking my routers longer to converge than it should”.

How many networks actually track their convergence time in a time series database, and look at unusual trends, and then diagnose why the convergence time is increasing, versus how many networks just note an increasing number of “hey, your network seems to be slowing down” and throw more hardware at the problem, while grumbling about why their big expensive routers seem to be less powerful than a *nix box running gated?

I suspect there are more of these types of “corner cases” out there than you recognize.
It’s just that most networks don’t dig into routing performance issues unless it actually breaks the router, or kills BGP adjacencies.

If you are one of the few networks that tracks your router’s convergence time over time, and identifies and resolves unexpected increases in convergence time, then yes, you absolutely have standing to tell me to pipe down and go back into my corner again. ;D

Mark.

Thanks!

Matt

So, while this all sounds good, without any specifics on vendor, box, code, code revision number, fix, year it happened, current status, etc., I can’t offer any meaningful engagement. We all run into odd stuff as we operate this Internet, but the point of a list like this is to share those details so we can learn, fix and move forward. Your ambiguity does not lend itself to a helpful discussion, notwithstanding my understanding of your caution. I am less concerned about keeping smiles on vendors’ faces. I tell them in public and private if they are great or not. But since you’ve been burned, I get it. It’s just not moving the needle on this thread, though. Mark.

This reminds me of two things.

First, some code I wrote more than 20 years ago to track and bill for overlapping dial-up sessions (i.e. dial-up account sharing). Processing the RADIUS accounting data, I built a binary tree of users, with each node holding a linked list of session data. While testing it, I found that as the amount of data fed in grew, the program got slower. I solved it by converting the session-data lists to doubly linked lists, which let me add session data by jumping directly to the end, checking whether that's where the current session belonged, and walking back through the list only if necessary; it generally wasn't, since the input data was mostly in chronological order. That made it super fast again.
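A rough sketch of that technique (illustrative Python, not the original code): keep each session list doubly linked and sorted by start time, and insert by walking back from the tail, which is O(1) per record when input arrives in roughly chronological order:

```python
# Illustrative sketch: sessions kept sorted by start time in a doubly
# linked list. Inserting by walking back from the tail is O(1) per record
# for roughly chronological input, instead of walking forward from the
# head every time.

class Node:
    __slots__ = ("start", "prev", "next")
    def __init__(self, start):
        self.start = start
        self.prev = None
        self.next = None

class SessionList:
    def __init__(self):
        self.head = None
        self.tail = None

    def insert(self, start):
        node = Node(start)
        cur = self.tail
        # Usually zero iterations: the new record is newer than the tail.
        while cur is not None and cur.start > start:
            cur = cur.prev
        if cur is None:                      # new earliest record
            node.next = self.head
            if self.head is not None:
                self.head.prev = node
            self.head = node
            if self.tail is None:
                self.tail = node
        else:                                # splice in after cur
            node.prev = cur
            node.next = cur.next
            if cur.next is not None:
                cur.next.prev = node
            else:
                self.tail = node
            cur.next = node

    def starts(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.start)
            cur = cur.next
        return out

sessions = SessionList()
for t in [100, 200, 150, 300]:               # mostly chronological input
    sessions.insert(t)
print(sessions.starts())                     # [100, 150, 200, 300]
```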

Second, we ran into an issue with Arista some time ago and a peer on AMS-IX that set a ridiculous number of communities on their routes. Arista uses (used?) a fixed-length buffer for communities in route-map processing. When doing "match community" in a route-map, if the set of communities on the route is longer than the fixed-length buffer, and the communities you're trying to match fall off the end, your route-map match statement will fail to match, even though a "show ip bgp ..." will show you that the communities you're trying to match are there.
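A toy model of that behaviour (hypothetical Python, not Arista's implementation; the buffer size is invented): matching only consults the communities that fit in a fixed-length buffer, so a community past the cutoff is present on the route yet never matches:

```python
# Toy model (hypothetical; not Arista code, and the buffer size is
# invented) of the behaviour above: route-map community matching reads
# from a fixed-length buffer, so a community past the cutoff is on the
# route (visible in "show ip bgp") yet never matches.

BUFFER_SLOTS = 64  # assumed fixed buffer capacity, in community values

def route_map_match_community(route_communities, wanted):
    # The bug: matching only consults what fit in the buffer.
    visible = route_communities[:BUFFER_SLOTS]
    return wanted in visible

# A peer sets a ridiculous number of communities...
route_communities = [(65000, i) for i in range(100)] + [(64512, 666)]

# The community we care about is genuinely present on the route:
print((64512, 666) in route_communities)                           # True
# ...but the route-map match fails because it fell off the buffer end:
print(route_map_match_community(route_communities, (64512, 666)))  # False
```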

So, while this all sounds good, without any specifics on vendor, box, code, code revision number, fix, year it happened, current status, etc., I can’t offer any meaningful engagement.

If you clicked Matt’s link to the Google search, you could tell from the results what vendor, model, and year it was pretty quickly.

It’s just not moving the needle on this thread, though.

Assertion made: “Networks can scrub communities for memory or convergence reasons.”

Others: “That doesn’t seem like a concern.”

Matt: “Here was a real situation that happened where it was a concern, and the specifics on the reason why.”

How is that not ‘moving the needle’? Because you didn’t get full transcripts of his conversation with the vendor? I’m sure a lot of people didn’t even know that hashing / memory hotspotting was even a thing. Now they do.

I did. Those are headlines. The soldier who was on the battlefield won’t speak to the exact details. I won’t press, especially because nobody that needed a T1600 back then probably still runs one today. There are a lot of things that vendors have fixed in BGP that we shall never know. What I am saying is that, for those that have been fixed, unless someone can offer up any additional evidence in 2023, the number of BGP communities attached to a path does not scream “danger” on 2023 hardware. And the T1600 is a looooong time ago. Mark.

What I am saying is that, for those that have been fixed, unless someone can offer up any additional evidence in 2023, the number of BGP communities attached to a path does not scream “danger” on 2023 hardware. And the T1600 is a looooong time ago.

Again, as it was stated, the size or number of BGP communities wasn’t the problem anyway; it was hashing / memory storage. And you know what? Hashing / memory storage HAS been a problem with multiple vendors in many other contexts, not just BGP community stuff. It has nothing to do with “2023 hardware”. You can bog down top-of-the-line DDR5 memory pretty easily if you make certain coding choices.

You can choose (as you apparently have) to just presume that a problem that happened before won’t ever happen again. Prob not a great idea though.

Again, as it was stated, the size or number of BGP communities wasn't the problem anyway; it was hashing / memory storage. And you know what? Hashing / memory storage HAS been a problem with multiple vendors in many other contexts, not just BGP community stuff. It has nothing to do with "2023 hardware". You can bog down top-of-the-line DDR5 memory pretty easily if you make certain coding choices.

I suppose that can be said of anything in our industry. There is always a way we can hurt ourselves, especially if we ignore or have not learned our lessons.

With the information that I have, I don't consider it a massive issue at the moment. Of course, the landscape is always changing, and my views may change if it shifts sufficiently. But as of now, I am dealing with far more severe BGP bugs than what appears to have been, at least for the moment, fixed.

You can choose (as you apparently have) to just presume that a problem that happened before won't ever happen again. Prob not a great idea though.

Ummh, okay...

Mark.

With AS-PATH prepend you have no control on the choice of which ASN should do what action on your advertisements.

Robert- It is somewhat this problem we are trying to resolve.

I was imagining something sexier, especially given how pretty “useless” AS_PATH prepending is nowadays.

I, too, am looking for something sexy (explained below). But can you explain why you think AS_PATH is “useless,” Mark?

For background, and the reason I asked about DPA:
Currently, our routing carries user traffic to a single data center where it egresses to the Internet via three ISP circuits, two carriers. We are peering on a single switch stack, so we let L2 “load balance” user flows for us. We have now brought up another ISP circuit in a second data center, and are attempting to influence traffic to return the same path as it egressed our network. Simply, we now have two datacenters which user traffic can egress, and if one is used we want that traffic to return to the same data center. It is a problem of asymmetry. It appears the only tools we have are AS_Path and MED, and so I have been searching for another solution, that is when I came across DPA. In further looking at the problem, BGP Communities also seems to be a possible solution, but as the thread has explored, communities may/may not be scrubbed upstream. So, presently we are looking for a solution which can be used with our direct peers. Obviously, if someone has a better solution, I am all ears.

A bit more info: we are also looking at an internal solution which passes IGP metric into MED to influence pathing.
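If my memory of XR's route-policy language is right, that can be done directly with "set med igp-cost"; a minimal sketch, with placeholder AS numbers, neighbor address, and policy name:

```
route-policy SET-MED-FROM-IGP
  ! Copy the IGP cost to the BGP next hop into MED on outbound updates
  set med igp-cost
  pass
end-policy

router bgp 64500
 neighbor 192.0.2.1
  remote-as 64501
  address-family ipv4 unicast
   route-policy SET-MED-FROM-IGP out
```

With that applied toward each upstream at both data centers, the peer (MED permitting) prefers the entry point closest, in IGP terms, to where the traffic egressed.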

To avoid TL;DR I will stop there in the hopes this is an intriguing enough problem to generate discussion.