Networks ignoring prepends?

William_Herrin · January 22, 2024, 12:49pm

Howdy,

Does anyone have suggestions for dealing with networks who ignore my
BGP route prepends?

I have a primary ingress with no prepends and then several distant
backups with multiple prepends of my own AS number. My intention, of
course, is that folks take the short path to me whenever it's
reachable.

A few years ago, Comcast decided it would prefer the 5000 mile,
five-prepend loop to the short 10 mile path. I was able to cure that
with a community telling my ISP along that path to not advertise my
route to Comcast. Today it's Centurylink. Same story; they'd rather
send the packets 5000 miles to the other coast and back than 10 miles
across town. I know they have the correct route because when I
withdraw the distant ones entirely, they see and use it. But this time
it's not just one path; they prefer any other path except the one I
want them to use. And Centurylink is not a peer of those ISPs, so
there doesn't appear to be any community I can use to tell them not to
use the route.

I hate to litter the table with a batch of more-specifics that only
originate from the short, preferred link but I'm at a loss as to what
else to do.
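More-specifics work where prepends don't because forwarding uses longest-prefix match. A more-specific learned only via the short link beats the covering aggregate, no matter which path BGP selected for the aggregate. Below is a minimal Python sketch. The prefixes are RFC 5737 documentation placeholders and the next-hop labels are made up. Real routers do this lookup in hardware, not with a linear scan.

```python
import ipaddress

# Placeholder prefixes (RFC 5737 documentation range) and illustrative labels.
# The aggregate's best path is the distant backup (e.g. because of local-pref);
# the more-specifics are announced only over the primary link.
WITH_SPECIFICS = {
    ipaddress.ip_network("203.0.113.0/24"): "backup (5000-mile path)",
    ipaddress.ip_network("203.0.113.0/25"): "primary (10-mile path)",
    ipaddress.ip_network("203.0.113.128/25"): "primary (10-mile path)",
}
# The same table after the primary fails and the more-specifics are withdrawn.
AGGREGATE_ONLY = {
    ipaddress.ip_network("203.0.113.0/24"): "backup (5000-mile path)",
}

def lookup(fib, dst):
    """Longest-prefix match: the most specific covering entry wins,
    regardless of the BGP attributes that selected each entry."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    return fib[max(matches, key=lambda net: net.prefixlen)]

print(lookup(WITH_SPECIFICS, "203.0.113.10"))  # primary (10-mile path)
print(lookup(AGGREGATE_ONLY, "203.0.113.10"))  # backup (5000-mile path)
```

If the primary link fails, the more-specifics disappear and the aggregate still carries traffic over the backup.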

Advice would be most welcome.

Regards,
Bill Herrin

Mel_Beckman · January 22, 2024, 1:13pm

Prepend contraction is becoming more common. You can’t really stop providers from doing it, and it reduces BGP table size, which I’ve heard as a secondary rationale. I’d love to see the statistics on that though.
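Here is a rough illustration of the effect. It assumes that "prepend contraction" means collapsing consecutive repeats of the same ASN; individual networks may cap or strip prepends differently. The sample AS paths are the ones quoted later in this thread.

```python
from itertools import groupby

def contract_prepends(as_path: list[int]) -> list[int]:
    """Collapse runs of the same ASN into a single occurrence.

    Assumption: one plausible model of "prepend contraction".
    """
    return [asn for asn, _run in groupby(as_path)]

prepended = [47787, 47787, 47787, 47787, 53356, 11875, 11875, 11875]
direct = [1299, 20473, 11875]

print(len(prepended), len(direct))  # 8 3
print(len(contract_prepends(prepended)), len(contract_prepends(direct)))  # 3 3
```

Once contracted, both paths are three ASNs long, so the prepends no longer carry any signal.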

BGP Communities seem to be the only alternative, and that limits your engineering reach to mostly immediate peers.

Another problem is providers that hide multiple router hops inside MPLS, which appear as a single IP hop in traceroutes, making it impossible to know the true path geographically.

The Internet is lying to itself, and that’s not a situation that can persist forever.

-mel via cell

Jon_Lewis1 · January 22, 2024, 1:23pm

In my experience, it's pretty common for service providers to use localpref to differentiate paid/free/customer routes (with LP increasing in this order). Since LP trumps as-path length, no amount of prepending will get around this.
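The following is a minimal sketch of why no amount of prepending helps once local-pref differs. It models only the first two steps of the decision process: local-pref, then AS-path length. The local-pref values are hypothetical, and so are the ASNs, which come from the documentation range (64500, 64501, 64511).

```python
from dataclasses import dataclass

# Hypothetical local-pref tiers: real values differ per network; only the
# ordering customer > peer reflects the practice described above.
LP_CUSTOMER, LP_PEER = 200, 100

@dataclass
class Route:
    learned_from: str
    local_pref: int
    as_path: list[int]

def best_path(routes):
    """Compare local-pref first, then AS-path length (shorter wins).

    Later tie-breakers (origin, MED, eBGP over iBGP, IGP cost, router ID, ...)
    are deliberately not modeled.
    """
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))

ORIGIN, UPSTREAM_A, UPSTREAM_B = 64511, 64500, 64501  # documentation ASNs

winners = set()
for prepends in range(11):
    via_customer = Route("customer", LP_CUSTOMER,
                         [UPSTREAM_A] + [ORIGIN] * (1 + prepends))
    via_peer = Route("peer", LP_PEER, [UPSTREAM_B, ORIGIN])
    winners.add(best_path([via_peer, via_customer]).learned_from)

print(winners)  # {'customer'}
```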

You may be limited to seeing if your backup providers have community controls that would let you tell them "don't share with Centurylink" or seeing if your primary has similar controls that would let you advertise both the aggregate and more specifics, but have them not propagate the more specifics except to those networks (i.e. Centurylink) that you need to see them to get them off your backup paths.

ianai · January 22, 2024, 1:24pm

The Internet is lying to itself, and that’s not a situation that can persist forever.

I am not sure I agree.

First, prepends are a suggestion. Perhaps a request. It has never (or at least not for the 3 decades I’ve been doing this) been a guarantee. In the situation below, perhaps the 5K mile backup path is through a provider who pays Centurylink (Lumen?). Standard practice is to localpref your customers up, which makes prepends irrelevant. Why would anyone expect different behavior?

As for hiding hops, that is not lying. What happens inside my network is my business. If I give the world some info, say with in-addrs on hops, that’s fine. If I do not, I am not “lying”. This is perfectly sustainable, nothing will break (IMHO). In fact, I would argue without tools like MPLS, the Internet would have broken a long time ago.

William_Herrin · January 22, 2024, 2:02pm

It gives me, your paying customer, less control over my routing
through your network than if I wasn't your paying customer. That
seems... backwards.

Regards,
Bill Herrin

Niels_Bakker · January 22, 2024, 2:21pm

* bill@herrin.us (William Herrin) [Mon 22 Jan 2024, 15:05 CET]:

Jon_Lewis1 · January 22, 2024, 2:49pm

Not at all. Think like a service provider.

"I've got packets to deliver. I've got 3 different classes of paths I can use. One of them, I get paid to use. One is cost neutral. The last one, I pay to use."

Which path would you pick (assuming you're trying to maximize revenue from your network)?

James_Jun · January 22, 2024, 6:18pm

Nope, that is not at all backwards.

Have you ever wondered what would happen if every major ISP stopped classifying routes with localpref and treated every route it receives (including from customers and external peers) with the same local-pref, so that your AS prepending could work easily?

Some 21 years ago, in the early stages of IPv6 development, there was a little-known chapter called 6bone. Aside from the lack of native IPv6 (everything had to be tunneled), the #1 issue that guaranteed IPv6 sucked many times worse than IPv4 back in the day was the lack of BGP clue among most IPv6 DFZ participants at that time: nobody classified their routes with localpref and communities.

Not classifying your routes with local-pref leads to complete operational chaos. One example is world-tour hair-pin sightseeing, which became very common with IPv6 during the 6bone days; this led to the rise of as30071/occaid, which dominated the IPv6 DFZ for several years while many transitioned out of 6bone. Not classifying routes with local-pref means you do not care whether a particular peer is a settlement-free peer or a customer. This lack of relationship classification causes operational harm. A customer may be paying you $/bit, expecting you to deliver your on-net traffic to them over the paid peering (or transit) link they bought from you. Instead, they find you preferring an IX peer (e.g. Hurricane Electric over an IX) as best path, even without any AS-path prepending involved.

Further, not classifying routes with local-pref and ident communities means you are entirely at the mercy of the prefix-lists applied in your export policy. A very common occurrence is a rookie ISP that appears to be giving "transit" to a major Tier-1 backbone for a route that was supposed to be customer-originated. This happens because the network selected the AS path via its upstream provider as best path instead of the direct connection to the customer. It happens a lot with a route from a "downstream of a downstream" customer who is also multi-homed to the rookie ISP's upstream Tier-1 provider, resulting in equidistant AS paths to what is supposed to be a customer-originated route. Scale this up to many routes and you have complete chaos and a breakdown of your BGP routing table.
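This sketch shows the difference between the two export approaches. Everything in it is an assumption for illustration: the tag values, the `Route` shape, and the prefix (198.51.100.0/24 is an RFC 5737 documentation prefix). A prefix-list alone lets the customer-cone prefix out even when the best path came from the upstream. A relationship tag set on import catches it.

```python
from dataclasses import dataclass, field

# Hypothetical relationship tags set on import; 64500 is a documentation
# ASN standing in for the rookie ISP's own AS.
TAG_CUSTOMER = "64500:100"
TAG_TRANSIT = "64500:300"

@dataclass
class Route:
    prefix: str
    communities: set[str] = field(default_factory=set)

# Policy A: prefix-list only. The prefix is in the customer cone, so it is
# allowed out no matter which neighbor the best path was learned from.
CUSTOMER_CONE = {"198.51.100.0/24"}

def export_by_prefix_list(route: Route, to: str) -> bool:
    return to == "customer" or route.prefix in CUSTOMER_CONE

# Policy B: tag-based. Only customer-learned best paths go to peers and
# upstreams; customers receive everything.
def export_by_tag(route: Route, to: str) -> bool:
    return to == "customer" or TAG_CUSTOMER in route.communities

# Best path chosen via the upstream Tier-1 (equal AS-path length), tagged as transit.
via_upstream = Route("198.51.100.0/24", {TAG_TRANSIT})

print(export_by_prefix_list(via_upstream, to="peer"))  # True (route leak)
print(export_by_tag(via_upstream, to="peer"))          # False (suppressed)
```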

So, as a customer, you actually SHOULD be demanding that your ISPs positively identify and categorize their routes using local-pref and communities. In fact, I will never purchase IP transit with BGP from a provider who doesn't categorize routes with local-pref. As a customer, if you want more control over your network's incoming traffic, you need to ask your upstream providers about their BGP routing policy and how well they support BGP communities to let you steer traffic. Then use those communities to make absolute traffic decisions.

Always remember this #1 rule of the BGP decision process: AS Path is a **tie-breaker** to local-pref classification. When you prepend the AS path, your goal is to steer traffic among routes that are in the same category (i.e. customer or peer) as you. Sometimes your goal is absolute steering: do not advertise to a particular peer, or make your connection a standby backup where no traffic ever comes until there is a complete outage on the other path, etc. In that case you absolutely SHOULD be using the BGP communities provided by your upstream IP provider. If your IP transit provider does not provide extensive BGP communities to meet your requirements, cancel their service and give your business to someone else.

A rookie BGP mistake commonly made by those without real-world experience is the assumption that AS-path prepending should deliver absolute traffic steering. It does not, and by design it should NOT. The BGP Best-Path Selection Algorithm is taught very well in the CCIE curriculum, but last I looked, they don't teach you the _why_, only the how. So it's common to see enterprise CCIEs working for VARs falling into false assumptions about AS Path. See Select BGP Best Path Algorithm - Cisco

Hope this clarifies.

James

Steve_Gibbard · January 22, 2024, 6:17pm

To expand on what others have said here, I find it helpful to think of BGP as a policy enforcement protocol, rather than as a distance vector routing protocol.

To that end, there’s a generally expected hierarchy of routes, and then a lot of individuality between networks. Having done traffic engineering for some global CDNs, there’s a bunch of inbound traffic control that you can do by letting an understanding of how most other providers think about this guide your transit and peering policies, and a remaining portion that generally needs to be solved through either discussions, negotiations, or commercial arrangements with the sending party or their upstreams.

For the general rules, local-preference trumps everything else. The number of AS path hops comes after local-preference. Other things being equal networks usually like to hand off traffic to a short AS path, and at the closest point to its origination (there are valid performance reasons for this) but local-preference policies will override both of those.

Local-preferences usually have three default tiers — customer, peering, and transit. In other words, get paid, hand off for free, and pay. There are often some additional peers that can be selected for traffic engineering reasons, either internally or by customers using BGP communities. BUT, those BGP communities don’t transit to other ASes, so even if you manage to signal one hop up stream, you may still find your upstream provider announcing your routes to those who have different ideas.

One example of this from the early days of anycasted DNS root servers involved k.root-servers.net installing a node in Delhi, which pulled 60% of its traffic from North America. This was clearly non-optimal. They had attempted to get routing diversity by getting transit from different providers in different parts of the world, but their Delhi node was, if I recall correctly, a customer of a customer of a customer of Level3. Oops.

So, what do you do about this?

If you’re a global network operator, you probably attempt to maintain consistent peering/transit relationships across sites. That way, AS paths and local-preferences should be fairly even, and you can let nearest exit routing do its thing.

If you have a smaller network, but have multiple interconnection locations that are far enough apart to make a performance difference, make the same transit and peering relationships at each one. Make exceptions only for peers (not transit providers) whose customers or services only exist in one of the areas, and make sure they don’t announce your routes to their upstreams. That way you won’t trombone traffic.

If you’ve done all that, and traffic is still coming in the wrong place, then you start talking to people. “Hey, I’m buying transit from you in both Asia and the Western US, and all my traffic from asian-country-x is coming into San Jose. Why?” “Well, they only have a 100 Mb/s interconnection to us in Asia. We have to traffic engineer around it.” And then you have to figure out how to convince some national telco to want to talk to you more than they want to talk to your transit provider.

I think in your case, I would be asking why you have a 5,000 mile, five-prepend loop to get to a provider ten miles away. It suggests one of several things. Your network may be doing things 5,000 miles away that are inconsistent with what you’re doing locally. Or you may have upstreams who aren’t interconnecting locally, or who aren’t maintaining sufficient capacity or sufficient political relationships on those paths. All of those would predictably have this result. The solution is likely to take a look at your transit relationships and ask your transit providers about their transit relationships. Then either supplement or switch to a set of transit providers who can provide the routing you want.

-Steve

William_Herrin · January 22, 2024, 8:16pm

As I already explained, neither the primary nor any of the backup
providers directly peer with Centurylink, thus have no communities for
controlling announcements to Centurylink.

I hate to litter the table with a batch of more-specifics that only
originate from the short, preferred link but I'm not hearing any
practical alternatives. Treating my distant links as equivalent even
though I told you with prepends that they are not leaves me with few
knobs I can turn.

Regards,
Bill Herrin

Forrest_Christian_Li · January 22, 2024, 8:34pm

I really really wish there were a couple of well-known and globally respected communities which you could set to say either “this is a route of last resort” or “this is my preferred route”.

I feel like it would avoid many of us doing exactly what you’re about to do which is pollute the routing tables with extra, more specific routes to do basic traffic engineering. (Resulting in 3 routes where one would do).

I’m not talking fine level control here, just being able to say “hey this route is better than nothing, but not much” or “treat this as backup”.

I understand the resistance to honoring various route engineering tactics, but being able to effectively do the exact same thing that announcing more specifics does without having to resort to announcing more specifics would be a good thing as far as the global bgp table size goes.
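To make the wish concrete, here is a hypothetical import policy that honors such flags. No such globally respected communities exist. The community names and local-pref values below are placeholders.

```python
# Placeholder names: no globally respected communities like these exist today.
LAST_RESORT = "HYPOTHETICAL:last-resort"
PREFERRED = "HYPOTHETICAL:preferred"

# Hypothetical tier values, spaced widely so a nudge never crosses a tier.
TIER_LP = {"customer": 200, "peer": 100, "transit": 50}

def import_local_pref(relationship: str, communities: set[str]) -> int:
    lp = TIER_LP[relationship]
    if LAST_RESORT in communities:
        # "Better than nothing, but not much": below every tier, so the
        # route is used only when no other path exists.
        return 1
    if PREFERRED in communities:
        # Small bump that stays inside the route's own tier.
        return lp + 10
    return lp

print(import_local_pref("customer", {LAST_RESORT}))  # 1
print(import_local_pref("peer", set()))              # 100
```

The catch is visible in the first branch. Honoring a last-resort flag lets the origin override the provider's customer/peer/transit ordering, which is exactly the preference providers are reluctant to give up.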

William_Herrin · January 22, 2024, 8:35pm

Hi James,

The best path to me from Centurylink is: 3356 1299 20473 11875

The path Centurylink chose is: 3356 47787 47787 47787 47787 53356
11875 11875 11875

Do you want to tell me again how that's a reasonable path selection,
or how I'm supposed to pass communities to either 20473 or 53356 which
tell 3356 to behave itself?

Regards,
Bill Herrin
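Here is a worked version of these two paths. It assumes 47787 is a customer of 3356 and 1299 is a peer, as stated later in the thread, and it uses hypothetical local-pref values. Only local-pref and AS-path length are modeled.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    label: str
    local_pref: int
    as_path: list[int]  # path as held inside AS3356, so 3356 itself is omitted

def best_path(cands):
    # Local-pref first, then the shorter AS path; later tie-breakers not modeled.
    return max(cands, key=lambda c: (c.local_pref, -len(c.as_path)))

def candidates(lp_customer: int, lp_peer: int) -> list[Candidate]:
    # Local-pref values are hypothetical inputs; the AS paths are from the post.
    return [
        Candidate("via 1299 (peer)", lp_peer, [1299, 20473, 11875]),
        Candidate("via 47787 (customer)", lp_customer,
                  [47787, 47787, 47787, 47787, 53356, 11875, 11875, 11875]),
    ]

# Equal local-pref: AS-path length decides and the prepends work as intended.
print(best_path(candidates(lp_customer=100, lp_peer=100)).label)  # via 1299 (peer)
# Customer-over-peer local-pref: the 8-ASN path wins anyway.
print(best_path(candidates(lp_customer=200, lp_peer=100)).label)  # via 47787 (customer)
```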

William_Herrin · January 22, 2024, 9:26pm

The best path to me from Centurylink is: 3356 1299 20473 11875

The path Centurylink chose is: 3356 47787 47787 47787 47787 53356
11875 11875 11875

Do you want to tell me again how that's a reasonable path selection,
or how I'm supposed to pass communities to either 20473 or 53356 which
tell 3356 to behave itself?

AS53356 (Free Range Cloud Hosting) appears to have some limited BGP communities that may help.
Free Range Cloud BGP Communities | Free Range Cloud Docs

implies that you sending 53356:19014 would block announcements to 47787.

At which point Centurylink chooses 40676 7489 11875 11875 11875 11875
11875 11875 11875.

This certainly seems like a reasonable path selection, in the context that 47787 is likely a 3356 customer.

That's -why- 3356 chooses the paths. 40676 and 47787 are customers,
1299 is a peer. You're telling me with a straight face that you think
that's *reasonable* routing?

That may turn into a game of whack a mole, but the knobs appear to be there to try something other than prepending to influence 3356’s selection.

Whack-a-mole is not a reasonable solution to anything.

Besides, I don't want to drop the path to 53356 via 47787. If the path
through 20473 fails, the path through 53356 is the next best and I
want Centurylink to use it.

Regards,
Bill Herrin
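The following is a hypothetical model of the export filter under discussion. The mapping from 53356:19014 to "do not announce to AS47787" comes only from the reading of AS53356's documentation quoted above. AS64500 is a documentation ASN standing in for any other neighbor.

```python
# Assumed semantics (from the documentation reading quoted above, not from
# 53356's actual configuration): routes tagged 53356:19014 are not sent to AS47787.
SUPPRESS_TO_AS = {"53356:19014": 47787}

def announce(neighbor_asn: int, communities: set[str]) -> bool:
    """Return False if any attached community suppresses export to neighbor_asn."""
    return all(SUPPRESS_TO_AS.get(c) != neighbor_asn for c in communities)

tagged = {"53356:19014"}
print(announce(47787, tagged))  # False: AS47787 never hears the route
print(announce(64500, tagged))  # True: other neighbors still do
```

The filter is all-or-nothing. AS3356 stops seeing the 47787 path entirely rather than seeing it as a worse backup, which is the objection raised above.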

Nick_Hilliard3 · January 22, 2024, 9:54pm

At which point Centurylink chooses 40676 7489 11875 11875 11875
11875 11875 11875 11875.

[...]

You're telling me with a straight face that you think that's *reasonable* routing?

yep, looks pretty reasonable, if you're Centurylink and 40676 is a Centurylink customer.

Besides, I don't want to drop the path to 53356 via 47787. If the path
through 20473 fails, the path through 53356 is the next best and I want Centurylink to use it.

You have your own ASN, you have control over your own routing policy. Centurylink probably aren't going to be interested in engaging with you if you're not a customer. It's a pickle.

Nick

William_Herrin · January 22, 2024, 10:03pm

It's not a pickle for me. I'll announce three prefixes instead of one,
and you get to pay for the extra two TCAM slots.

It offends my pride to handle it this way, but -you- shoulder the cost.

Regards,
Bill Herrin
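The "three prefixes instead of one" plan can be sketched with Python's `ipaddress` module. The aggregate is announced everywhere, and its two halves only over the preferred link. 203.0.113.0/24 is a documentation placeholder. Many networks filter IPv4 announcements longer than /24, so a real split would start from a /23 or shorter.

```python
import ipaddress

def te_announcements(aggregate: str) -> tuple[list[str], list[str]]:
    """Return (announce_everywhere, announce_on_primary_only)."""
    net = ipaddress.ip_network(aggregate)
    halves = [str(n) for n in net.subnets(prefixlen_diff=1)]
    return [str(net)], halves

everywhere, primary_only = te_announcements("203.0.113.0/24")
print(everywhere)    # ['203.0.113.0/24']
print(primary_only)  # ['203.0.113.0/25', '203.0.113.128/25']
```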

Owen_DeLong · January 22, 2024, 10:39pm

And now you are faced with an object lesson as to why TE routes are so prevalent.

More specifics are your only functional alternative here. In most cases, you shouldn’t need more than 2 per prefix.

Owen

Owen_DeLong · January 22, 2024, 10:43pm

I’d bet that 47787 is a paying Centurylink customer. As such, despite the ugliness of the path, CL probably local-prefs everything advertised by them higher than any non-paying link. I’m willing to bet 1299 is peered and not paying CL.

Sending bits for revenue is almost always preferable to sending bits for free, so…

Owen

William_Herrin · January 22, 2024, 11:43pm

Hi Alex,

Every packet has two customers: the one sending it and the one
receiving it. 3356 is providing a service to its customers. ALL of its
customers. Not just 47787. Sending the packet an extra 5,000 miles
harms every one of 3356's customers -except for- 47787.

In this case, I am the customer on both ends. 3356's choice to route
my packet via 47787 serves me poorly.

Regards,
Bill Herrin

Tom_Beecher · January 23, 2024, 12:07am

I’d bet that 47787 is a paying century link customer. As such, despite the ugliness of the path, CL probably local prefs everything advertised by them higher than any non-paying link. I’m willing to bet 1299 is peered and not paying CL.

It’s almost as if you’ve done this before.

```
Community : 3356:3 3356:22 3356:100 ==> 3356:123 <++ 3356:575
3356:903 3356:2011 3356:11918 47787:1020
47787:3090 47787:3690 47787:30000
Cluster : 0.0.7.15 0.0.7.19
Originator Id : 4.69.181.14 Peer Router Id : 4.69.130.10
Fwd Class : None Priority : None
Flags : Used Valid Best IGP Group-Best
Route Source : Internal
AS-Path : 47787 47787 47787 47787 53356 11875 11875 11875
```

3356:123 = Customer
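This small sketch pulls the communities out of the route output above and looks up the one tag the thread identifies. The table holds only 3356:123 = Customer; no meaning is assumed for the other values.

```python
from typing import Optional

# Community list copied from the route output above; the "==>" and "<++"
# highlight markers added by the poster are dropped by the ":" filter.
RAW = """3356:3 3356:22 3356:100 ==> 3356:123 <++ 3356:575
3356:903 3356:2011 3356:11918 47787:1020
47787:3090 47787:3690 47787:30000"""

# Only "3356:123 = Customer" is stated in the thread.
KNOWN_3356_TAGS = {"3356:123": "customer"}

def parse_communities(text: str) -> list[str]:
    return [tok for tok in text.split() if ":" in tok]

def relationship(communities: list[str]) -> Optional[str]:
    for c in communities:
        if c in KNOWN_3356_TAGS:
            return KNOWN_3356_TAGS[c]
    return None

comms = parse_communities(RAW)
print(len(comms), relationship(comms))  # 12 customer
```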

William_Herrin · January 23, 2024, 12:25am

> Every packet has two customers: the one sending it and the one
> receiving it. 3356 is providing a service to its customers. ALL of its
> customers. Not just 47787. Sending the packet an extra 5,000 miles
> harms every one of 3356's customers -except for- 47787.
>
> In this case, I am the customer on both ends. 3356's choice to route
> my packet via 47787 serves me poorly.

Packets don't have customers, ISPs do. And in this case you're not a customer of the ISP making the routing decision.

Incorrect. I am a customer of 3356. A residential customer, not a BGP
customer. I'm paying them to route my packets too, and they're routing
them poorly.

Also incorrect: every packet in your network is linked to either one
or two customers. Never more. Never less. Routing my packet via 47787
in this case serves neither of us: my Internet access is severely
degraded and 47787 is charged money for a packet they need not have
handled.

Charging your customers to make their service worse doesn't seem like
a good business model to me, but maybe that's why I'm not a CEO.

Fact is that all prepending does is provide a vague hint to other
networks about what you would like them to do.

Until they tamper with it using localpref, BGP's default behavior with
prepends does exactly the right thing, at least in my situation.

Regards,
Bill Herrin