Smaller than a /24 for BGP?

Justin_Wilson1 · January 24, 2023, 6:19pm

Have there been talks about the best practices to accept things smaller than a /24? I am seeing more and more scenarios where folks need to participate in BGP but they do not need a full /24 of space. Seems wasteful. I know this would bloat the routing table immensely. I know of several folks who could split their /24 into /25s across a few regions and still have plenty of IP space.

Justin Wilson
j2sw@j2sw.com

Ian_Chilton · January 24, 2023, 6:35pm

Hi,

William_Herrin · January 24, 2023, 6:51pm

Hi Justin,

The short version is: it could happen but it won't. There's no
technical obstacle. It's purely administrative. Tens of thousands of
organizations would have to agree to accept smaller prefixes. That
would only happen if there was something in it for them to start doing
so, something major. And even then it would be a hard lift. There
isn't. Some of those organizations run BGP set up by the last guy, the
current guy doesn't really grok it, and he certainly doesn't subscribe
to any information channels where it's discussed. So it's not going to
happen.

Regards,
Bill Herrin

jlewis · January 24, 2023, 7:04pm

The "other problem" is, every day more gear receiving full routes gets closer to (or farther past) the point where the resources to hold either the FIB or RIB just aren't there. For those using these devices, lowering the bar and bringing in another 100k or so routes in short order just isn't an option. /24 has been the de facto threshold for routes in the v4 table long enough that I wouldn't want to be dependent on that changing.

William_Herrin · January 24, 2023, 7:17pm

Yeah, but in another couple years we'll breach the 1M mark and
everybody will have fresh routers with lots of TCAM for a while. If
that were the only issue, it'd be a matter of timing the change well.

Regards,
Bill Herrin

jlewis · January 24, 2023, 7:29pm

Everybody will need them. Not all will get (or be able to get) them.

Chris_J_Ruschmann · January 25, 2023, 12:10am

How do you plan on getting rid of all the filters that don’t accept anything less than a /24?

In all seriousness If I have these, I’d imagine everyone else does too.
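
For readers who have not seen such a filter, here is a minimal Python sketch of the "nothing longer than a /24" rule applied to the /25 split from the original question. The `accept` function and the `192.0.2.0/24` documentation prefix are illustrative assumptions, not anyone's production policy.

```python
import ipaddress

MAX_PREFIX_LEN = 24  # the de facto IPv4 cutoff discussed in this thread


def accept(prefix: str, max_len: int = MAX_PREFIX_LEN) -> bool:
    """Return True if an inbound filter with this cutoff would keep the route."""
    return ipaddress.ip_network(prefix).prefixlen <= max_len


# Hypothetical example: split a documentation /24 into two /25s.
block = ipaddress.ip_network("192.0.2.0/24")
halves = [str(net) for net in block.subnets(prefixlen_diff=1)]

for prefix in [str(block)] + halves:
    print(prefix, "accepted" if accept(prefix) else "filtered")
```

Under this rule the /24 is kept and both /25s are dropped, which is why a network that splits its /24 by region would be unreachable through any neighbor that filters this way.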

Robert_McKay · January 25, 2023, 2:55am

Allow someone to advertise a covering /24 (and route onward to the longer prefixes) in exchange for being able to 'research' the traffic?

You do want a cover advertisement, but it will hardly ever be used by anyone. It attracts traffic, but before that traffic gets anywhere near the origin it hits the more specific longer prefixes and goes straight there.. probably via your immediate upstream, which it would have anyway.

There are large sections of space that have historically always had cover advertisements - 38.0.0.0/8 for instance. If cogent started accepting /29's in there they'd work perfectly courtesy of 38/8 - with nobody else making any other changes to anything.

Probably before long the other transit ISPs would start accepting the longer prefixes too and you'd have fairly functional multi-homing. The long tail of other ISPs doesn't even matter since they inevitably hand the traffic over to someone who will know what to do with it.

-Rob
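
The cover-advertisement argument rests on longest-prefix matching, which can be sketched in a few lines of Python. The `38.1.2.8/29` customer prefix, the next-hop labels, and the two toy tables are hypothetical; only the 38.0.0.0/8 aggregate comes from the post above.

```python
import ipaddress


def longest_match(table, addr):
    """Return the next hop of the most specific route covering addr, or None."""
    ip = ipaddress.ip_address(addr)
    covering = [(net, hop) for net, hop in table.items() if ip in net]
    if not covering:
        return None
    return max(covering, key=lambda item: item[0].prefixlen)[1]


# A distant ISP that filters long prefixes only carries the aggregate.
remote_isp = {ipaddress.ip_network("38.0.0.0/8"): "toward Cogent"}

# Inside the network that accepted the /29, the more specific route wins.
cogent = {
    ipaddress.ip_network("38.0.0.0/8"): "Cogent aggregate",
    ipaddress.ip_network("38.1.2.8/29"): "customer site",
}

print(longest_match(remote_isp, "38.1.2.10"))  # follows the cover route
print(longest_match(cogent, "38.1.2.10"))      # follows the /29
```

The remote network never needs the /29: the aggregate draws the packet toward a network that does hold it, and the more specific route takes over there.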

Masataka_Ohta · January 25, 2023, 2:57am

Jon Lewis wrote:

Yeah, but in another couple years we'll breach the 1M mark and
everybody will have fresh routers with lots of TCAM for a while. If
that were the only issue, it'd be a matter of timing the change well.

Everybody will need them. Not all will get (or be able to get) them.

Wrong. For /24, direct look up of 16M entry SRAM is enough.
Updating 64K entries for /8 should not be a problem, though
you may also have 64K entry SRAM for /16.

In addition, for small number of local smaller-than-/24
prefixes, another lookup of radix tree by a smaller SRAM
(with 64K entry, we can subdivide 256 /24 into /32)
should be possible.

But, there is no need for costly and power wasting TCAM.

So far, I ignore IPv6, of course.

Masataka Ohta
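
The lookup scheme described above can be modelled in Python as a rough sketch: one directly indexed table with 2^24 slots (one per /24) and a 64K-entry second table that lets up to 256 /24s be subdivided down to /32. The class name, the 16-bit entry layout, the next-hop numbers, and the requirement that shorter prefixes are added before longer ones are simplifying assumptions, not part of the post.

```python
import ipaddress
from array import array

EXT = 0x8000  # high bit set: the entry is a block index into the second table


class DirectLookup:
    def __init__(self):
        self.tbl24 = array("H", [0]) * (1 << 24)  # one slot per /24 (about 32 MB)
        self.tbl8 = array("H", [0]) * (1 << 16)   # 256 blocks of 256 host slots
        self.blocks_used = 0

    def add(self, prefix, hop):
        """Install a route; hop is a small next-hop id (1..0x7FFF), 0 means none.

        Assumes routes are added shortest prefix first.
        """
        net = ipaddress.ip_network(prefix)
        base = int(net.network_address)
        if net.prefixlen <= 24:
            first = base >> 8
            count = 1 << (24 - net.prefixlen)  # a /8 rewrites 64K slots
            self.tbl24[first:first + count] = array("H", [hop]) * count
            return
        slot = base >> 8
        entry = self.tbl24[slot]
        if entry & EXT:
            block = entry & 0x7FFF
        else:
            if self.blocks_used == 256:
                raise MemoryError("second-level table is full")
            block = self.blocks_used
            self.blocks_used += 1
            # Seed the new block with whatever the /24 slot held before.
            self.tbl8[block * 256:(block + 1) * 256] = array("H", [entry]) * 256
            self.tbl24[slot] = EXT | block
        start = block * 256 + (base & 0xFF)
        count = 1 << (32 - net.prefixlen)
        self.tbl8[start:start + count] = array("H", [hop]) * count

    def lookup(self, addr):
        ip = int(ipaddress.ip_address(addr))
        entry = self.tbl24[ip >> 8]
        if entry & EXT:  # second memory read only for subdivided /24s
            return self.tbl8[(entry & 0x7FFF) * 256 + (ip & 0xFF)]
        return entry


table = DirectLookup()
table.add("38.0.0.0/8", 1)
table.add("38.1.2.0/24", 2)
table.add("38.1.2.128/25", 3)
print(table.lookup("38.9.9.9"), table.lookup("38.1.2.5"), table.lookup("38.1.2.200"))
```

Every lookup costs one array read, or two for a subdivided /24, which is the property being contrasted with TCAM; the model says nothing about update cost or memory bandwidth in real hardware.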

John_L · January 25, 2023, 3:53am

It appears that Chris J. Ruschmann <chris@scsalaska.net> said:

How do you plan on getting rid of all the filters that don’t accept anything less than a /24?

In all seriousness If I have these, I’d imagine everyone else does too.

Right. Since the Internet has no settlements, there is no way to
persuade a network of whom you are not a customer to accept your
announcements if they don't want to, and even for the largest
networks, that is 99% of the other networks in the world. So no,
they're not going to accept your /25 no matter how deeply you believe
that they should.

I'm kind of surprised that we haven't seen pushback against sloppily
disaggregated announcements. It is my impression that the route table
would be appreciably smaller if a few networks combined a bunch of
adjacent /24's into larger blocks.

R's,
John

Forrest_Christian_Li · January 25, 2023, 4:12am

I have two thoughts in relation to this:

1. It’s amazing how many threads end up ending in the (correct) summary that making an even minor global change to the way the internet works and/or is configured to enable some potentially useful feature isn’t likely to happen.
2. I’d really like to be able to tag a BGP announcement with “only use this announcement as an absolute last resort” so I don’t have to break my prefixes in half in those cases where I have a backup path that needs to only be used as a last resort. (Today each prefix I have to do this with results in 3 prefixes in the table where one would do).

And yes. I know #2 is precluded from actually ever happening because of #1. The irony is not lost on me.

Lars_Prehn1 · January 25, 2023, 5:08am

We performed some high-level analyses on these hyper-specific prefixes about a year ago and pushed some insights into a blog post [1] and a paper [2].

While not many ASes redistribute these prefixes, some accept and use them for their internal routing (e.g., NTT’s IPv4 filtering policy [3]). Rob already pointed out that this is often sufficient for many traffic engineering tasks. In the remaining scenarios, announcing a covering /24 and hyper-specific prefixes may result in some traffic engineering, even if the predictability of the routing impact is closer to path prepending than usual more-specific announcements. In contrast to John’s claim, some transit ASes explicitly enabled redistributions of up to /28s for their customers upon request (at least, they told us so during interviews).

Accepting and globally redistributing all hyper-specifics increases the routing table size by >100K routes (according to what route collectors see). There are also about 2-4 de-aggregation events every year in which some origin (accidentally) leaks some large number of hyper-specifics to its neighbours for a short time.

Best regards,
Lars

[1] https://blog.apnic.net/2022/09/01/measuring-hyper-specific-prefixes-in-the-wild/
[2] https://dl.acm.org/doi/pdf/10.1145/3544912.3544916
[3] https://www.gin.ntt.net/support-center/policies-procedures/routing/

Chris11 · January 25, 2023, 8:24pm

I would suggest that this is trying to solve the wrong problem. To me this is pressure to migrate to v6, not alter routing rules.

Kind Regards,
Chris Haun

Eric_Kuhnke · January 25, 2023, 10:31pm

It’s amazing how many threads end up ending in the (correct) summary that making an even minor global change to the way the internet works and/or is configured to enable some potentially useful feature isn’t likely to happen.

My biggest take-away from this is that software and network engineering design decisions should be more thoughtful and methodical when setting address space, number space, name space and size/expandability of whatever is being configured when designing new things. Even if you think whatever you’ve created is inexhaustible for your own purposes. Once something has been put into widespread use it’s extremely difficult to come back and fix it later.

Such as for ISP internal purposes, like thinking about “okay what if we take this DNS zone delegation for our internal management network and set it aside for a vast number of CPEs in the future, hierarchically organized by where they’re going to be installed geographically, for our internal hostnames and reverse DNS”.

I’m sure that the vast global address space of ipv4 looked incredibly large when put into use as a standard…

Or if you’ve ever seen an organization that internally set up its accounting/billing/customer circuit ID system with a namespace/number-space that didn’t scale to meet future needs, or categorization of customers, or integration of circuit IDs into automation systems.

jlewis · January 25, 2023, 11:39pm

With good upstreams (providing useful BGP communities support), you can do this without cluttering the global table.

Say you have multiple upstreams, and you want provider A to treat a.b.c/23 as a "don't use this unless you have no other path" route. They should support a community that allows you to cause them to set localpref lower than their "peer default" (or transit if they admit to buying transit). Then, when you advertise a.b.c/23 to both/all your providers, provider A won't use the direct route unless they're not receiving it from any other source. i.e. You don't have to advertise the /23 to A and a pair of /24s to B, C, etc. to make sure A doesn't use the direct customer route.

I don't know that all "tier 1's" support it...but for example, Tata, GTT, Cogent, and NTT do:

Tata:
Request for Local Preference Adjustment
In AS6453 the default local preference (LOCAL_PREF) value for customer-routes is 100, and for peer-routes it is 90. Along the lines of RFC1998, a Tata Communications customer may request other than the default local preference:

Adjust Local Preference
  community                     action
  6453:n, n={70, 80, 90, 110}   assign local preference n in AS6453

GTT:
3.2.2 Local Preference

  Value Description
  3257:1980 give routes localpref below normal customer route.
  3257:1970 give routes localpref below normal peer route.

Cogent:
Local Preference
All customer routes announced to Cogent will have a local pref of 130.
The customer can control the local preference for their announcements by using a community string that is passed to Cogent in the BGP session. The following table lists the community strings and the corresponding local preference that will be set when they are used.

  Community String  Local Pref  Effect
  174:10            10          Set customer route local preference to 10 (below everything-least preferred)
  174:70            70          Set customer route local preference to 70 (below peers)
  174:120           120         Set customer route local preference to 120 (below customer default)
  174:125           125         Set customer route local preference to 125 (below customer default)
  174:135           135         Set customer route local preference to 135 (above customer default)
  174:140           140         Set customer route local preference to 140 (above customer default)

You get the idea. Everyone likely does it "their own way", so you need to find the BGP community support info for the upstream with which you want non-default behavior / localpref.
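
To make the mechanism concrete, here is a small Python model of how AS6453 would treat a tagged customer route, using only the Tata values quoted above (customer default 100, peer default 90, allowed n of 70, 80, 90, 110). The function names and route dictionaries are hypothetical, and real best-path selection has many more tie-breakers after local preference.

```python
TATA_ASN = "6453"
DEFAULT_CUSTOMER_PREF = 100
DEFAULT_PEER_PREF = 90
ALLOWED_PREFS = {70, 80, 90, 110}


def customer_local_pref(communities):
    """Local preference AS6453 would assign to a customer route (per the quoted policy)."""
    for community in communities:
        asn, _, value = community.partition(":")
        if asn == TATA_ASN and value.isdigit() and int(value) in ALLOWED_PREFS:
            return int(value)
    return DEFAULT_CUSTOMER_PREF


def best_path(paths):
    """Pick the path with the highest local preference (later tie-breakers ignored)."""
    return max(paths, key=lambda path: path["local_pref"])


# The customer tags the announcement toward Tata as a last-resort path.
direct = {"via": "customer session", "local_pref": customer_local_pref(["6453:80"])}
# The same prefix also reaches Tata through a peer that serves the primary upstream.
via_peer = {"via": "peer", "local_pref": DEFAULT_PEER_PREF}

print(best_path([direct, via_peer])["via"])  # the peer path is preferred
print(best_path([direct])["via"])            # the direct path is used if it is the only one
```

With 6453:80 the direct route sits below the peer default of 90, so it is used only when no other path exists, and the same single prefix is announced to every provider.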

Mike_Hammett · January 26, 2023, 1:52pm

Implementing v6 is important, but unrelated to allowing smaller v4 prefixes.

Not taking a position either way if smaller v4 prefixes is good or bad.

Masataka_Ohta · January 28, 2023, 5:49am

Lars Prehn wrote:

Accepting and globally redistributing all hyper-specifics increases
the routing table size by >100K routes (according to what route
collectors see).

That figure is a guaranteed minimum, but there should be 10 or
100 times more desire for hyper-specifics, suppressed by
the established (since the early days of class C) practice.

That multihomed sites are relying on the entire Internet
for computation of the best ways to reach them is not
a healthy way of multihoming.

Masataka Ohta

William_Herrin · January 28, 2023, 3:05pm

This was studied in the IRTF RRG about a decade ago. There aren't any
other workable ways of multihoming compatible with the TCP protocol,
not even in theory. Every other mechanism imagined failed some basic
system constraint, usually the requirement that packets have
administrative permission to cross an intermediate network. So,
another way of multihoming critically depends on replacing the layer-4
protocols with something that doesn't intermingle the IP address with
the connection identifier.

For clarity: TCP's connection identifier consists of the source and
destination IP addresses plus the source and destination ports. Those
four elements, unique when combined, identify exactly one ongoing TCP
connection. Because of this, the connection must fail if the source or
destination IP addresses are no longer available to the source or
destination hosts. From this fact, we get the requirement that the
entire Internet learn when a particular IP address has changed its
position within the network.

Regards,
Bill Herrin
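
The four-element connection identifier can be illustrated with a toy connection table in Python. This is a simplified sketch, not a real TCP stack: the `FourTuple` type, the `deliver` helper, and the documentation addresses are assumptions made for the example.

```python
from typing import NamedTuple


class FourTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# The server's view: established connections keyed by the full 4-tuple.
connections = {
    FourTuple("192.0.2.10", 51000, "198.51.100.7", 443): "ESTABLISHED",
}


def deliver(segment: FourTuple) -> str:
    """Find the connection a segment belongs to; any changed element is a miss."""
    return connections.get(segment, "no matching connection")


before = FourTuple("192.0.2.10", 51000, "198.51.100.7", 443)
# The client moves to a different address but keeps the same port.
after = FourTuple("203.0.113.10", 51000, "198.51.100.7", 443)

print(deliver(before))  # ESTABLISHED
print(deliver(after))   # no matching connection
```

Because the address is part of the lookup key, the only way to keep the connection alive is to keep the address reachable, which is what pushes the problem into the global routing table.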

Donald_Eastlake · January 28, 2023, 6:14pm

Use Multipath TCP
https://datatracker.ietf.org/group/mptcp/documents/

William_Herrin · January 28, 2023, 7:24pm

Doesn't work well. Has security problems (mismatch between reported IP
addresses used and actual addresses in use) and it can't reacquire the
opposing endpoint if an address is lost before a new one is
communicated.

MPTCP has been complete for years. The adoption rate is very low.

QUIC is better, but it still leaves finding the server's new IP
address as an exercise for a process outside of the protocol. I
haven't kept my ear to the ground for the last year or two but I
haven't heard about it making the expected inroads versus HTTP 1.1
over TCP. Unfortunately, QUIC is a very complex protocol that's very
hard to troubleshoot. The complexity comes from a slew of mandatory
security components which should have been optional.

Regards,
Bill Herrin