RE: Newbies Question: Do I really need to sacrifice Prefix-aggregation to do BGP Load-sharing?

Dear all,

Before all else:
thank you all for the lightning-fast responses (even taking the time zone advantage into account).
I really, really, really appreciate all your recommendations.

Virtually all of you recommend prepending as the first choice.
I also get the feeling that you guys consider de-aggregation “distasteful” (at the least) but sometimes unavoidable.

I have considered prepending myself, but dare not implement it yet
for fear that the BGP (Human) Community will burn me alive, witch-hunt style,
for the following reasons:

  1. I can see from looking glass(es) that my upstreams already practice prepending (some paths) at their level (at least 3 more hops [x4]), supposedly to “balance” their bandwidth.
  2. Should I start prepending mine, I might upset their balance, causing them to prepend more, thus starting a “prepend war”. [I imagine that x20+ prepending starts out this way]

The way I see it, prepending (or maybe even the whole BGP-Path thing) is a local-optimization problem: it’s only best for someone, not globally.
And the Higher-Tiers (Lower Tier-Numbers) will always “engineer” me in the end.

Worse yet, I might be out-voted by de-aggregation insider “cultists” anyway.

Which forces me to proactively ask you guys questions about ROV-Overlapping and ROV “Hijack Gap” soon, in another posting with separate “Subject:”.

Again, Thank you.

Cheers,

Pirawat.

P.S. [Off-Topic] Any comment on the “SCION” System?
Any good (I will even take “academically”)?
[Reference: https://scion-architecture.net/]

If your upstream (transit provider) prepends your routes without you asking or authorizing it to do so, you should SERIOUSLY consider switching providers!

In the other email I talked about traffic engineering BGP communities.
If those prepends were made from some community you were applying… OK, that’s great!
Even better if you could apply a community that did something like “apply 2 prepends for south america only”.

But a transit provider that arbitrarily changes the AS-PATH (beyond the mandatory hop) without your consent is not acting in good faith.

P.S. Your email replies are breaking threads in email readers. I suggest you check your email client's settings.

  1. Prepending by itself isn’t bad. Prepending past the point that it is effective in accomplishing anything is what you generally want to avoid. Even then, it’s not nearly as big a deal as some make it out to be in most cases.

  2. De-aggregation has its uses and its place. Have a /20, but announcing all the component /24s, even though you aren’t doing anything different with any of them? Bad practice. You’re just inflating the global table for no good reason. However, perhaps you have a set of hosts in a single /24 that you want to try to protect from a prefix hijack. Announce the /20 and that single /24. Not perfect protection, but it provides some cover and isn’t that big a deal.
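The "/20 plus one /24" idea in point 2 rests on longest-prefix matching. Here is a minimal sketch of it using Python's standard `ipaddress` module. The prefixes and the `best_match` helper are hypothetical, chosen only for illustration; they are not anyone's real announcements.

```python
import ipaddress

# Hypothetical prefixes, for illustration only.
aggregate = ipaddress.ip_network("198.18.0.0/20")
protected = ipaddress.ip_network("198.18.5.0/24")

def best_match(address, announced):
    """Return the most specific announced prefix covering address, or None."""
    ip = ipaddress.ip_address(address)
    covering = [net for net in announced if ip in net]
    return max(covering, key=lambda net: net.prefixlen, default=None)

announced = [aggregate, protected]

# A host in the protected /24 follows the /24; everything else follows the /20.
inside = best_match("198.18.5.10", announced)
elsewhere = best_match("198.18.9.10", announced)

# The "bad practice" case: every component /24 of the /20.
wasteful = list(aggregate.subnets(new_prefix=24))
print(inside, elsewhere, len(wasteful))
```

A hijacker who announces only the /20 loses to the legitimate /24 for those hosts. One who announces the same /24 merely ties on prefix length, which is why the cover is partial rather than complete.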

The answers to all of these questions are really: “It depends on what you are trying to do.” There are generally accepted solutions to certain problems, and there are plenty of dumb solutions that are the only thing possible due to circumstances, so sometimes that’s what you have to do too.

Don’t worry about the pitchforks so much. :slight_smile:

To me, it's somewhat comical to see routes prepended 10-20 or more times. If one or two prepends don't do it, 10-20 aren't likely to either.

AFAIK, it's pretty common to use localpref to prefer peering (free) routes over transit (paid) paths, and in cases where remote networks see your prepended path via peering, "no amount" of prepends is going to change the fact that they prefer the free path.
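Here is a rough sketch of why prepends cannot beat localpref: a heavily simplified best-path comparison that looks only at local preference and then AS-path length. Real BGP implementations have further tie-breakers. The `Route` class, the AS numbers, and the localpref values below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    via: str          # how the remote network learned the route
    local_pref: int   # set by the remote network's own policy
    as_path: tuple    # AS numbers, nearest first

def prepend(route, asn, times):
    """Return a copy of route with asn prepended `times` times."""
    return Route(route.via, route.local_pref, (asn,) * times + route.as_path)

def best(routes):
    # Higher local_pref wins first; AS-path length only breaks ties.
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))

peering = Route("peering", 200, (64500,))
transit = Route("transit", 100, (64510, 64500))

# Twenty prepends on the peering path do not move the decision...
padded = prepend(peering, 64500, 20)
choice = best([padded, transit])

# ...but with equal local_pref the shorter path would win.
same_pref = best([Route("peering", 100, padded.as_path), transit])
print(choice.via, same_pref.via)
```

In this simplified model the path-length comparison is never reached while the localpref values differ.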

While writing this though, two things occurred to me.

1) Are there any networks with routing policy that looks at prepends and
    says "if we see a peering path with >X number of prepends (or maybe
    just path length >X), demote the localpref to transit or lower"? "i.e.
    They obviously don't want us using this path, turn it into a backup
    path."

2) Particularly back when it was found some BGP implementations broke when
    encountering unusually long as-paths, I think it became somewhat common
    to reject routes with "crazy long" as-paths. If such policy is still
    in place in many networks, excessive prepending would actually have the
    desired effect for those networks. i.e. The excessive prepends would
    get that path rejected, keeping it from being used.

Hi Pirawat,

You asked the experts how it's done. It's done with prepends. Do you
really want to argue with the answer?

De-aggregation is a last resort, the bluntest tool in the toolchest.
And it costs other people money so they don't appreciate you doing it
unless you absolutely have to.
https://bill.herrin.us/network/bgpcost.html

As others have said, no one is going to yell at you because you
prepended your AS two or three or even five times. If you don't get
the desired effect after 5, you're running up against a problem
prepends won't solve. The typical problem is that your upstream has
used "localprefs" to prefer a particular path to you, overriding AS
path length as the deciding factor. Competent upstreams that employ
this technique also allow you to set a "BGP community" on your
advertisement that overrides this behavior. A "BGP Community" is a
32-bit number often expressed as two 16-bit numbers the first of which
is the ISP's AS number. When detected by the router, the number causes
it to apply some locally-chosen rule to the route. If you ask them,
the ISP will provide you with a list of "BGP Communities" (numbers)
they allow you to set on your route advertisement along with what
action they will take if they see that number.

Regards,
Bill Herrin
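The "32-bit number expressed as two 16-bit numbers" description above can be sketched as follows. The helper names and the example value `64500:102` are hypothetical; the actual values and their meanings come from each ISP's published community list.

```python
def encode_community(asn, value):
    """Pack an ASN and a locally defined value into one 32-bit community."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | value

def decode_community(community):
    """Split a 32-bit community into its (asn, value) halves."""
    return community >> 16, community & 0xFFFF

def format_community(community):
    """Render a community in the usual ASN:value notation."""
    asn, value = decode_community(community)
    return f"{asn}:{value}"

# Hypothetical action table an upstream might publish for its customers.
ACTIONS = {encode_community(64500, 102): "prepend twice toward peers"}

raw = encode_community(64500, 102)
print(format_community(raw), "->", ACTIONS.get(raw, "no action"))
```

Because the first half is limited to 16 bits, this sketch only covers 2-byte AS numbers.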

Reading between the lines, this network’s current lack of diverse providers is consistent with a geographic/monopoly disadvantage. I do agree that your transit provider is in bad form to pad your routes, but it does happen. A phone call or email to understand their limitations may be helpful. Trying to fit all of your traffic into an upstream’s own uplink that is far too small does not provide the best user experience. It could also be a bug in the route-map. Speaking of bugs, trying to use communities can cause you to observe bugs in other networks’ route-maps (with great power comes great…).

Padding much past three usually has little effect. Splitting your advertisement into, say, four smaller announcements and starting to advertise them one at a time through your preferred provider is a good place to start. Traffic will prefer the more specific route. With luck that was done last night :blush:
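One way to picture the "four smaller announcements, one at a time" approach is sketched below. The block is hypothetical, and the sketch assumes the covering aggregate stays announced to both providers as a fallback, which the post does not state.

```python
import ipaddress

# Hypothetical address block, for illustration only.
block = ipaddress.ip_network("198.18.0.0/20")

# Two extra prefix bits split the block into four equal pieces (/22s).
quarters = list(block.subnets(prefixlen_diff=2))

def announcements(step):
    """Prefixes sent to each provider once `step` quarters have been moved.

    Assumption: the aggregate is still announced to both providers.
    """
    return {
        "preferred": [block] + quarters[:step],
        "other": [block],
    }

for step in range(len(quarters) + 1):
    plan = announcements(step)
    print(step, [str(net) for net in plan["preferred"]])
```

Each added /22 pulls inbound traffic for that quarter toward the preferred provider, so the load can be shifted in steps and measured between them.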

Once you have balanced this out somewhat, you have bought yourself time. Next fun thing is to understand how this works when one provider fails or similar. Traffic can prefer the oldest route, so a small bump down the road can cause unanticipated traffic changes the next nightly peak. Or to put it another way, this is how the sausage is made.

P.S. Both of us top posting is also bad form.

  1. Are there any networks with routing policy that looks at prepends and
    says “if we see a peering path with >X number of prepends (or maybe
    just path length >X), demote the localpref to transit or lower”? “i.e.
    They obviously don’t want us using this path, turn it into a backup
    path.”

Yes. At a previous job, this is exactly what I did. If the path length was X or longer, I set localpref to our last-resort value. If the path length was Y or longer, I dropped the route completely, since at that point following defaults was just as good. Maybe once I hit something that caused a performance problem, but an email to that AS was all it took to fix it; they didn’t realize they were prepending that much and corrected it.

I have firsthand knowledge of some other networks that do similar things.

At a previous job, I explicitly crafted policies that were structured such that:

if PREFIXLENGTH > MAXPREFIXLENGTH then reject
if ASPATH > MAXASPATH then reject
strip_internal_communities
if ASPATH > MAX_VALID_PATH then
    set localpref = TRANSIT_DEPREF_LOCALPREF
    set communities DEPREF_TRANSIT
    blah blah blah
if match external_signal_communities then
    set localpref
    set internal propagation communities
    set external propagation communities
    blah blah blah
then accept

That way, if the prefix is too small or the AS path is too long (>100),
it gets dropped before any communities are evaluated; save
every bit of CPU and memory you can.
Then strip your internal communities off everything else that has a reasonable
path length.
Next, set a lower threshold for what you consider a “reasonable” internet diameter
to be, allowing for a reasonable 3x prepend at one or two levels; if a path is longer than
that, it’s a backup path at best, so treat it that way (below standard transit level).
Finally, on all the remaining routes, evaluate your external signalling communities,
apply internal signalling communities as appropriate, and process normally.
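The policy structure described above can be sketched in Python as shown below. This is one reading of the pseudocode, not the original router configuration. Every threshold, community string, and name is hypothetical, only IPv4 prefix lengths are modelled, and the sketch assumes over-long paths are depreferenced without evaluating external signals.

```python
from dataclasses import dataclass, field

# Hypothetical values; each network picks its own.
MAX_PREFIX_LENGTH = 24          # longest IPv4 prefix accepted
MAX_AS_PATH = 100               # beyond this, reject outright
MAX_VALID_PATH = 12             # beyond this, backup path at best
NORMAL_LOCALPREF = 100
TRANSIT_DEPREF_LOCALPREF = 50
INTERNAL_COMMUNITIES = frozenset({"64500:900", "64500:901"})
DEPREF_TRANSIT = "64500:901"
EXTERNAL_SIGNALS = {"64500:70": 70}   # customer-settable community -> localpref

@dataclass
class Route:
    prefix_length: int
    as_path: list
    communities: set = field(default_factory=set)
    local_pref: int = NORMAL_LOCALPREF

def import_policy(route):
    """Return the accepted route (possibly modified), or None if rejected."""
    # Cheap rejections first, before any community processing.
    if route.prefix_length > MAX_PREFIX_LENGTH:
        return None
    if len(route.as_path) > MAX_AS_PATH:
        return None
    # Never trust internal communities arriving from outside.
    route.communities = route.communities - INTERNAL_COMMUNITIES
    # Over-long but tolerable paths become backup paths.
    if len(route.as_path) > MAX_VALID_PATH:
        route.local_pref = TRANSIT_DEPREF_LOCALPREF
        route.communities.add(DEPREF_TRANSIT)
        return route
    # Remaining routes: honour external signalling communities.
    for community, pref in EXTERNAL_SIGNALS.items():
        if community in route.communities:
            route.local_pref = pref
    return route

accepted = import_policy(Route(24, [64496, 64497], {"64500:70"}))
print(accepted.local_pref if accepted else "rejected")
```

Putting the two rejections first mirrors the stated goal of spending no CPU on community matching for routes that will be dropped anyway.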

There’s a clear tradeoff between trying to ensure maximum reachability
to the rest of the internet versus protecting your CPU and memory from
unnecessary work and state-keeping. As mentioned in another thread,
what each network decides the MAXPREFIXLENGTH is will depend on
their relationships and the capabilities of their hardware. It doesn’t necessarily
have to be /24 and /48, but it should be set at the longest value your network
can happily support, unless you want to chase down odd connectivity issues
in other people’s networks. ^_^;

Thanks!

Matt