Communities

Hi,

I'm looking for information about the way networks use communities in BGP.

It seems that many of the larger networks only use communities to supply
their customers with a mechanism to adjust the local preference to
indicate which connection is preferred when a customer connects over more
than one link (something that can also be done with the MED).

How many networks are there that use communities to indicate where (which
interconnect point) a route was learned? And how many networks use this
information if their upstream provides it?

And how about things like congestion?

Is there any need for more "well known" communities?

TIA,

Iljitsch van Beijnum

Date: Mon, 15 Oct 2001 11:55:27 +0200 (CEST)
From: Iljitsch van Beijnum <iljitsch@muada.com>

I'm looking for information about the way networks use
communities in BGP.

It seems that many of the larger networks only use communities
to supply their customers with a mechanism to adjust the local
preference to indicate which connection is preferred when a
customer connects over more than one link (something that can
also be done with the MED).

Remember that local-pref has higher priority than as-path
length; MED is compared only later, after as-path length and origin.

For instance, I match "_asnthatIdontlike_" and penalize
local-pref to [try to] avoid routing traffic over an ASN that
I think has poor performance. If I penalize AS65000, then

  me 3549 65500 65432 65432 65432 65123

will be preferred over

  me 6347 65000 65123
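A minimal Python sketch of why the longer path wins. It models only the first steps of the decision process: highest local-pref, then shortest AS path, then lowest MED. The later tie-breakers are left out, and MED is compared across all routes for simplicity, although real implementations normally compare it only between routes from the same neighboring AS. The default local-pref of 100 and the penalty of 50 are assumptions, not anyone's actual policy.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_LOCAL_PREF = 100  # a common vendor default; assumed here
PENALTY = 50              # hypothetical amount subtracted for a disliked ASN


@dataclass
class Route:
    as_path: List[int]            # neighbor AS first, origin AS last
    local_pref: int = DEFAULT_LOCAL_PREF
    med: int = 0


def penalize(route: Route, disliked_asn: int) -> Route:
    """Lower local-pref if the disliked ASN appears anywhere in the AS path."""
    if disliked_asn in route.as_path:
        route.local_pref -= PENALTY
    return route


def best_path(routes: List[Route]) -> Route:
    """Simplified decision: highest local-pref, then shortest AS path, then lowest MED."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


via_3549 = Route([3549, 65500, 65432, 65432, 65432, 65123])
via_6347 = Route([6347, 65000, 65123])
winner = best_path([penalize(r, 65000) for r in (via_3549, via_6347)])
print(winner.as_path[0])  # prints 3549: local-pref is compared before path length
```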

This is one reason that redistributing one's upstream routes via
BGP can be bad despite as-path length: If someone uses
local-pref, it's quite conceivable that one will take the
erroneous path that some edge idiot[1] leaked into the table.

[1] I'm an edge-dweller. I can insult them. Note, however, that
upstreams _should_ filter their downstreams to prevent improper
adverts... but the root of the problem is the one at the edge.

[ snip ]

And how about things like congestion?

How do you mean?

Is there any need for more "well known" communities?

I wish that providers would set a community indicating route
ingress. I know, for instance, that GBLX does this... but their
system with hundreds of communities leaves some to be desired,
IMHO.

I'd like to see providers tag "route learned in this region" at
various granularity levels.
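One way "various granularity levels" could be encoded, sketched in Python. The layout here is entirely hypothetical: the 16-bit value half of an ASN:value community is split into 3 bits of continent, 7 bits of region and 6 bits of metro. A receiver can then match only as coarsely as it cares to, much like matching a shorter prefix.

```python
from typing import Optional, Tuple

# Hypothetical layout of the 16-bit value field: 3 bits continent,
# 7 bits region within the continent, 6 bits metro within the region.
REGION_SHIFT, CONTINENT_SHIFT = 6, 13


def encode(asn: int, continent: int, region: int = 0, metro: int = 0) -> str:
    """Build an 'ASN:value' ingress community; 0 means 'not specified'."""
    if not 0 < asn <= 0xFFFF:
        raise ValueError("the ASN half of an RFC 1997 community is 16 bits")
    for name, part, bits in (("continent", continent, 3), ("region", region, 7), ("metro", metro, 6)):
        if not 0 <= part < (1 << bits):
            raise ValueError(f"{name} out of range")
    value = (continent << CONTINENT_SHIFT) | (region << REGION_SHIFT) | metro
    return f"{asn}:{value}"


def decode(community: str) -> Tuple[int, int, int, int]:
    """Return (asn, continent, region, metro)."""
    asn_text, value_text = community.split(":")
    value = int(value_text)
    return int(asn_text), value >> CONTINENT_SHIFT, (value >> REGION_SHIFT) & 0x7F, value & 0x3F


def learned_in(community: str, continent: int, region: Optional[int] = None) -> bool:
    """Match at whatever granularity the receiver cares about."""
    _, c, r, _ = decode(community)
    return c == continent and (region is None or r == region)


tag = encode(65123, continent=2, region=5, metro=3)
print(tag)                            # 65123:16707
print(learned_in(tag, 2))             # True: continent-level match
print(learned_in(tag, 2, region=6))   # False: different region
```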

As for providers listening to communities, I like selective
as-path padding... I'd have to dig up the thread, but this has
been discussed in the past few months.

Eddy

How many networks are there that use communities to indicate where (which
interconnect point) a route was learned?

How feasible is it for me to provide this information in any
meaningful way if I have tens or even hundreds of interconnect
points in my network? Obviously I can assign a unique community
to each such point on my network, and tag all routes I learn there
with that community, but what is the benefit of my doing so? Unless
you have some way of knowing whether interconnect point A is "better"
than interconnect point B, how would you use that information?

This isn't to say that there isn't a reason to do this. I can think
of several *internal* uses for such a scheme, including distance-
sensitive billing applications, traffic engineering, etc. But is
there a benefit to revealing this information to customers or peers?

And how many networks use this information if their upstream provides it?

Without having a clear understanding of each upstream's network
topology and routing policy, how would you use such information to
label one route as "better" than another?

What problem(s) are you trying to solve, and are you sure that
BGP communities are the right tool for the job?

--Jeff

Date: Mon, 15 Oct 2001 12:34:24 -0400
From: Jeff Aitken <jaitken@aitken.com>

How feasible is it for me to provide this information in any

[ snip ]

This isn't to say that there isn't a reason to do this. I can think

[ snip ]

Without having a clear understanding of each upstream's network
topology and routing policy, how would you use such information to
label one route as "better" than another?

Let's take a simple example. Say that I connect to AS65123 in
DFW and AS65456 in Chicago. Assume that both ASen have similar
peering with other networks.

Now, using only as-path length, where do I send traffic? Is
as-path length the best metric? No. If I need traffic headed
for MSP, it should go through CHI. If I need traffic to go to
Houston, it should be routed through Dallas. How does one do
this now? Static entries based on RADB or similar?! If that's
acceptable, then why don't we just static route, period?!

Real example:

If I'm a 6347 downstream and I know that 6347 has transit via
701, 1239, 3561 near me, I'm going to use a route-map. That's
easy.

Now let's take 3967... in most places, peering with 6347 seems
better than with 3549. I send 3967 traffic via 6347. But it's
not perfect... I'd rather send certain regions via 3549. Without
regional tagging, how do I do that? Hypothetical example made
into real example.

Furthermore, define "clear understanding". If I test different
traffic paths, I can get a pretty clear understanding. Not as
good as a detailed network map, but enough to tune routes better
than leaving them up to nature.

What problem(s) are you trying to solve, and are you sure that

See above.

BGP communities are the right tool for the job?

Sure that they're the right tool, no. Sure that they're the best
tool -- until someone shows me a better one.

Eddy

First, thanks to everyone who replied.

For instance, I match "_asnthatIdontlike_" and penalize
local-pref to [try to] avoid routing traffic over an ASN that
I think has poor performance. If I penalize AS65000, then

  me 3549 65500 65432 65432 65432 65123

will be preferred over

  me 6347 65000 65123

This is one reason that redistributing one's upstream routes via
BGP can be bad despite as-path length: If someone uses
local-pref, it's quite conceivable that one will take the
erroneous path that some edge idiot[1] leaked into the table.

I don't understand what you mean. Redistributing upstream routes
where/into what? How can this be "despite" as-path length?

> And how about things like congestion?

How do you mean?

Well, let me provide a real-world example. It's not really congestion, but
close enough for these purposes.

When Telehouse had problems in Manhattan after the attacks on September
11, one of our transit networks issued a warning that they might lose lots
of routes if the power went down (which seemed likely at the time).
Since they use ATM, the BGP session to the router at the other side of the
connection would have to time out if this happened, creating a temporary
black hole. For us, the lost routes wouldn't be a problem, since we
multihome. But a black hole is a multihomer's worst nightmare.

So a community that indicates "you don't want to use this route unless you
absolutely have to--trust us" would have been very welcome. Such a
community would be especially useful in the face of congestion:
multihomers can route around the congested area and since some traffic is
rerouted the congestion would be less for the traffic that remains.
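A Python sketch of how a multihomer might act on such a community. Both the community value (65000:999 from a hypothetical upstream AS65000) and the demoted local-pref of 1 are assumptions. The point of the design is that the route is demoted rather than dropped, so it still carries traffic when nothing else is available and no black hole is created.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Community = Tuple[int, int]

# Hypothetical community an upstream (AS65000 here) attaches to routes it
# expects to lose, meaning "use this only if you have nothing else".
LAST_RESORT: Community = (65000, 999)
DEMOTED_LOCAL_PREF = 1  # below anything the import policy normally assigns


@dataclass
class Route:
    neighbor_as: int
    as_path_len: int
    communities: Set[Community] = field(default_factory=set)
    local_pref: int = 100


def import_policy(route: Route) -> Route:
    # Demote rather than drop: dropping would make the destination
    # unreachable whenever no alternative path exists.
    if LAST_RESORT in route.communities:
        route.local_pref = DEMOTED_LOCAL_PREF
    return route


def best(routes: List[Route]) -> Route:
    return min(routes, key=lambda r: (-r.local_pref, r.as_path_len))


risky = import_policy(Route(65000, 2, {LAST_RESORT}))
other = import_policy(Route(65001, 4))
print(best([risky, other]).neighbor_as)  # 65001: the longer but untagged path wins
print(best([risky]).neighbor_as)         # 65000: still used when it is the only path
```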

> Is there any need for more "well known" communities?

I wish that providers would set a community indicating route
ingress. I know, for instance, that GBLX does this... but their
system with hundreds of communities leaves some to be desired,
IMHO.

I'd like to see providers tag "route learned in this region" at
various granularity levels.

I would love to be able to see where a route originated and how bad the
detour getting here was.

But is it worth the trouble to try to "standardize" communities for this?

Iljitsch

Date: Mon, 15 Oct 2001 19:56:02 +0200 (CEST)
From: Iljitsch van Beijnum <iljitsch@muada.com>

(This is more like two messages in one... I'm posting as a
single message in sort of a "self digest" mode.)

[ snip ]

*** Message #1 ***

I don't understand what you mean. Redistributing upstream
routes where/into what? How can this be "despite" as-path
length?

Hypothetical example with real names:

Let's say that I have transit from 6347 and 2914. Now let's say
that I'm stupid, and start advertising routes that I learn from
2914 into 6347, and that 6347 isn't filtering my as-paths or
netblocks. [Note: 6347 does know better in the real world.]

Now a customer ("Network X") of 6347 and 1239 will see 2914
netblocks via

  6347 19358 2914
  6347 { 701 | 1239 | 3561 } 2914
  1239 2914

assuming that:

+ 1239/2914 directly connect
+ 6347/2914 do not directly connect
+ 6347 obtains transit to 2914 via 701, 1239, and 3561.

6347 learns 2914 routes from 701; 1239; 3561; and (wrongly) me,
19358... then chooses a best route to redistribute. Because 6347
sells transit to me, they'll give my routes higher local-pref
than their peers or upstreams. Thus, for any 2914 netblock, I
become the preferred egress from 6347. Problem #1.
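Problem #1 as a Python sketch of 6347's selection for one 2914 prefix. The local-pref values per relationship (customer 120, peer 100, transit 80) are hypothetical, and the lowest neighbor ASN stands in for the real later tie-breakers. The optional filter shows the kind of per-customer AS-path check that keeps the leak out. Which ASNs a customer is allowed to send is an assumption supplied by the caller.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical local-pref values by commercial relationship, as seen by AS6347:
# customer routes earn money, so they beat peer and transit routes.
LOCAL_PREF = {"customer": 120, "peer": 100, "transit": 80}

# (neighbor ASN, relationship, AS path as received) for one 2914 prefix
RECEIVED: List[Tuple[int, str, List[int]]] = [
    (701, "transit", [701, 2914]),
    (1239, "transit", [1239, 2914]),
    (3561, "transit", [3561, 2914]),
    (19358, "customer", [19358, 2914]),  # the leak
]


def customer_filter(neighbor: int, path: List[int], allowed: Dict[int, Set[int]]) -> bool:
    """Accept a customer route only if every AS in its path is on that customer's list."""
    return set(path) <= allowed.get(neighbor, set())


def best_neighbor(received, allowed: Optional[Dict[int, Set[int]]] = None) -> int:
    """Highest local-pref, then shortest path; lowest neighbor ASN breaks ties."""
    candidates = [
        (-LOCAL_PREF[rel], len(path), neighbor)
        for neighbor, rel, path in received
        if rel != "customer" or allowed is None or customer_filter(neighbor, path, allowed)
    ]
    return min(candidates)[2]


print(best_neighbor(RECEIVED))                             # 19358: the leak wins
print(best_neighbor(RECEIVED, allowed={19358: {19358}}))   # 701 once the leak is filtered
```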

Now let's say that Network X uses local-pref to penalize

  _1239_.*_2914

Network X sees:

  6347 19358 2914
  1239 2914

Network X's local-pref policies in their route-maps make the
latter one undesirable. Problem #2, and the [extreme] example
in my prior post.
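Problem #2 from Network X's side, sketched in Python. Cisco's `_` is approximated here by a regex word boundary. That is an assumption: real IOS `_` also matches commas, braces, parentheses and the start and end of the string, but a word boundary behaves the same on plain space-separated paths like these. The local-pref values are hypothetical.

```python
import re
from typing import List


def cisco_style(pattern: str) -> "re.Pattern[str]":
    """Approximate Cisco's '_' AS-path delimiter with a word boundary."""
    return re.compile(pattern.replace("_", r"\b"))


PENALIZE = cisco_style("_1239_.*_2914")


def local_pref(path: str) -> int:
    return 50 if PENALIZE.search(path) else 100  # hypothetical values


def choose(paths: List[str]) -> str:
    # Local-pref first, AS-path length only as a tie-breaker.
    return min(paths, key=lambda p: (-local_pref(p), len(p.split())))


paths = ["6347 19358 2914", "1239 2914"]
print(choose(paths))  # prints 6347 19358 2914: the leaked path wins
```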

Some old-timers, help me out: IIRC, 3561 got blackholed in 1997
by bad BGP from another well-known network... but I don't want
to say more in case my memory is bad.

*** Message #2 ***

Well, let me provide a real-world example. It's not really
congestion, but close enough for these purposes.

When Telehouse had problems in Manhattan after the attacks on

[ snip ]

So a community that indicates "you don't want to use this route
unless you absolutely have to--trust us" would have been very
welcome. Such a community would be especially useful in the
face of congestion:

I see and agree. Good idea, IMHO.

But is it worth the trouble to try to "standardize" communities
for this?

I should think that this would be trivial. 0x0000:* and 0xffff:*
are reserved per RFC1997... release a new RFC with your "you
don't want this route!" communities added, participants would
benefit, non-participants would observe no change, and there
would be no interoperability troubles.
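The reason this would not break anything lies in how RFC 1997 lays out a community: a 32-bit value, conventionally written ASN:value with the defining network's ASN in the high 16 bits. The ranges 0x00000000-0x0000FFFF and 0xFFFF0000-0xFFFFFFFF are reserved, and the well-known NO_EXPORT, NO_ADVERTISE and NO_EXPORT_SUBCONFED sit in the second one. A Python sketch of that encoding follows. A new "you don't want this route" value would be assigned from the same reserved range. No such value is assumed here.

```python
# RFC 1997: a community is a 32-bit value; by convention the high 16 bits
# hold the ASN of the network that defines it and the low 16 bits its meaning.
WELL_KNOWN = {
    0xFFFFFF01: "NO_EXPORT",
    0xFFFFFF02: "NO_ADVERTISE",
    0xFFFFFF03: "NO_EXPORT_SUBCONFED",
}


def pack(asn: int, value: int) -> int:
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | value


def to_text(community: int) -> str:
    return WELL_KNOWN.get(community, f"{community >> 16}:{community & 0xFFFF}")


def is_reserved(community: int) -> bool:
    """0x00000000-0x0000FFFF and 0xFFFF0000-0xFFFFFFFF are reserved by RFC 1997."""
    return (community >> 16) in (0x0000, 0xFFFF)


print(to_text(pack(65535, 0xFF01)))  # NO_EXPORT
print(to_text(pack(6347, 100)))      # 6347:100
print(is_reserved(pack(6347, 100)))  # False
```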

I think I like this better than my prior geography-based post...
you're suggesting that MED-like info be advertised via standard
communities. And who would know better than the originating
provider? Makes sense to me...

Eddy

Hypothetical example with real names:

Let's say that I have transit from 6347 and 2914. Now let's say
that I'm stupid, and start advertising routes that I learn from
2914 into 6347, and that 6347 isn't filtering my as-paths or
netblocks. [Note: 6347 does know better in the real world.]

Gee, this is already something that can easily be solved - route-maps are
your friends. The moment you do something like this you *will* get filtered.

Now a customer ("Network X") of 6347 and 1239 will see 2914
netblocks via

  6347 19358 2914
  6347 { 701 | 1239 | 3561 } 2914
  1239 2914

assuming that:

+ 1239/2914 directly connect
+ 6347/2914 do not directly connect
+ 6347 obtains transit to 2914 via 701, 1239, and 3561.

6347 learns 2914 routes from 701; 1239; 3561; and (wrongly) me,
19358... then chooses a best route to redistribute. Because 6347
sells transit to me, they'll give my routes higher local-pref
than their peers or upstreams. Thus, for any 2914 netblock, I
become the preferred egress from 6347. Problem #1.

You are missing a few little things - if 6347 does not filter and you
redistribute 2914 routes to 6347, you will redistribute the entire view of the
world from the perspective of 2914, since 2914 is your upstream provider as
well. Since 6347 prefers your routes, you will become the exit point for all
non-customer traffic of 6347, which is going to be immediately detected.

All of this, of course, is an exercise in typing, since everyone sane has some
knobs that they set to make sure that their customers do not blow up their
entire network.

Now let's say that Network X uses local-pref to penalize

  _1239_.*_2914

Network X sees:

  6347 19358 2914
  1239 2914

Network X's local-pref policies in their route-maps make the
latter one undesirable. Problem #2, and the [extreme] example
in my prior post.

Some old-timers, help me out: IIRC, 3561 got blackholed in 1997
by bad BGP from another well-known network... but I don't want
to say more in case my memory is bad.

7007 problem was different. The issue was that 7007 redistributed EGP into
classful IGP, which got redistributed back into IGP, which of course broke
AS_PATH loop detection in addition to creating a set of higher specificity
routes.

Alex

7007 problem was different. The issue was that 7007 redistributed EGP into
classful IGP, which got redistributed back into IGP, which of course broke

                                        ^^^^^^^^^^^^^

Oops, back into EGP.

AS_PATH loop detection in addition to creating a set of higher specificity
routes.

Alex

Let's take a simple example. Say that I connect to AS65123 in
DFW and AS65456 in Chicago. Assume that both ASen have similar
peering with other networks.
[...]
If I need traffic headed for MSP, it should go through CHI.

Perhaps, but in order to be sure we need to know more. First of all,
we need to know who the third network is -- the upstream for the end
user in MSP. Let's assume it's AS65535. You've decided on geography
alone that it's better to hand MSP-destined traffic to AS65456 in ORD.
Do AS65456 and AS65535 even have peering in that area? What if the
only two places they peer are on the west and east coasts?

You also didn't identify the starting point on your network. Let's
assume it's Dallas. You have two choices:

    1. You carry the bits from DFW to ORD on your network and hand
       them to AS65456. AS65456 then carries them to wherever they
       peer with AS65535 and hand them over for final delivery.

    2. You hand off to AS65123 in DFW. AS65123 carries the bits to
       a location where they peer with AS65535, who takes them the
       rest of the way.

Without any knowledge of the topology, routing policy, backbone
capacity, and peering placement and density of the three networks
involved, how can you say for sure which option is "better"? I'd
be inclined to ask why you're paying AS65123 for service if you can
do a better job of carrying bits to MSP than they can, personally.

If I need traffic to go to Houston, it should be routed through
Dallas.

Not necessarily. See above. Where is the starting point on your
network? If it's Dallas, maybe. But what about from other points
on your network?

Furthermore, define "clear understanding".

Clear understanding means:

1. You need to know each provider's backbone. Just because two cities
are close together on a map or have fiber between them doesn't mean
that they're connected at layer 3.

2. You need to know each provider's routing policy. Just because two
networks peer here doesn't mean they won't exchange bits there.

3. You need to know where each provider has peering, and with whom.
Just because a provider has a POP in a given city doesn't mean that
they peer there. Just because two providers have routers in the same
room in the same building in the same city doesn't mean they peer
there.

If you don't have this information, then what you're doing is
guessing based on nothing more than geographical information. If
you're going to do that, how granular do you want the data? I
don't think city-level is a good idea -- too often you'll make the
wrong choice. Regional-level might work, but for what definition
of "region"?

As a provider, I'd rather hear from my customers that there is a
problem so that I can fix the root cause, rather than telling them
to hack around it.

--Jeff

> How many networks are there that use communities to indicate where (which
> interconnect point) a route was learned?

How feasible is it for me to provide this information in any
meaningful way if I have tens or even hundreds of interconnect
points in my network?

Hm, are there hundreds of interconnect points, even worldwide?

Obviously I can assign a unique community
to each such point on my network, and tag all routes I learn there
with that community, but what is the benefit of my doing so? Unless
you have some way of knowing whether interconnect point A is "better"
than interconnect point B, how would you use that information?

If two networks are both rather large and interconnect in many places, it
may be hard to put this information to use. But for multihoming customers
this shouldn't be a big problem. For instance, we are in Europe and we
assign a lower local preference to routes our upstreams receive in the US.
So if there is a route over an interconnect point in Europe, we prefer it,
regardless of AS path length. Obviously this will not guarantee selection
of the best path, but there are cases when it prevents a transcontinental
detour.
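The policy described above, sketched in Python as a first-match table in the style of a route-map. The upstream ASNs and their ingress community values are hypothetical. Each transit network numbers its tags differently, so a multihomer has to maintain a table like this for every upstream it uses.

```python
from typing import FrozenSet, List, Tuple

Community = Tuple[int, int]

# Hypothetical ingress tags published by two upstreams; every transit
# network picks its own numbering, so this table is per-upstream knowledge.
ROUTE_MAP: List[Tuple[Community, int]] = [
    ((65001, 3000), 80),   # upstream AS65001: learned in North America
    ((65002, 2100), 80),   # upstream AS65002: learned in the US
]
DEFAULT_LOCAL_PREF = 100


def local_pref(communities: FrozenSet[Community]) -> int:
    """First matching clause wins, as in a route-map; otherwise the default."""
    for community, pref in ROUTE_MAP:
        if community in communities:
            return pref
    return DEFAULT_LOCAL_PREF


def best(routes: List[Tuple[str, int, FrozenSet[Community]]]) -> str:
    """routes: (label, AS-path length, communities)."""
    return min(routes, key=lambda r: (-local_pref(r[2]), r[1]))[0]


routes = [
    ("via US interconnect", 2, frozenset({(65001, 3000)})),
    ("via European interconnect", 4, frozenset({(65002, 1200)})),
]
print(best(routes))  # via European interconnect, despite the longer AS path
```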

> And how many networks use this information if their upstream provides it?

Without having a clear understanding of each upstream's network
topology and routing policy, how would you use such information to
label one route as "better" than another?

Give your multihomed customers some credit. They know how the traceroute
program works. If part of your network or an exchange point is congested,
your customers will know. Why not give them the tools to route around the
problem?

What problem(s) are you trying to solve, and are you sure that
BGP communities are the right tool for the job?

The problem is that the BGP route selection algorithm is far from perfect.
Setting the local preference based on the AS path and communities is the
only tool (apart from a big bag of money that makes all the problems
disappear).

Iljitsch

Let's say that I have transit from 6347 and 2914. Now let's say
that I'm stupid, and start advertising routes that I learn from
2914 into 6347, and that 6347 isn't filtering my as-paths or
netblocks. [Note: 6347 does know better in the real world.]

Ok, I understand. There was a problem along these lines a few weeks ago.
"Sorry guys, a circuit came into service unexpectedly, we hadn't installed
any filters yet." (AS#s withheld to protect the guilty.)

But then the question is: which is worse, having traffic flow over an
inferior path, or taking the chance that two people who both should know
better screw up?

*** Message #2 ***

[ snip ]

> So a community that indicates "you don't want to use this route
> unless you absolutely have to--trust us" would have been very
> welcome. Such a community would be especially useful in the
> face of congestion:

I see and agree. Good idea, IMHO.

> But is it worth the trouble to try to "standardize" communities
> for this?

I should think that this would be trivial. 0x0000:* and 0xffff:*
are reserved per RFC1997... release a new RFC with your "you
don't want this route!" communities added, participants would
benefit, non-participants would observe no change, and there
would be no interoperability troubles.

Yes, why not. If anyone has something to contribute or wants to co-author
such a draft or RFC, contact me off-list.

I think I like this better than my prior geography-based post...
you're suggesting that MED-like info be advertised via standard
communities. And who would know better than the originating
provider? Makes sense to me...

I've been thinking about other information that could be conveyed in
communities. For instance, bandwidth, delay and packet loss. If each
router along the way modifies such a community (should probably be an
extended one) then a much richer set of information would be available to
multihomers to aid in route selection.

Iljitsch

* iljitsch@muada.com (Iljitsch van Beijnum) [Tue 16 Oct 2001, 10:48 CEST]:

I've been thinking about other information that could be conveyed in
communities. For instance, bandwidth, delay and packet loss. If each
router along the way modifies such a community (should probably be an
extended one) then a much richer set of information would be available
to multihomers to aid in route selection.

And generate a route flap every time a link gets used more or less?
That would be suboptimal to say the least (the word `countereffective'
seems more applicable to me).

  -- Niels.

Using dynamic data for this is not going to work in BGP, so this would
have to be static information (hm, packet loss is not too static,
hopefully).

Static system-derived or configured information would already help a lot.
You can then easily select the route with the highest potential bandwidth
or the lowest speed-of-light delay, without the need to know a lot about
the internals of a transit network.

Introducing "metrics" like this is not contrary to BGP design
philosophy: the way in which an AS selects the best route is not defined
in the RFC and the length of the AS path is certainly not the best
possible criterion.

The processing along the way would be limited to a simple addition
(delay), compare/replace (bandwidth) or multiplication (packet loss)
without introducing anything SPF-like.
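The three per-hop operations, sketched in Python: delay is added, bandwidth keeps the smaller value, and loss is multiplied. For the multiplication to be meaningful, the sketch multiplies delivery ratios (1 minus loss) and assumes losses on different links are independent. The link values are static, configured numbers, as suggested above, and are purely illustrative.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Tuple

Link = Tuple[float, float, float]  # (delay ms, bandwidth Mbps, loss fraction)


@dataclass(frozen=True)
class PathMetrics:
    delay_ms: float = 0.0                  # accumulated by addition
    bandwidth_mbps: float = float("inf")   # bottleneck: compare and keep the smaller
    delivery: float = 1.0                  # fraction delivered; multiplied per link

    def extend(self, link: Link) -> "PathMetrics":
        delay, bandwidth, loss = link
        return PathMetrics(
            self.delay_ms + delay,
            min(self.bandwidth_mbps, bandwidth),
            self.delivery * (1.0 - loss),  # assumes independent losses per link
        )

    @property
    def loss(self) -> float:
        return 1.0 - self.delivery


def accumulate(links: Iterable[Link]) -> PathMetrics:
    """Apply each hop's update in order, starting from an empty path."""
    return reduce(PathMetrics.extend, links, PathMetrics())


m = accumulate([(10.0, 155.0, 0.01), (35.0, 45.0, 0.0)])
print(m.delay_ms, m.bandwidth_mbps, round(m.loss, 4))  # 45.0 45.0 0.01
```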

Iljitsch

* iljitsch@muada.com (Iljitsch van Beijnum) [Tue 16 Oct 2001, 12:11 CEST]:

I've been thinking about other information that could be conveyed in
communities. For instance, bandwidth, delay and packet loss.

And generate a route flap every time a link gets used more or less?
That would be suboptimal to say the least (the word `countereffective'
seems more applicable to me).

Using dynamic data for this is not going to work in BGP, so this would
have to be static information (hm, packet loss is not too static,
hopefully).

Indeed.

Static system-derived or configured information would already help a lot.
You can then easily select the route with the highest potential bandwidth
or the lowest speed-of-light delay, without the need to know a lot about
the internals of a transit network.

Introducing "metrics" like this is not contrary to BGP design
philosophy: the way in which an AS selects the best route is not defined
in the RFC and the length of the AS path is certainly not the best
possible criterion.

Setting communities based on a prefix's entry point into an ASN is
doable with today's technologies (slight understatement). What's needed,
besides a standard numbering scheme for those communities, is a way for
all routers to route packets based not merely on the destination but
also on a community set by the customer advertising the prefix to its
upstream provider.

As already noted, currently communities are mostly used to control
advertisements of one's announcements by upstream providers, and not
for outbound routing, which

Example:

Customer A has a connection to upstream B and speaks BGP with B. B has
two different paths to C: one cheap and slow, one fast and expensive.
(This seems to be a business opportunity - devise lines that are both
cheap and fast.)

Now B can set communities on routes received from C based on where a
certain prefix was received. If they overlap, however, only the best
route out of the two will be passed on to customer A. If this obstacle
is overcome, A still faces the problem of getting B to discern between
packets meant for either exit point to C. B could reengineer its
network to basically consist of two separate entities (a cheap one and an
expensive one) and let customers like A connect to both, or extend
all its routers to have a per-prefix source+destination routing table
entry to decide where to send packets.

This seems to need quite some engineering work. :-)
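What a source+destination table entry might look like, as a small Python sketch. Everything here is hypothetical: the documentation prefixes stand in for customer A and network C, and "cheap-slow"/"fast-expensive" name B's two exits. The lookup order (longest destination match first, then longest source match) is an assumed design choice.

```python
import ipaddress
from typing import List, Optional, Tuple

Net = ipaddress.IPv4Network

# Hypothetical entries in B's table: (source prefix, destination prefix, exit).
# Documentation prefixes stand in for customer A and network C.
TABLE: List[Tuple[Net, Net, str]] = [
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_network("203.0.113.0/24"), "cheap-slow"),
    (ipaddress.ip_network("192.0.2.0/24"), ipaddress.ip_network("203.0.113.0/24"), "fast-expensive"),
]


def lookup(src: str, dst: str) -> Optional[str]:
    """Longest match on destination first, then on source (an assumed ordering)."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    hits = [
        (dst_net.prefixlen, src_net.prefixlen, exit_name)
        for src_net, dst_net, exit_name in TABLE
        if s in src_net and d in dst_net
    ]
    return max(hits)[2] if hits else None


print(lookup("192.0.2.10", "203.0.113.5"))    # fast-expensive: customer A's traffic
print(lookup("198.51.100.7", "203.0.113.5"))  # cheap-slow: everyone else
```

Every customer with its own preference adds entries, which is the scalability cost raised later in the thread.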

Or A could buy B and do it themselves.

On a side note, A's possibilities of influencing inbound routing
decisions - given that B acts on communities set by A, like `Prepend own
ASN a few times before sending over just this link' or `Don't announce
to D at all' - are already technically possible. Frankly, if I were B
I'm not sure I'd be all that happy with customers influencing my routing
decision process. They hand me their packets (or not); that should be it.

Regards,

  -- Niels.

As already noted, currently communities are mostly used to control
advertisements of one's announcements by upstream providers, and not
for outbound routing,

I'm sure it's used more for the former than the latter, however, there are
networks that look at communities for outbound routing. A little more than
I expected, even. This seems to happen mostly at multihomed networks. For
instance we (AS12854) set a lower metric for routes that come in over a
certain exchange point and a lower local preference for routes learned
somewhere across the Atlantic.

Customer A has a connection to upstream B and speaks BGP with B. B has
two different paths to C: one cheap and slow, one fast and expensive.
(This seems to be a business opportunity - devise lines that are both
cheap and fast.)

Well, lines used to be both expensive and slow, so at least there is
progress...

Now B can set communities on routes received from C based on where a
certain prefix was received. If they overlap, however, only the best
route out of the two will be passed on to customer A.

Yes, this is always the problem with BGP. If I like low delay, but my
upstream prefers a high bandwidth route that is also available for that
destination, I don't get to see that nice low delay route I would have
liked to use.

If this obstacle
is overcome, A still faces the problem of getting B to discern between
packets meant for either exit point to C. B could reengineer its
network to basically consist of two separate entities (a cheap one and an
expensive one) and let customers like A connect to both, or extend
all its routers to have a per-prefix source+destination routing table
entry to decide where to send packets.

This seems to need quite some engineering work. :-)

B could also do away with layer 3 and sell layer 2 (or layer 1)
connectivity to C, where each customer can select the appropriate quality
levels. Other options are for B to focus on one selling point and try to
optimize the network for that selling point, or use their expertise to
find the perfect middle ground, or run several parallel networks.

Date: Tue, 16 Oct 2001 13:28:41 +0200
From: Niels Bakker <niels=nanog@bakker.net>

(Too lazy^H^H^H^Hrushed to rewrap quoted lines <= 72 char)

[ snip ]

Customer A has a connection to upstream B and speaks BGP with B. B has
two different paths to C: one cheap and slow, one fast and expensive.
(This seems to be a business opportunity - devise lines that are both
cheap and fast.)

Now B can set communities on routes received from C based on where a
certain prefix was received. If they overlap, however, only the best
route out of the two will be passed on to customer A. If this obstacle
is overcome, A still faces the problem of getting B to discern between
packets meant for either exit point to C. B could reengineer its
network to basically consist of two separate entities (a cheap one and an
expensive one) and let customers like A connect to both, or extend
all its routers to have a per-prefix source+destination routing table
entry to decide where to send packets.

This seems to need quite some engineering work. :-)

I've thought about this before.

Let's say that I have a DS3 to 701 and a 4xDS1 to 7018. I might
sell a DS1 to NetX, and advert their routes[1] to both upstreams.
If I sell a 15Mbps frac-DS3 to NetZ, I'd better use my 7018
connection for backup only on their routes.
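The arithmetic behind that example, as a short Python check. It uses nominal line rates (DS1 1.544 Mbps, DS3 44.736 Mbps) and ignores framing overhead. The rule that an upstream may serve as primary for a customer only if that link alone can carry the customer's full rate is an assumption for illustration.

```python
# Nominal line rates; framing overhead is ignored for this rough check.
DS1_MBPS = 1.544
DS3_MBPS = 44.736

UPSTREAMS = {701: DS3_MBPS, 7018: 4 * DS1_MBPS}   # 4xDS1 is about 6.2 Mbps
CUSTOMERS = {"NetX": DS1_MBPS, "NetZ": 15.0}       # 15 Mbps fractional DS3


def primary_candidates(rate_mbps: float) -> list:
    """Assumed rule: an upstream may carry a customer as primary only if
    that link alone could carry the customer's full rate."""
    return sorted(asn for asn, cap in UPSTREAMS.items() if cap >= rate_mbps)


for name, rate in CUSTOMERS.items():
    print(name, primary_candidates(rate))
# NetX [701, 7018]
# NetZ [701]
```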

In short, we start looking at multiple FIBs. It's not really
that much more difficult; it's more of a scalability issue. I
know that Zebra can run multiple router processes, but I've not
played with this feature... perhaps that's a start.

Or, if you want to get ugly, you could have your upstreams speak
multihop EBGP selectively with your downstreams. *ducking and
running*

[1] Ignore issue of table fragmentation for now. That's another
thread...

Or A could buy B and do it themselves.

On a side note, A's possibilities of influencing inbound routing
decisions - given that B acts on communities set by A, like `Prepend own
ASN a few times before sending over just this link' or `Don't announce
to D at all' - are already technically possible. Frankly, if I were B

Correct. And a few upstreams allow this.

I'm not sure I'd be all that happy with customers influencing my routing
decision process. They hand me their packets (or not); that should be it.

I disagree. Let's say that you sell me transit, and purchase
yours from 701 and 1239. Would you complain if I fill my pipe to
you with traffic to/from 701? No. If I fill it with traffic
to/from 1239? No.

Why, then, would you complain if I set a community to _prefer_
701 over 1239 or vice-versa? By giving your downstreams fine-
grained tuning, you allow them to tinker for a system that they
like... and you don't reach the extreme cases that are possible
even without fine-grained tuning.

Eddy

* eddy+public+spam@noc.everquick.net (E.B. Dreger) [Tue 16 Oct 2001, 18:09 CEST]:

Let's say that I have a DS3 to 701 and a 4xDS1 to 7018. I might
sell a DS1 to NetX, and advert their routes[1] to both upstreams.
If I sell a 15Mbps frac-DS3 to NetZ, I'd better use my 7018
connection for backup only on their routes.

In an ideal world. Not sure how many networks engineer their external
connections so that each one equals the maximum amount of data sent out
on all of them, in case all except one fail...
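The difference between the standard described here and a weaker single-failure target, as a Python sketch. The three-DS3 setup and the 60 Mbps of egress traffic are hypothetical numbers chosen to show where the two tests disagree.

```python
from typing import List


def survives_all_but_one(capacities: List[float], traffic: float) -> bool:
    """Any single remaining link can carry all the traffic on its own."""
    return all(cap >= traffic for cap in capacities)


def survives_any_single_failure(capacities: List[float], traffic: float) -> bool:
    """A weaker target: losing any one link still leaves enough capacity."""
    total = sum(capacities)
    return all(total - cap >= traffic for cap in capacities)


three_ds3 = [44.736] * 3   # hypothetical: three DS3 upstream links
print(survives_all_but_one(three_ds3, 60.0))         # False
print(survives_any_single_failure(three_ds3, 60.0))  # True
```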

In short, we start looking at multiple FIBs. It's not really
that much more difficult; it's more of a scalability issue. I
know that Zebra can run multiple router processes, but I've not
played with this feature... perhaps that's a start.

Zebra doesn't actually forward packets. Ciscos with newer IOS can do
this (12.0T and onwards) with different VRFs. I've seen companies who
have something like that in production; packets hit the same router a
few times in a row in a traceroute.

Or, if you want to get ugly, you could have your upstreams speak
multihop EBGP selectively with your downstreams. *ducking and
running*

The "less hassle" part of having a limited amount of upstream providers
to deal with certainly diminishes in this particular scenario, yes.

Frankly, if I were B I'm not sure I'd be all that happy with customers
influencing my routing decision process. They hand me their packets
(or not); that should be it.

I disagree. Let's say that you sell me transit, and purchase
yours from 701 and 1239. Would you complain if I fill my pipe to
you with traffic to/from 701? No. If I fill it with traffic
to/from 1239? No.

Yes, I would complain if you sent me packets with source addresses you
shouldn't be sourcing (i.e., not your own). Traffic from 701 or 1239
should not pass you to reach me (if I were B and you were customer A).

Why, then, would you complain if I set a community to _prefer_
701 over 1239 or vice-versa? By giving your downstreams fine-
grained tuning, you allow them to tinker for a system that they
like... and you don't reach the extreme cases that are possible
even without fine-grained tuning.

This is about packets from the world via me to you, not from you to the
outside world. The case you just described already exists; I wrote so
before (albeit in a bit broken English).

The only routing decision customer A can force upon B is "Send packets
destined for these netblocks <here's a BGP announcement> to me," which
is enforced via a contract that both parties enter into and that A
(presumably) pays B for.

Regards,

  -- Niels.

> This seems to need quite some engineering work. :-)

I've thought about this before.

Let's say that I have a DS3 to 701 and a 4xDS1 to 7018. I might
sell a DS1 to NetX, and advert their routes[1] to both upstreams.
If I sell a 15Mbps frac-DS3 to NetZ, I'd better use my 7018
connection for backup only on their routes.

Router (A) policy 701
Router (B) policy 7018
Router (C) your policy ( 701 && 7018 && whatever else you have)

Where is the problem again? We, Netaxs, (AS4969) do this in multiple
locations with multiple OC-12s to different transit providers and our own
network where some customers want to always use a specific path or not use
any path at all, while the others do not want to be bothered about which
path can be used. That is all done today with Cisco and Juniper gear and
confederations, without the need for any random changes to the BGP protocol.

> On a side note, A's possibilities of influencing inbound routing
> decisions - given that B acts on communities set by A, like `Prepend own
> ASN a few times before sending over just this link' or `Don't announce
> to D at all' - are already technically possible. Frankly, if I were B

Correct. And a few upstreams allow this.

It is very simple to do. Create a set of 'advertise-me' communities and
'pad-me' communities.
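A minimal Python sketch of how B's export policy might act on such tags. The community values are entirely hypothetical: B is given the documentation ASN 64500, (64500, peer) is an "advertise-me" tag (announce only to the listed peers), and (65000 + n, peer) for n = 1..3 is a "pad-me" tag (prepend B's ASN n extra times toward that peer). Real providers publish their own schemes; this only illustrates the mechanism.

```python
from typing import Optional

MY_ASN = 64500    # hypothetical ASN for provider B (RFC 5398 documentation range)
PAD_BASE = 65000  # hypothetical: (65000 + n, peer) = "prepend n extra times to peer"
MAX_PAD = 3

Community = tuple[int, int]  # (high 16 bits, low 16 bits), e.g. 64500:701

def export_to_peer(as_path: list[int], communities: set[Community],
                   peer: int) -> Optional[list[int]]:
    """Return the AS path B would announce to `peer`, or None to suppress it."""
    # "advertise-me": if any such tag is present, announce only to those peers.
    advertise = {low for high, low in communities if high == MY_ASN}
    if advertise and peer not in advertise:
        return None
    # "pad-me": use the largest prepend count requested for this peer.
    pad = 0
    for high, low in communities:
        if low == peer and PAD_BASE < high <= PAD_BASE + MAX_PAD:
            pad = max(pad, high - PAD_BASE)
    # B prepends its own ASN once on any EBGP export; pad-me adds more copies.
    return [MY_ASN] * (1 + pad) + as_path

tags = {(64500, 701), (65002, 701)}         # announce only to 701, padded twice
print(export_to_peer([64511], tags, 701))   # [64500, 64500, 64500, 64511]
print(export_to_peer([64511], tags, 1239))  # None
```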

Alex

Date: Tue, 16 Oct 2001 18:30:05 +0200
From: Niels Bakker <niels=nanog@bakker.net>

> In short, we start looking at multiple FIBs. It's not really
> that much more difficult; it's more of a scalability issue. I
> know that Zebra can run multiple router processes, but I've not
> played with this feature... perhaps that's a start.

Zebra doesn't actually forward packets. Ciscos with newer IOS can do

Correct. It edits the *ix kernel's FIB, adding and deleting
routes. However, Zebra running on a single machine can have
multiple BGP processes running... which is along the same lines.

this (12.0T and onwards) with different VRFs. I've seen companies who
have something like that in production; packets hit the same router a
few times in a row in a traceroute.

Interesting. I was unaware of this.

>> Frankly, if I were B I'm not sure I'd be all that happy with customers
>> influencing my routing decision process. They hand me their packets
>> (or not); that should be it.
> I disagree. Let's say that you sell me transit, and purchase
> yours from 701 and 1239. Would you complain if I fill my pipe to
> you with traffic to/from 701? No. If I fill it with traffic
> to/from 1239? No.

Yes, I would complain if you sent me packets with source addresses you
shouldn't be sourcing (i.e., not your own). Traffic from 701 or 1239
should not pass you to reach me (if I were B and you customer A).

Whoa! Where did I say spoofed packets? If 701 is one of your
upstreams or peers, then I can exchange traffic with 701 all day
long. I never indicated using improper source addresses. Please
reread my post.

  me <--> you <--> 701
  me <--> you <--> 1239

Both are valid.

> Why, then, would you complain if I set a community to _prefer_
> 701 over 1239 or vice-versa? By giving your downstreams fine-
> grained tuning, you allow them to tinker for a system that they
> like... and you don't reach the extreme cases that are possible
> even without fine-grained tuning.

This is about packets from the world via me to you, not from you to the
outside world. The case you just described already exists; I wrote so
before (albeit in somewhat broken English).

The only routing decision customer A can force upon B is "Send packets
destined for these netblocks <here's a BGP announcement> to me," and

In your scenario. But this is arbitrary; it is not borne of
necessity due to the technology.

enforces this via a contract both parties enter into and A (presumably)
pays B for.

Let's say that I'm strictly a Web host. Inbound traffic is
negligible. I send any and all 701-bound traffic via you; any
and all other traffic goes through <some other upstreams>. No
complaint there -- and I can do this in your aforementioned
scheme.

Why do you balk at a community that says "I dislike 1239"[1],
thus _preferring_ 701, when I could simply route _all_ non-701
traffic over another one of my upstreams? IMHO, your dislike
of tuning is illogical... I can sway the balance _far_ more
with coarse-grained routing when you don't provide fine-grained
controls.
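To see what such a community would ask of B, here is a hedged Python sketch of a per-customer best-path choice. It models only two steps of the BGP decision process (highest local-pref, then shortest AS path) and assumes a hypothetical "avoid AS" community that lowers local-pref on any path through the named AS, much like penalising an AS by matching it in the AS path. The penalty value and the origin ASNs 65123 and 65432 are placeholders.

```python
from dataclasses import dataclass

AVOID_PENALTY = 20  # hypothetical local-pref penalty for a disliked AS

@dataclass(frozen=True)
class Path:
    as_path: tuple[int, ...]  # as learned by B; first hop is B's upstream
    local_pref: int = 100     # B's default local-pref (assumed)

def best_path(paths: list[Path], avoid: frozenset[int] = frozenset()) -> Path:
    """Simplified BGP choice: highest local-pref, then shortest AS path.

    Paths that traverse an AS in `avoid` (e.g. signalled by a customer's
    hypothetical "I dislike 1239" community) lose AVOID_PENALTY of local-pref.
    """
    def rank(p: Path) -> tuple[int, int]:
        lp = p.local_pref - (AVOID_PENALTY if avoid.intersection(p.as_path) else 0)
        return (-lp, len(p.as_path))
    return min(paths, key=rank)

via_1239 = Path((1239, 65123))
via_701 = Path((701, 65432, 65123))
print(best_path([via_1239, via_701]).as_path)                     # (1239, 65123)
print(best_path([via_1239, via_701], frozenset({1239})).as_path)  # (701, 65432, 65123)
```

B's default choice and the customer's choice differ for the same prefix, so B would need a separate table for that customer's traffic, which is the multiple-FIB issue raised elsewhere in this thread.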

Not providing fine-grained tuning accomplishes nothing positive,
and can be a negative thing. Offering it provides benefit, and
is not difficult.[2]

[1] Reminder: Hypothetical example. Interpret accordingly. I
used 701 and 1239 in my original example, and don't care to
change the scenario.

[2] Yes, more maintenance with communities. But a few dozen is
all it takes to handle many ASen with a few different lengths...
both the initial effort and upkeep are negligible. Search the
archives for this discussion.
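As a rough check on the "few dozen" figure, this Python sketch enumerates one hypothetical scheme: for each neighbour AS, one "advertise-me" tag (provider:peer) plus "pad-me" tags for one to three prepends ((65000 + n):peer). The provider ASN 64500 and the eight neighbour ASNs are placeholders from the documentation range, not anyone's real policy.

```python
from itertools import product

MY_ASN = 64500                     # hypothetical provider ASN
PAD_BASE = 65000                   # hypothetical: (65000 + n):peer = prepend n times
MAX_PAD = 3                        # "a few different lengths"
PEERS = list(range(64496, 64504))  # 8 placeholder neighbour ASNs

def community_table(peers: list[int], max_pad: int = MAX_PAD) -> dict[str, str]:
    """Map each customer-settable community to the export action it requests."""
    table = {f"{MY_ASN}:{peer}": f"announce to AS{peer}" for peer in peers}
    for peer, n in product(peers, range(1, max_pad + 1)):
        table[f"{PAD_BASE + n}:{peer}"] = f"prepend {n}x toward AS{peer}"
    return table

table = community_table(PEERS)
print(len(table))            # 8 peers x (1 + 3) actions = 32
print(table["65002:64496"])  # prepend 2x toward AS64496
```

Adding a neighbour adds four communities, and adding a prepend length adds one per neighbour, so the table grows linearly and stays small.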

Eddy

* eddy+public+spam@noc.everquick.net (E.B. Dreger) [Tue 16 Oct 2001, 21:09 CEST]:

In short, we start looking at multiple FIBs. It's not really
that much more difficult; it's more of a scalability issue. I
know that Zebra can run multiple router processes, but I've not
played with this feature... perhaps that's a start.

Zebra doesn't actually forward packets. Ciscos with newer IOS can do

Correct. It edits the *ix kernel's FIB, adding and deleting
routes. However, Zebra running on a single machine can have
multiple BGP processes running... which is along the same lines.

Except that Zebra currently has no provision for telling the forwarding
engine it's running on (i.e. any Unix) a rule to the
effect of "If packets originate from this peer [this interface] and are
destined for this prefix, route them over that particular interface
instead of the interface that would've been taken for all packets from
all other prefixes." Which is, in effect, what multiple FIBs mean in
practice.
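The rule described above amounts to selecting a forwarding table by ingress interface and then doing an ordinary longest-prefix match in it. A minimal Python sketch using the standard ipaddress module; the interface names, table names, and prefixes (from the documentation ranges) are hypothetical, and a real router would do this in its forwarding plane (e.g. with VRFs) rather than with a linear scan.

```python
import ipaddress
from typing import Optional

# Hypothetical per-table routes: prefix -> egress interface.
FIBS = {
    "default": {                 # B's own best paths
        "0.0.0.0/0": "to-1239",
        "192.0.2.0/24": "to-701",
    },
    "cust-a": {                  # table honouring customer A's preference
        "0.0.0.0/0": "to-701",
    },
}
INGRESS_TO_FIB = {"from-cust-a": "cust-a"}  # all other interfaces use "default"

def lookup(ingress: str, dst: str) -> Optional[str]:
    """Longest-prefix match in the FIB chosen by the packet's ingress interface."""
    fib = FIBS[INGRESS_TO_FIB.get(ingress, "default")]
    addr = ipaddress.ip_address(dst)
    best_net, best_if = None, None
    for prefix, egress in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_if = net, egress
    return best_if

print(lookup("from-cust-b", "198.51.100.7"))  # to-1239
print(lookup("from-cust-b", "192.0.2.1"))     # to-701
print(lookup("from-cust-a", "198.51.100.7"))  # to-701
```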

Frankly, if I were B I'm not sure I'd be all that happy with customers
influencing my routing decision process. They hand me their packets
(or not); that should be it.

I disagree. Let's say that you sell me transit, and purchase
yours from 701 and 1239. Would you complain if I fill my pipe to
you with traffic to/from 701? No. If I fill it with traffic
to/from 1239? No.

Yes, I would complain if you sent me packets with source addresses you
shouldn't be sourcing (i.e., not your own). Traffic from 701 or 1239
should not pass you to reach me (if I were B and you customer A).

Whoa! Where did I say spoofed packets? If 701 is one of your
upstreams or peers, then I can exchange traffic with 701 all day
long. I never indicated using improper source addresses. Please
reread my post.

Sorry, I misread you. Let me restate the statement I made before that,
then: yes, I would mind customers attempting to choose which exit point
into AS701 their packets would take. This could lead to suboptimal
performance for all of B's customers and a loss of control over the
bills sent to B by its upstream providers. In addition to monitoring
its own network for long-term bottlenecks, B would have to stay
continuously alert for customers clogging one link.

  me <--> you <--> 701
  me <--> you <--> 1239

Both are valid.

Used above by me,
customer A <--> B <--> AS701 (West Buttmunch)
                  <--> AS701 (East Buttmunch)

(the numbers, of course, bear no discernible relationship to reality)

Why, then, would you complain if I set a community to _prefer_
701 over 1239 or vice-versa? By giving your downstreams fine-
grained tuning, you allow them to tinker for a system that they
like... and you don't reach the extreme cases that are possible
even without fine-grained tuning.

This is about packets from the world via me to you, not from you to the
outside world. The case you just described already exists; I wrote so
before (albeit in somewhat broken English).

The only routing decision customer A can force upon B is "Send packets
destined for these netblocks <here's a BGP announcement> to me," and

In your scenario. But this is arbitrary; it is not borne of
necessity due to the technology.

Actually, yes. The technology exists today for customer A to tell B to
announce A's prefixes only to some of B's peers/upstream providers, but
not to make B route all of A's packets via some of B's peers/upstream
providers and not via the others when B would choose those other routes
for its own packets (and has thus installed them into its routers' FIBs).

enforces this via a contract both parties enter into and A (presumably)
pays B for.

Let's say that I'm strictly a Web host. Inbound traffic is
negligible. I send any and all 701-bound traffic via you; any
and all other traffic goes through <some other upstreams>. No
complaint there -- and I can do this in your aforementioned
scheme.

Why do you balk at a community that says "I dislike 1239"[1],
thus _preferring_ 701, when I could simply route _all_ non-701
traffic over another one of my upstreams? IMHO, your dislike
of tuning is illogical... I can sway the balance _far_ more
with coarse-grained routing when you don't provide fine-grained
controls.

Because then you introduce C into the mix, another upstream provider of
A. That's cheating. :-)

I thought the whole discussion was about B having multiple exit points
and A influencing what exit points from B's network A's packets would
take?

Not providing fine-grained tuning accomplishes nothing positive,

Simplicity for its own sake also has value (even aside from benefits
like easier troubleshooting in case of failures, no need to generate
transient outages while fiddling with the tuning knobs, etc.).

Regards,

  -- Niels.