how is cold-potato done?

If I peer with network X in cities A and B, and receive the same route in
both cities with an AS-path of X, how do I know which city to use for an
exit? I can understand how if X uses communities to tag the geographic
origin of the traffic, but I'm not aware of many networks that do
this. Lots of networks claim to use cold-potato routing though, so how do
they do it?

Ralph Doncaster
principal, IStop.com

They use the MED (aka metric) that the other provider sends on the
route to determine which of the exits where the two networks
interconnect is the "shortest".

  This can at times produce undesired results because of
aggregation.

  - jared

MEDs are one way.
External traceroute kung fu feeding a route server is another.

If they are really doing cold-potato routing, they are listening to
the BGP MEDs (metrics) sent by their peer(s) and making the routing
decision based on that. If the MEDs are the same for both routes, the
IGP metric for each BGP next-hop is likely making the decision.

http://www.nanog.org/mtg-9811/ppt/avi/tsld010.htm

Those are the criteria, in order, which BGP uses to make its decision.
I am assuming synchronization, route to next hop, and router-local
decisions (IBGP vs EBGP, weight) are non-issues in this scenario.
Since localpref would be set internally, and AS path is the same (as
I would assume origin code is), that leaves the MED as the first
criterion, followed by shortest next-hop metric (IGP metric, typically).
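
The ordering described above can be sketched in a few lines of Python. This is only a simplified illustration, not any vendor's implementation: the `Route` class, its field names, and the sample values are assumptions made for the example, and steps such as weight, EBGP vs IBGP, and router ID are left out. It also assumes both routes come from the same neighboring AS (X), which is the case in which MEDs are compared by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical, simplified view of one BGP path to the same prefix."""
    exit_city: str    # where we would hand the traffic to the peer
    local_pref: int   # higher wins
    as_path_len: int  # shorter wins
    origin: int       # 0 = IGP, 1 = EGP, 2 = incomplete; lower wins
    med: int          # sent by the peer; lower wins
    igp_metric: int   # our IGP cost to the BGP next hop; lower wins


def preference_key(r: Route):
    # Tuples compare element by element, so a later field only matters
    # when every earlier field is tied.
    return (-r.local_pref, r.as_path_len, r.origin, r.med, r.igp_metric)


def best_route(routes):
    return min(routes, key=preference_key)


# Same localpref, AS path length, and origin code in both cities.
city_a = Route("A", 100, 1, 0, med=50, igp_metric=10)
city_b = Route("B", 100, 1, 0, med=20, igp_metric=40)

print(best_route([city_a, city_b]).exit_city)  # B: lower MED wins
```

If the two MEDs were equal, the same comparison would fall through to `igp_metric` and pick city A.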

-c

> > If I peer with network X in cities A and B, and receive the same route in
> > both cities with an AS-path of X, how do I know which city to use for an
> > exit? I can understand how if X uses communities to tag the geographic
> > origin of the traffic, but I'm not aware of many networks that do
> > this. Lots of networks claim to use cold-potato routing though, so how do
> > they do it?
>
> They use the MED (aka metric) that the other provider sends on the
> route to determine which of the exits where the two networks
> interconnect is the "shortest".
>
> This can at times produce undesired results because of
> aggregation.

Besides aggregation, wouldn't this lead to a lot of ties?
Let's say the cities are LA & Manhattan, and the route from X originates
in Chicago. I would think that it would be a common occurrence for the
route to have the same metric in LA & Manhattan.

-Ralph

> Date: Wed, 26 Jun 2002 13:52:08 -0400 (EDT)
> From: Ralph Doncaster
>
> If I peer with network X in cities A and B, and receive the same route in
> both cities with an AS-path of X, how do I know which city to use for an
> exit? I can understand how if X uses communities to tag the geographic
> origin of the traffic, but I'm not aware of many networks that do
> this. Lots of networks claim to use cold-potato routing though, so how do
> they do it?

MEDs

Eddy

In a message written on Wed, Jun 26, 2002 at 01:52:08PM -0400, Ralph Doncaster wrote:

> If I peer with network X in cities A and B, and receive the same route in
> both cities with an AS-path of X, how do I know which city to use for an
> exit? I can understand how if X uses communities to tag the geographic
> origin of the traffic, but I'm not aware of many networks that do
> this. Lots of networks claim to use cold-potato routing though, so how do
> they do it?

Wow, I'm amazed at the wrong answers here. The vendors even document
this, as do the RFCs; see
http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/bgp.htm

More to your question, cold-potato uses MEDs to determine the best exit.
Generally MEDs do not work for a peer's large aggregates, because the
aggregates are spread out across the network. Clueful peers set the
outgoing MEDs on their aggregates to all the same value.

If the MEDs are set to the same value, clobbered on inbound, or absent,
then the routers inside your network will choose the closest exit
based on your IGP cost. This is "hot potato" routing.
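
As a rough sketch of that difference, the snippet below picks an exit either with the peer's MEDs honored (cold potato) or with them ignored, as if clobbered on inbound (hot potato). The function name, the tuple layout, and every number are made up for illustration; real routers do this inside the BGP decision process, not with a helper like this.

```python
def choose_exit(exits, honor_med):
    """Pick an exit city from (city, med_from_peer, our_igp_cost) tuples.

    honor_med=True  : cold potato, lowest MED first, IGP cost breaks ties.
    honor_med=False : hot potato, MEDs ignored, lowest IGP cost wins.
    """
    if honor_med:
        return min(exits, key=lambda e: (e[1], e[2]))[0]
    return min(exits, key=lambda e: e[2])[0]


# Hypothetical values: we are close to LA, but the peer says the
# destination is closer to its Manhattan interconnect.
exits = [("LA", 300, 5), ("Manhattan", 100, 80)]

print(choose_exit(exits, honor_med=True))   # Manhattan
print(choose_exit(exits, honor_med=False))  # LA

# An aggregate announced with the same MED everywhere degrades to
# closest exit even when MEDs are honored.
aggregate = [("LA", 0, 5), ("Manhattan", 0, 80)]
print(choose_exit(aggregate, honor_med=True))  # LA
```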

If, by strange chance, you have equal IGP costs to two peering points
with equal MEDs, then BGP will choose the one with the lower router ID.

As you can see, there are many other steps to the selection process,
as documented in the link above.

Shortest-exit is the default because of the BGP decision process.
This tends to favor heavy-content providers, because the bulk of
the data travels only a short distance inside the AS sending the
content before it is handed to the AS that delivers it to its eyeballs.

Shortest-exit is caused by IGP metrics (which shouldn't ever be
the same for two paths, unless you actually want that to happen).
IGP metrics are generally set by length of fiber paths or delay
values. Provider backbones set these manually with IS-IS or OSPF
costs.

There are many ways to do best-exit. People are always coming
up with strange ways to do routing (ToS routing, MPLS-TE, DS-TE),
and they can sometimes apply these techniques to best-exit.

For those looking for something simple and standard, the two
ways were made known in the first email: outbound MEDs and
delay-based routing from `traceroute' information. There are
quite a few problems with these as well, documented in
various papers on the matter, e.g.:
http://www.ietf.org/internet-drafts/draft-ietf-idr-route-oscillation-01.txt

For MEDs, Avi spoke to the methods used in the following talks:
http://www.nanog.org/mtg-9901/ppt/bgp102/index.htm
http://www.nanog.org/mtg-9811/ppt/avi/index.htm

One thing Avi mentioned here I never quite understood:
http://www.nanog.org/mtg-9811/ppt/avi/sld031.htm
He says "set MED's in one direction only", but he doesn't say
which direction or why.

As to solving the aggregation problem that makes outbound MEDs
insignificant, there is some work under way to solve it using
communities (NO-PEER, supercommunities, redistribution, cost
communities, link-bw, et al.). Some of it is believed (and
probably rightly so) to be overcomplicated and possibly even
oscillatory, just like the other methods.

I enjoy the simple approach that RFC 3272 takes (surprisingly
simple Inter-Domain traffic engineering coming from the super
complex Intra-Domain TE based on MPLS/etc that the authors
recommend). They have some suggestions on setting local_pref
and inbound MEDs that I found to be very clueful.
http://www.ietf.org/rfc/rfc3272.txt (Section 7.0)

  "Inter-domain TE is inherently more difficult than intra-domain TE
   under the current Internet architecture. The reasons for this are
   both technical and administrative."

So maybe best practice today for doing best-exit is simply having
the technical data (communities, tags, traffic, etc) and talking
directly with the administrators of your peer-AS to find a solution
(or reading their minds without their data, or inferring it, or
guessing).

I guess the final question is -- why is anyone concerned about
best-exit at all? Doesn't shortest-exit still get the traffic
there? I'm willing to bet there are a lot of different answers
to all these questions.

-dre

> I guess the final question is -- why is anyone concerned about
> best-exit at all? Doesn't shortest-exit still get the traffic
> there? I'm willing to bet there are a lot of different answers
> to all these questions.

Some networks will supposedly relax their peering requirements if you do
best-exit. Also, for some networks shortest-exit results in pipes with
large traffic flows in one direction and not the other, so using best-exit
may not require any increase in backbone capacity.

-Ralph

Leo Bicknell <bicknell@ufp.org> [Wed, Jun 26, 2002 at 02:35:55PM -0400]:

> In a message written on Wed, Jun 26, 2002 at 01:52:08PM -0400, Ralph Doncaster wrote:
>
> > If I peer with network X in cities A and B, and receive the same route in
> > both cities with an AS-path of X, how do I know which city to use for an
> > exit? I can understand how if X uses communities to tag the geographic
> > origin of the traffic, but I'm not aware of many networks that do
> > this. Lots of networks claim to use cold-potato routing though, so how do
> > they do it?
>
> Wow, I'm amazed at the wrong answers here. The vendors even document
> this, as do the RFCs; see
> http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/bgp.htm
>
> More to your question, cold-potato uses MEDs to determine the best exit.
> Generally MEDs do not work for a peer's large aggregates, because the
> aggregates are spread out across the network. Clueful peers set the
> outgoing MEDs on their aggregates to all the same value.
>
> If the MEDs are set to the same value, clobbered on inbound, or absent,
> then the routers inside your network will choose the closest exit
> based on your IGP cost. This is "hot potato" routing.
>
> If, by strange chance, you have equal IGP costs to two peering points
> with equal MEDs, then BGP will choose the one with the lower router ID.

<snip>

In the interest of accuracy, it's worth noting that some vendors will
choose the one with the lower router ID, and others will choose
the route that was learned first (at least by default), despite
documentation to the contrary.

mrr

> I guess the final question is -- why is anyone concerned about
> best-exit at all? Doesn't shortest-exit still get the traffic
> there? I'm willing to bet there are a lot of different answers
> to all these questions.
>
> -dre

Hmm, I have this: paths of equal length in terms of geography and hops through
my network, but capacity is an issue. Traffic sometimes becomes imbalanced, and
I'd like to be able to indicate which way I want to receive it. The trouble is
that no one seems to listen to my MEDs!

Steve

Andre,

What Avi meant is that when you use routing policy (like routemaps or the
equivalent) to set additive MEDs between POPs, only do it on egress from all
POPs or ingress to all POPs. Don't do it on routes both ways. Look at slide
35 - it has all the MEDs being added as "from" routemaps, as opposed to both
"from" and "to".

Here is an example:

I have POPs in NYC, Chicago, and Seattle. I have routes in BGP being announced
from NYC, with a MED of +100 being tacked on as they leave the NYC POP. I
then add an additional MED of +200 when they leave the Chicago POP, heading
for Seattle. This is a cost metric, so higher is "worse". If I had routemaps
adding more MED cost upon ingress to the Chicago and Seattle POPs, in
addition to on egress from the NYC and Chicago POPs, I would be adding twice as
much to the metrics. It just doesn't make much sense, and it is twice the
number of values to control when you are adjusting the values.
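
The arithmetic in that example can be worked through with a small Python sketch. The +100 and +200 egress values come from the example above; the ingress values, the function name, and the dictionary layout are assumptions added only to show the doubling effect.

```python
# Egress increments from the example: +100 leaving NYC, +200 leaving Chicago.
EGRESS_COST = {"NYC": 100, "Chicago": 200}
path = ["NYC", "Chicago", "Seattle"]


def med_at_each_pop(path, egress_cost, ingress_cost=None):
    """Accumulate MED as a route travels POP to POP along `path`."""
    ingress_cost = ingress_cost or {}
    med = 0
    seen = {path[0]: med}
    for prev, cur in zip(path, path[1:]):
        med += egress_cost.get(prev, 0)   # added as the route leaves `prev`
        med += ingress_cost.get(cur, 0)   # added as the route enters `cur`
        seen[cur] = med
    return seen


# Egress-only policy, as recommended.
print(med_at_each_pop(path, EGRESS_COST))
# {'NYC': 0, 'Chicago': 100, 'Seattle': 300}

# Hypothetical ingress routemaps that mirror the egress values: every hop
# is counted twice, and there are twice as many knobs to keep in sync.
print(med_at_each_pop(path, EGRESS_COST, {"Chicago": 100, "Seattle": 200}))
# {'NYC': 0, 'Chicago': 200, 'Seattle': 600}
```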

Of course, this is all about generating meaningful MEDs on your own network
for your own purposes, and for those of your customers and peers. It doesn't
really have to do with cold potato routing of others' traffic on your
network (although it does let people cold-potato route your traffic on THEIR
networks.)

Another valid approach for doing this sort of thing is setting your MEDs to
be the same as your IGP metrics to the next hops of the BGP routes - there
are "shortcut" commands for doing this. Of course, your mileage may vary.
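
A minimal sketch of the MED-equals-IGP-metric idea, assuming a made-up table of IGP costs. The city pairs, the costs, and the `meds_for` helper are all hypothetical; on a real router this is done by routing policy, not by code like this.

```python
# Hypothetical IGP costs inside my network, keyed by
# (peering city, city where the prefix lives).
IGP_COST = {
    ("LA", "Chicago"): 40,
    ("Manhattan", "Chicago"): 20,
    ("LA", "Seattle"): 25,
    ("Manhattan", "Seattle"): 60,
}


def meds_for(origin_city, peering_cities):
    """MED to announce at each peering city for a prefix in origin_city."""
    return {city: IGP_COST[(city, origin_city)] for city in peering_cities}


print(meds_for("Chicago", ["LA", "Manhattan"]))
# {'LA': 40, 'Manhattan': 20}
```

A peer that honors these MEDs would hand traffic for the Chicago prefix over in Manhattan, the exit with the lower announced value. Ties only happen where the IGP costs from both peering cities really are equal.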

- Daniel Golding

More detail on how Cisco does this at:
http://www.cisco.com/warp/public/459/25.shtml

specifically, see step 10:

"10. When both paths are external, prefer the path that was received first
(the oldest one). This step minimizes route-flap, since a newer path won't
displace an older one, even if it was the preferred route based on
additional decision criteria, as described in steps 11, 12, and 13.

Skip this step if any of the following is true:

    * The bgp best path compare-routerid command is enabled.

      Note: This command was introduced in Cisco IOS® Software Releases
      12.0.11S, 12.0.11SC, 12.0.11S3, 12.1.3, 12.1.3AA, 12.1.3.T,
      and 12.1.3.E.

    * The router ID is the same for multiple paths, since the routes were
  received from the same router.

    * There is no current best path. An example of losing the current best path
  occurs when the neighbor offering the path goes down."
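
The two behaviors discussed in this thread (oldest path vs lowest router ID) can be contrasted with a short Python sketch. It is a simplification under stated assumptions: all paths are external, already tied on every earlier criterion, have distinct router IDs, and a current best path exists. The `ExternalPath` class, the `compare_routerid` flag name, and the sample values are made up for illustration and are not a vendor API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalPath:
    """Hypothetical record for one EBGP-learned path."""
    neighbor: str
    router_id: str      # dotted quad, e.g. "10.0.0.2"
    received_at: float  # arrival time in seconds; smaller means older


def tie_break(paths, compare_routerid=False):
    if compare_routerid:
        # Compare router IDs numerically, not as strings.
        return min(paths, key=lambda p: ipaddress.IPv4Address(p.router_id))
    # Default in the quoted step: keep the oldest path to limit route flap.
    return min(paths, key=lambda p: p.received_at)


a = ExternalPath("city A", "10.0.0.9", received_at=100.0)
b = ExternalPath("city B", "10.0.0.2", received_at=250.0)

print(tie_break([a, b]).neighbor)                         # city A
print(tie_break([a, b], compare_routerid=True).neighbor)  # city B
```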

-Nick