Outbound Route Optimization

Hello,

I am trying to determine for myself the relevance of intelligent routing devices like Sockeye, RouteScience, etc. I am not trying to determine who does it better, but rather whether the concept of optimizing routes addresses a significant problem in terms of improved traffic performance (not in cost savings across disparate transit pipes).

I am interested in hearing other views (both for and against) on these devices in the context of optimizing latency for a small multi-homed ISP. I want to make sure I understand their context correctly and have not missed any important points of view.

My questions are these:

"Is sub-optimal routing caused by BGP so pervasive it needs to be addressed?"

"Are these devices able to effectively address the need?"

Thanks,

Jim

BGP makes no decisions based on "quality" of a route. If you are using anything that's dependent on low latency/packet loss/jitter (e.g., VoIP, games, ssh for someone who gets annoyed by >20ms of latency, etc.), there's lots of room for improvement, especially when you are buying from "bargain" transits.

Everyone I know who's used a device like Sockeye, Route Science, etc, falls into one of two categories.
1) For reasons unrelated to them owning said device, I consider them to be generally lacking clue.
2) They hated it.

I've never used one myself, but based on testimonials like that, I'd tend to say that they generally don't work too well.

If you hire a consultant who knows what they're doing, it should be pretty simple to set up a meaningful routing policy which does this for you.

Just my $0.02

--Phil Rosenthal
ISPrime, Inc.

> My questions are these:
>
> "Is sub-optimal routing caused by BGP so pervasive it needs to be
> addressed?"

that depends on your isp, and whether their routing policies (openness
or closedness of peering, shortest vs. longest exit, respect for MEDs)
are a good match for their technology/tools, skills/experience, and
resources/headroom.

"Are these devices able to effectively address the need?"

some of the devices i've seen will address some of the weaknesses in
some of the isp's i've seen. however, and more to what i think is the
point here, none of the devices i've seen will make an isp better since
(a) tools alone can't help, and (b) this isn't the tool that's missing.

and now for the question you didn't ask... "why not?"

controlling which paths you install based on any kind of observational or
predictive metrics is theoretically only going to be as good as those
metrics, which is usually not very good. but there's another limit, which
is bgp path symmetry. most tcp implementations are still stone-aged
compared to what the ietf recommends in terms of congestion avoidance and
output timing, and are therefore pretty dependent on overall isochrony and
on symmetric congestion/latency. let's say that you had ideal metrics for
deciding which path to install -- your overall performance would then be
limited by what other people chose to install as their path toward you.
(experience says they're not going to trust your MEDs even if they're close
enough to hear them.)
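To make that limit concrete, here is a minimal sketch (an illustration, not any vendor's actual algorithm) of the outbound half of these boxes: per destination prefix, pick the transit with the best measured metric. The Probe shape, the transit names, and the scoring weights are all invented for the example. Notice that the return path the remote network chooses toward you appears nowhere in the computation, which is the symmetry problem described above.

```python
# Minimal illustration of measurement-driven outbound egress selection.
# All names and weights here are hypothetical, not any vendor's design.

from dataclasses import dataclass

@dataclass
class Probe:
    rtt_ms: float      # observed round-trip time via this egress
    loss_pct: float    # observed packet loss via this egress

def score(p: Probe) -> float:
    # Lower is better. The weighting is arbitrary, which is part of the
    # problem: the decision is only as good as this one number.
    return p.rtt_ms + 10.0 * p.loss_pct

def best_egress(measurements: dict) -> str:
    # Choose the transit with the best score for one destination prefix.
    return min(measurements, key=lambda t: score(measurements[t]))

# Probes toward one prefix via three hypothetical transits:
probes = {
    "transit-A": Probe(rtt_ms=42.0, loss_pct=0.0),
    "transit-B": Probe(rtt_ms=35.0, loss_pct=1.5),
    "transit-C": Probe(rtt_ms=55.0, loss_pct=0.0),
}
print(best_egress(probes))  # transit-A: B's lower RTT loses to its loss
```

And even the rtt_ms you measured already includes whatever return path the far side chose, which you cannot change from here.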

> My questions are these:
>
> "Is sub-optimal routing caused by BGP so pervasive it needs to be
> addressed?"

> that depends on your isp, and whether their routing policies (openness
> or closedness of peering, shortest vs. longest exit, respect for MEDs)
> are a good match for their technology/tools, skills/experience, and
> resources/headroom.

In practice, all of the above just turn out to be marketing sauce
or in some cases, outright lies.

There is no substitute for dollar spend (opex and capex) to make
a network perform. There is no magic sauce, there is no silver
bullet. You have adequate resources, you will have adequate
performance.

> metrics, which is usually not very good. but there's another limit, which
> is bgp path symmetry. most tcp implementations are still stone-aged

AKA, optimizing for outbound does nothing to optimize
your inbound.

> (experience says they're not going to trust your MEDs even if they're close
> enough to hear them.)

Most people don't trust MEDs for a reason, Paul, and it is not because
they want to mess with your customers.

/vijay

Sugar pills effectively address the needs of a great many ailments when
given to people who believe that they will work. And if the end result is
an addressed need, who are we to say that it wasn't worth paying for? :-)

That sounds like a yes answer.
That being said, the Sugar Mountain RouteMaster5000 is probably the best
unit out there. It has lots of blinking lights, and sets the "low latency"
bits on most types of IP traffic that need high prioritization over
regular internet traffic. It can speed up your network traffic by up to
1000%, but the results may vary depending on your packet mix, and in that
case, it doesn't change your traffic patterns at all.

I don't know if they're doing the same thing in Cali or not (they probably
are, since all the radio stations are owned by the same 2 companies), but
here in NoVA land there is currently a massive radio ad campaign for a
Rocky Mountain Radar radar-jamming product called the Phazer, which claims
to jam police radar (legally, because it doesn't actually put out any RF,
it is entirely passive) "or they'll pay for your ticket". When faced with
such a deal, I have heard many people say "how could they possibly back up
that kind of promise if it didn't work? they would be losing money
left and right paying for people's tickets". Then I recall the quote from
the inventor: "I could ship an empty black box with a weight in the bottom
and only get 22-24 percent back."

If ever there was a market for voodoo science products, it was IP transit.
It is a big-bucks industry, the consumer can almost never see what is
really going on behind the scenes with their providers (thanks to loads of
NDAs), for the most part they have no real idea what they're doing (but
they like to think they do), and they have been misled into thinking that
all IP transit is the same -- simply a commodity. A fool and his money,
and all that...

Oh well, at least web hosting is still worse (ever notice that EVERY
hoster has an OC192 backbone, even the ones with 2 machines and a 10Mbps
hub?). :P

> I don't know if they're doing the same thing in Cali or not (they probably
> are, since all the radio stations are owned by the same 2 companies),

Yeah, NPR and CBS, both monopolistic empires with the same viewpoint :-)

> but here in NoVA land there is currently a massive radio ad campaign for
> a Rocky Mountain Radar radar-jamming product called the Phazer, which
> claims to jam police radar (legally, because it doesn't actually put out
> any RF, it is entirely passive) "or they'll pay for your ticket".

I've heard of this, and I believe I know some people who've invested. They
wanted to diversify from the IP transit biz, and go into 100% pure sales
and marketing. I believe the quote was "man, it's hard to sell hosting off
of an OC768 with my 2600-powered network, I'm going to greener pastures"

> Then I recall the quote from the inventor, "I could ship an empty black
> box with a weight in the bottom and only get 22-24 percent back."

I've heard that before, but it sounds so much better in Russian.

> Oh well, at least web hosting is still worse (ever notice that EVERY
> hoster has an OC192 backbone, even the ones with 2 machines and a 10Mbps
> hub?). :P

Yeah, the market for OC192 -> 10Mbps Ethernet is really booming!
Those puppies fly off the shelves like fleeing rats.

> Hello,
>
> I am trying to determine for myself the relevance of intelligent
> routing devices like Sockeye, RouteScience, etc. I am not trying to
> determine who does it better, but rather whether the concept of
> optimizing routes addresses a significant problem in terms of improved
> traffic performance (not in cost savings across disparate transit pipes).

An alternative to using such devices would be to tune the BGP
configuration of the routers. See below for a description of an
algorithm that makes it possible to determine the optimal configuration
of BGP routers for some traffic objectives:

[UBQ03] S. Uhlig, O. Bonaventure, and B. Quoitin. Interdomain traffic
engineering with minimal BGP configurations. In 18th International
Teletraffic Congress (ITC), September 2003.
http://totem.info.ucl.ac.be/publications.html

This work is being pursued in the framework of a government-funded
three-year research project in which we are developing an open-source
traffic engineering toolbox. The objective of this toolbox is to
provide a set of tools that can be used by ISPs and enterprise networks
to optimize their intradomain and interdomain traffic flow. The first
release of the toolbox is planned for November 2004. To help us fit this
toolbox to the needs of ISP and enterprise networks, we would appreciate
it if you could fill out our survey at:
http://totem.info.ucl.ac.be/te_form.html

Best regards,

Olivier Bonaventure

> My questions are these:
>
> "Is sub-optimal routing caused by BGP so pervasive it needs to be
> addressed?"
>
> that depends on your isp, and whether their routing policies (openness
> or closedness of peering, shortest vs. longest exit, respect for MEDs)
> are a good match for their technology/tools, skills/experience, and
> resources/headroom.
>
> In practice, all of the above just turn out to be marketing sauce
> or in some cases, outright lies.
>
> There is no substitute for dollar spend (opex and capex) to make
> a network perform. There is no magic sauce, there is no silver
> bullet. You have adequate resources, you will have adequate
> performance.

I dunno if the last sentence is a typo or not, but it is definitely incorrect in at least some cases. Having "adequate resources" in no way guarantees "adequate performance". (Unless you define "resources" to include the political clout to override business decisions which help the bottom line but hurt performance - e.g. not peering with a network because they are too small.)

OTOH, having inadequate resources does give you a near perfect chance of having inadequate performance.

> (experience says they're not going to trust your MEDs even if they're close
> enough to hear them.)
>
> Most people don't trust MEDs for a reason, Paul, and it is not because
> they want to mess with your customers.

There are a variety of reasons for not listening to MEDs, including political reasons which may not be in the best interest of performance, or even may be detrimental to performance.

I've found most people willing to put in the time & effort to give you MEDs will give reasonably good MEDs. It also seems the height of hubris to assume you know what is happening inside someone else's network better than the people who run that network. At least IMHO....

In any case, no matter how many resources or black boxes you have, you cannot guarantee good performance on the 'Net. Too many people involved over which you have no control. Even if you had control, BGP is not the right tool to exert such control in all cases.

Even more reason for people to buy the Sugar Mountain RouteMaster5000.
No matter how good the claims are, you still end up with humans in the mix
dictating "policy" of some sort over packets.

I have been on a personal crusade for the last 8 months to address this
very issue!

We identified the exact same issues and questions as we grew from a
single backbone to 7 backbones, each of various sizes ranging from gig
connections to DS3s. In total I have almost 3 Gbps of available
capacity, but two small DS3 links make routing decisions very
interesting :-)

It was becoming a nightmare for my engineers to manage the BGP for all
of these backbones in a way that dealt with both the business case
and the performance case. In the end, it was becoming a customer
service problem when we had spikes that saturated some of our smaller
links and left our larger links untouched. BGP simply did not care about
my capacity issues.

In our specific setting, we are an ISP that buys all of our connectivity,
and we have spent a tremendous amount of time searching for total
connectivity as opposed to total capacity. While most of our bandwidth
costs the same per megabit, our commit levels with our different carriers
are different and require constant vigilance to maintain the levels we
need without overloading any particular link. We have no private peering
at all.

After some very unfortunate dealings with a bandwidth provider in the "performance
based routing" business, I decided to do it on my own.

It's important to note that in my world, my mandate was simple - get the
best possible performance out of our network that you can possibly get.
Worry about cost after performance. We house some large VoIP, gaming and
e-commerce farms and cost was the lowest concern on our plate - keeping
the customers happy was the primary concern.

I started out by going from 2 backbones to buying backbone bandwidth
from a total of 7 carriers, spreading those out among Cisco 7507s and
Juniper M20s and basically relying on BGP and my engineering staff to
monitor and manage those resources.

In the end I discovered that it was a huge job to keep all of those
balls in the air while not upsetting some of our larger customers.

I spent months researching and talking to friends that drive some of the
largest networks in the world. In the end, it was very clear to see that
BGP was not up to the task of dealing with my network requirements. Best
path simply did not equate to best performance and BGP had no
provisions for determining saturation on my links.

My engineers and I spent months talking to vendor after vendor about
their products, doing research and trying to find the closest thing to a
'silver bullet' that we could find.

An engineer friend of mine at Google turned me onto RouteScience and we
put them into the mix of vendors we were testing. Our needs were simple
- 100% performance-based routing until we came within 15% of max
utilization on any given backbone, then next best performance path. In
my world, cost-based routing was the last thing we needed to deal with.
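As a sketch of that policy (my reading of the stated rule, not RouteScience's actual logic; the link names and thresholds are placeholders): rank candidate egresses by measured performance, then take the best one that still leaves the required headroom.

```python
# Hypothetical sketch of "performance first, until a link nears capacity":
# walk candidates from best to worst measured latency and take the first
# one with acceptable utilization. Names and numbers are illustrative only.

def pick_egress(candidates, max_util=0.85):
    """candidates: list of (name, latency_ms, utilization) tuples.
    max_util=0.85 models 'within 15% of max utilization'."""
    ranked = sorted(candidates, key=lambda c: c[1])  # best latency first
    for name, latency, util in ranked:
        if util < max_util:
            return name
    # Every link is hot; fall back to the best-performing one anyway.
    return ranked[0][0]

links = [
    ("gig-1", 22.0, 0.90),   # fastest, but inside the 15% headroom zone
    ("gig-2", 25.0, 0.40),
    ("ds3-1", 31.0, 0.20),
]
print(pick_egress(links))  # gig-2: next-best path once gig-1 is too full
```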

We enlisted the help of several of our larger data center customers in a
kind of blind trial of the various manufacturers, and also utilized
Keynote locations around the world for testing. After four months of
testing and evaluation, we chose the RouteScience box.

In my mind, the question about utilizing route optimization boxes is moot.
Until we build into BGP (or some other method) the ability to sense
latency and capacity issues, optimize bandwidth allocation based on our
preferences, and maintain service level agreements by keeping our
traffic heading down the best performance path automatically, we have to
employ and dedicate an increasing number of engineers to these tasks.
Route Optimization equipment plays a critical part in keeping my
customers happy and myself and my other expensive engineers focused on
other tasks more closely related to the bottom line.

No smoke, no mirrors, no BS - these are real-world numbers from our
network. For me the proof was in the performance. After four months of
baseline reporting, we were seeing an average performance increase
(measured as a decrease in latency) of 40 to 50% between the routes my
PathControl box is selecting and standard BGP routes. My backbones
include carriers such as Level3, UUNet, Qwest, XO, Verio - decent
backbones with major connectivity.

In reality, I learned that BGP is simply not up to the task of handling
anything beyond its limited scope - best-path routing. In today's world,
we need to look beyond best path as it simply has nothing to do with
best performance, at least not in 40 to 50% of my traffic routing
decisions. You can do that with bodies (if you're a purist) or you can
utilize route optimization equipment. In either case, you have to do it.

I think for the time being, route optimization equipment, and the
companies that utilize them, will have an edge over those doing things
the manual way. Regardless of which box I could have chosen, the end
result is that my backbone engineers and I have far more time on our
hands for other tasks and my customers are much happier than they
were before.

BGP is relatively good at determining the best path when you are a major
carrier with connectivity to "everyone" (i.e. when traffic flows
"naturally"), in many locations, and you engineer your network so that you
have sufficient capacity to support the traffic flows.

However, BGP is relatively BAD at determining the best path when you are
the customer of many carriers, some of whom have serious problems on their
network that they spend a lot of time and effort trying to hide from you,
and when you have a diverse assortment of link speeds. In this setup,
traffic does not flow "naturally".

I often find myself spending a fair amount of time talking people down
from trying to make their network "better" by buying transit from every
carrier they can get their hands on. A single flapping session on a single
transit can get you dampened for quite a while, making you only as strong
as your weakest link. Also, the convergence becomes painfully slow, not to
mention flaptacular, as best paths are computed, announced, re-computed,
re-announced, re-re-computed, etc (and if you don't believe me watch
Internap converge some time). Plus if you are an inbound heavy network,
the localpref increase via certain paths (everyone localprefs their own
customers above routes they hear from peers/transits) will cause a skew in
traffic that prepending may have little to no influence over.

Bottom line, BGP is most useful when you select paths as naturally as
possible, with as few transits as are needed for redundancy, and use
equal-sized pipes with sufficient capacity to support the traffic flow (or
where you make capacity decisions based on the traffic levels, not the
other way around). When you try to force BGP to work with the model you
described, it will go kicking and screaming.

Now this isn't to say that even the best run carrier doesn't have their
off days, and that there is potential benefit from having many different
carriers to choose from, but it does almost REQUIRE a different system of
path selection to be effective. Unfortunately there are some serious
problems to overcome in order for any such system to scale, not the least
of which are:

* The inability to receive FULL bgp routes from every bgp peer to your
optimization box without requiring your transit providers to set up a host
of eBGP Multihop sessions (which most refuse to do). This means you will
always be stuck assuming that every egress path is a transit and can reach
any destination on the Internet until your active or passive probing says
otherwise.

* The requirement of deaggregation in order to make best path decisions
effective. For example, someone's T3 to genuithree gets congested and the
best path to their little /24 of the Internet is through another provider.
Do you move 4.0.0.0/8?

* The constant noise of stupid scripts pinging everything on the Internet.
Once upon a time I heard some pretty interesting numbers about the amount
of traffic a newly routed /8 with no usage received just in Internet noise
from all the scanners, hackers, and worms out there. I don't know if it
was true or not (though I'm sure someone on this list has done such and
can tell us exactly how much traffic it is), but just looking at the
amount of noise much smaller blocks receive leads one to the conclusion
that active analysis will not scale to support everyone.

etc etc etc. There is certainly room for improvement of traffic
engineering in the protocols, but the perl scripts and zebra hacks most
people are throwing at the problem currently are far from capable of
handling it.

Richard,

  you have made some good points in this thread.
One general observation, and then specific responses
... I don't assert that current route optimization
technology solves ALL routing problems, but I do think
that there are some specific problems that automation
can effectively and gracefully solve.

> * The inability to receive FULL bgp routes from every bgp peer to your
> optimization box without requiring your transit providers to set up a host
> of eBGP Multihop sessions (which most refuse to do). This means you will
> always be stuck assuming that every egress path is a transit and can reach
> any destination on the Internet until your active or passive probing says
> otherwise.

The issue that you describe does indeed offer some
constraints to the application of route optimization
technology. Within the scope of this issue, though,
I think that you would agree that a network which is
ALL transit would face no challenge here -- and more
specifically, if there is a routing optimization
decision among local transit links, that problem
could be solved independently of the existence of
"non-transit" links.

Applying this technology in the presence of "non-
transit" routes requires constraining measurements to
only the prefixes appropriate for a given link. It
is true that knowing all BGP routes ("BGP Losers")
would be a nice way to get this information ...
but it's not necessarily the only approach towards
the goal. Some solutions may have topological
dependencies, but it can be feasible to simply drop
all measurement towards "illegal" destinations.

In other cases, it may be possible to define the
set of destinations that are legal over a given
link, and constrain measurements for that link.

> * The requirement of deaggregation in order to make best path decisions
> effective. For example, someone's T3 to genuithree gets congested and the
> best path to their little /24 of the Internet is through another provider.
> Do you move 4.0.0.0/8?

Perhaps. Yes, it's a /8. But if measurements to the /8 show
better collective performance over another link, why NOT
move it? Yes, it could be carrying a lot of traffic, and
could result in congesting the next link ... so it is
necessary to be able to:

  - know when links are at/near capacity,
    and so avoid their use; and

  - react quickly in case of congestion

Note that these problems are not specific to /8s,
and that traffic loads are dynamic - even if it
does look like there is "room" for a prefix on a
link, once the route gets changed, conditions
could very well change also. Any route optimization
system needs to deal with these issues for ALL
prefixes.

There are multiple levels of optimization possible
on top of this:

  a) If there is a general belief that /8s are
     simply "too big" to move, they can be manually
     deaggregated. Our experience shows that by
     breaking up a /8 into as few as (10) or (15)
     carefully designed "chunks", the resultant
     load per (deaggregated) prefix becomes equivalent
     to hundreds of other prefixes.

  b) If manually configuring deaggregates is not
     desirable, automated approaches to deaggregation
     are possible: "If I see traffic in this range,
     and a /xx does not exist for the observed traffic,
     then create the /xx".
  
  c) Dynamically measure all of the possible
     deaggregations of all active space, and dynamically
     determine which prefixes need to be deaggregated
     to what level.

Note that in any of the above cases, the de-aggregated
routes should be marked NO_EXPORT.

I know of solid commercial implementations of (a) and
(b). (c) is a more interesting project ... :-) A rough
sketch of approach (b) follows below.
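Here is a toy version of approach (b), on-demand deaggregation: create a more-specific route when traffic is observed inside a big aggregate. Purely illustrative, with placeholder prefixes; a real implementation must also cap table growth and mark the new routes NO_EXPORT.

```python
# Toy sketch of approach (b): if traffic is seen inside a known
# aggregate and no more-specific exists yet, create one.

import ipaddress

table = {ipaddress.ip_network("4.0.0.0/8")}  # the aggregate we'd rather not move

def deaggregate_on_traffic(dst_ip: str, new_len: int = 24):
    dst = ipaddress.ip_address(dst_ip)
    specific = ipaddress.ip_network(f"{dst_ip}/{new_len}", strict=False)
    # Only act if the destination falls inside a known aggregate and
    # no matching more-specific exists yet.
    covered = any(dst in net for net in table)
    if covered and specific not in table:
        table.add(specific)   # this route would carry NO_EXPORT in practice
        return specific
    return None

print(deaggregate_on_traffic("4.23.99.7"))   # 4.23.99.0/24 gets created
print(deaggregate_on_traffic("4.23.99.8"))   # None: the /24 already exists
```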
    

> * The constant noise of stupid scripts pinging everything
>   on the Internet.

Pinging the Internet is clearly a wasteful approach. Essentially
no one needs optimization to the ENTIRE Internet. Granted, major
backbones probably actually use a great deal of the routing
table ...

  (Quiz for the list readers:
   What percentage of the Internet routing table does
   your network actually use?)

... but for many ISP/hosting facility/major multihomed
enterprise, our experience shows that only a very small
fraction of traffic is seen beyond about (20,000-30,000)
routes in a given day.

There is no reason to measure destinations unless they
are involved with traffic to your network. Basing
measurements on observed traffic, or having applications
instrumented to automatically generate their own measurement
are both "clean" options here.

Companies and ISPs today spend time(=money) managing their
connectivity to the Internet. Loop-free connectivity is a
basic first step; but in many cases real connectivity goals
include:
  
   - Capacity management (especially in the presence
     of asymmetrical bandwidth)
   - Load management (in the case of usage-based billing)
   - Performance management (realizing 'best possible'
     performance)
   - Maximizing application availability (fastest possible
     reroute, in the case of congestive failure)

Manually tweaking routing policies to achieve these goals
is a time-honored craft (especially with this crowd :-) ...
but I suspect that even the most experienced in this area
will acknowledge that there is a tier of this problem that
may be best automated. (Note that I said "a tier" -- there
are clearly additional problems that current route optimization
technology DOESN'T solve. :-)

cheers -- Sean

> The issue that you describe does indeed offer some constraints to the
> application of route optimization technology. Within the scope of this
> issue, though, I think that you would agree that a network which is ALL
> transit would face no challenge here -- and more specifically, if there
> is a routing optimization decision among local transit links, that
> problem could be solved independently of the existence of "non-transit"
> links.

Just noting why it will never be anything other than a small customer
transit-only solution. As long as you are guaranteed by design that your
product will never be applicable to large networks or networks with any
peering, you know that odds are VERY slim you'll ever have anyone with
real network clue using the product. Under such conditions, snake oil
sales flourish.

> Applying this technology in the presence of "non-transit" routes
> requires constraining measurements to only the prefixes appropriate for a
> given link. It is true that knowing all BGP routes ("BGP Losers") would
> be a nice way to get this information ... but it's not necessarily the
> only approach towards the goal. Some solutions may have topological
> dependencies, but it can be feasible to simply drop all measurement
> towards "illegal" destinations.
>
> In other cases, it may be possible to define the set of destinations
> that are legal over a given link, and constrain measurements for that
> link.

Good luck making this scale. :-)

> * The requirement of deaggregation in order to make best path decisions
> effective. For example, someone's T3 to genuithree gets congested and the
> best path to their little /24 of the Internet is through another provider.
> Do you move 4.0.0.0/8?

> Perhaps. Yes, it's a /8. But if measurements to the /8 show
> better collective performance over another link, why NOT
> move it? Yes, it could be carrying a lot of traffic, and
> could result in congesting the next link ... so it is
> necessary to be able to:
>
>   - know when links are at/near capacity,
>     and so avoid their use; and
>
>   - react quickly in case of congestion

What is broken for one provider and fixed at another may very well break
something else that was working before at the first provider, yes? Besides
the difficulties of assigning a true metric to the overall reachability of
a /8 or any aggregate for that matter ("ok we decreased rtt by 20ms to
these 3 destinations doing 15Mbps each but we increased rtt to this other
destination doing 40Mbps by 60ms so we're better right?"), do you really
want to see the problems you are supposed to be solving with optimized
routing popping up and going away again throughout the day?

And yes you do bring up another valid point, how much of the congestion
you're trying to avoid is caused by your own traffic? If the answer is
none you're fine, but this by definition means the failure of your
optimized routing product. If it is a success you will either a) have
people with lots of traffic using it, or b) have so many small-traffic
users that the collective decisions of your box become the "huge user".

The problems then become:

* The quicker you try to react, the more you place yourself at risk of
   starting a best path flap cycle.

* Congestion does not only happen on your uplink circuit, it can happen
   at every point along the path, including peers, backbone circuits, and
   even the end user/site links. While I find the sales pitches of people
   touting the horrors of peering to be quite sad (from Internap to the
   classic MAE Dulles :P), peering capacity is largely based on the
   ability to predict the traffic levels far in advance. It doesn't take
   that many "large" customers selecting certain destinations through one
   provider at once to blow up a peer in one region.

Balancing the traffic of a GigE and a couple of FastE transits to keep
each one uncongested may be enough functionality to sell some boxes to
some low end users, but this falls into the categories I've described
above, and does nothing to address the true end to end performance.

Thus the only real solution to the problem if you actually want to
optimize traffic is:

>   c) Dynamically measure all of the possible
>      deaggregations of all active space, and dynamically
>      determine which prefixes need to be deaggregated
>      to what level.
>
> Note that in any of the above cases, the de-aggregated
> routes should be marked NO_EXPORT.

Throw away the BGP routing table completely, and build your own based on
the topology and metrics you have detected. Of course, this means saying
goodbye to the usual failsafe method of keeping the normal BGP routes in
the table with a lower localpref so if the box falls over you just fail
back to normal BGP path selection. And probably more importantly, there
isn't enough scale in the traffic probing system to gather the necessary
topology info once for every customer... Maybe if you made everyone's
boxes report data back to a central site, you could gather something
useful from it.

> Pinging the Internet is clearly a wasteful approach. Essentially
> no one needs optimization to the ENTIRE Internet. Granted, major
> backbones probably actually use a great deal of the routing
> table ...
>
>   (Quiz for the list readers:
>    What percentage of the Internet routing table does
>    your network actually use?)
>
> ... but for many ISP/hosting facility/major multihomed
> enterprise, our experience shows that only a very small
> fraction of traffic is seen beyond about (20,000-30,000)
> routes in a given day.
>
> There is no reason to measure destinations unless they
> are involved with traffic to your network. Basing
> measurements on observed traffic, or having applications
> instrumented to automatically generate their own measurement
> are both "clean" options here.

The usage numbers sound about right, and targeting only destinations
where you actually exchange traffic is certainly a big improvement over
not, but it's still going to generate a lot of noise for active traffic
destinations.

But I guess there are always passive measurement alternatives, like
measuring the load time of a gif customers have to link on their websites
*cough*. :-)

> Manually tweaking routing policies to achieve these goals is a
> time-honored craft (especially with this crowd :-) ... but I suspect that
> even the most experienced in this area will acknowledge that there is a
> tier of this problem that may be best automated. (Note that I said "a
> tier" -- there are clearly additional problems that current route
> optimization technology DOESN'T solve. :-)

I doubt you'll find anyone here who will stand up and admit to enjoying
tweaking metrics and policies more often than once a month. The problem
with interest from most of this crowd (or at least "those of this crowd
who actually run networks", which probably doesn't qualify as most any
more) is simply that none of the products and very little of the technology
applies to the networks they run or the work they have to do.

Richard A Steenbergen wrote:

> The issue that you describe does indeed offer some constraints to the
> application of route optimization technology. Within the scope of this
> issue, though, I think that you would agree that a network which is ALL
> transit would face no challenge here -- and more specifically, if there
> is a routing optimization decision among local transit links, that
> problem could be solved independantly of the existance of "non-transit"
> links.

> Just noting why it will never be anything other than a small customer
> transit-only solution. As long as you are guaranteed by design that your
> product will never be applicable to large networks or networks with any
> peering, you know that odds are VERY slim you'll ever have anyone with
> real network clue using the product. Under such conditions, snake oil
> sales flourish.

It appears to me that you've acknowledged that route
optimization solves a problem, albeit one that is not
a complete solution for your network. The claim of
'snake oil' seems inappropriate in this context.

One step further: if you are running a network of this
type, then there seems to be a large likelihood that
you are selling transit. Thus, your customers may well
be using technology of this sort to provide real solutions
to THEIR problems. (specifically, they may be directing
traffic towards providers that are to _their_ advantage;
and be gaining detailed insight as to the real quality
of connectivity being provided to them.)

It's not clear to me how you chose to define "real network
clue", but I would not suggest that your customers are
completely lacking in that area. :-)

> In other cases, it may be possible to define the set of destinations
> that are legal over a given link, and constrain measurements for that
> link.

> Good luck making this scale. :-)

Granted - it is a limited solution -- but still a
solution that does solve a set of real-world problems.

> What is broken for one provider and fixed at another may very well break
> something else that was working before at the first provider, yes? Besides
> the difficulties of assigning a true metric to the overall reachability of
> a /8 or any aggregate for that matter ("ok we decreased rtt by 20ms to
> these 3 destinations doing 15Mbps each but we increased rtt to this other
> destination doing 40Mbps by 60ms so we're better right?"),

Having measurement traffic that directly correlates to
actual traffic makes this problem much more manageable.

> The problems then become:
>
> * The quicker you try to react, the more you place yourself at risk of
>    starting a best path flap cycle.
>
> * Congestion does not only happen on your uplink circuit, it can happen
>    at every point along the path, including peers, backbone circuits, and
>    even the end user/site links. While I find the sales pitches of people
>    touting the horrors of peering to be quite sad (from Internap to the
>    classic MAE Dulles :P), peering capacity is largely based on the
>    ability to predict the traffic levels far in advance. It doesn't take
>    that many "large" customers selecting certain destinations through one
>    provider at once to blow up a peer in one region.

Flap control is an important consideration.

Note that in the described topology, changing the selection
of an egress point does not affect the routing tables of
external networks (as opposed to flapping of route advertisements,
for inbound traffic.)

I do think that it's useful to compare the behaviour of
"mortal" BGP in the conditions you describe ... if BGP
selects a path that is, or becomes, congested ... BGP
has no feedback mechanism to make a change until the
overall topology changes, or until manual intervention.

An automated route optimization system can evaluate
the performance, and current load, of alternate egresses,
make an automated change to the egress, and then monitor
the success of the change. In most cases, the overall
conditions will have been improved. In the case you
describe above, the route change results in suboptimal
performance, and a new decision is needed. This process
needs to have effective flap control. This is an area
in which I've seen a fair amount of development; and
have seen good results in years of production use.
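One common shape for that flap control (a sketch of the general idea, not a description of any shipping product; the thresholds are arbitrary placeholders) is plain hysteresis: require a minimum improvement before moving a prefix, and a hold-down period before moving it again.

```python
# Illustrative flap control for an egress optimizer: only change a
# prefix's egress if the win is big enough AND the prefix hasn't been
# moved recently. Thresholds are placeholders, not tuned values.

import time

MIN_GAIN_MS = 10.0      # required latency improvement before moving
HOLD_DOWN_S = 300       # no re-move of the same prefix within 5 minutes
last_move: dict = {}

def maybe_move(prefix: str, current_ms: float, best_alt_ms: float) -> bool:
    now = time.monotonic()
    last = last_move.get(prefix)
    if last is not None and now - last < HOLD_DOWN_S:
        return False                      # still in hold-down: stay put
    if current_ms - best_alt_ms < MIN_GAIN_MS:
        return False                      # gain too small to justify churn
    last_move[prefix] = now
    return True                           # caller re-points the egress

print(maybe_move("192.0.2.0/24", 80.0, 45.0))  # True: big win, no hold-down
print(maybe_move("192.0.2.0/24", 45.0, 30.0))  # False: hold-down active
```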

> Balancing the traffic of a GigE and a couple of FastE transits to keep
> each one uncongested may be enough functionality to sell some boxes to
> some low end users, but this falls into the categories I've described
> above, and does nothing to address the true end to end performance.

It's not clear to me what you mean here by "true end to end
performance". I don't pretend that the approach being discussed
is a COMPREHENSIVE solution to all the problems that can impair
performance; but I do think that for the class of performance
problems that are directly observable via inspection of alternate
egresses, redirecting the egress does in fact address "true end to
end performance".

> Thus the only real solution to the problem if you actually want to
> optimize traffic is:

> c) Dynamically measure all of the possible
> deaggregations of all active space, and dynamically
> determine which prefixes need to be deaggregated
> to what level.
>
> Note that in any of the above cases, the de-aggregated
> routes should be marked NO_EXPORT.

> Throw away the BGP routing table completely, and build your own based on
> the topology and metrics you have detected. Of course, this means saying
> goodbye to the usual failsafe method of keeping the normal BGP routes in
> the table with a lower localpref so if the box falls over you just fail
> back to normal BGP path selection.

This alone seems to make adoption of such technology
rather difficult ...

> And probably more importantly, there
> isn't enough scale in the traffic probing system to gather the necessary
> topology info once for every customer...
> ... Maybe if you made everyone's
> boxes report data back to a central site, you could gather something
> useful from it.

IMHO, that approach has demonstrated scalability limitations.

Performance, and load information, tends to get stale very
quickly.

Date: Mon, 26 Jan 2004 15:35:28 -0500
From: Richard A Steenbergen

> (Quiz for the list readers:
> What percentage of the Internet routing table does
> your network actually use?)

Perhaps around 25% for a "moderate"-sized organization, but
as low as 5% is not unreasonable for regionals and locals.
Discount spam from around the world, and I suspect the numbers
drop even more. :-)

> I doubt you'll find anyone here who will stand up and admit
> to enjoying tweaking metrics and policies more often than
> once a month. The problem with interest from most of this

A couple of days a couple of times a year of manual testing and tweaking
for the "most important" prefixes usually does the trick.

Considering industry instability, one probably changes an
upstream about as often as one would tune anyway. Considering
the difficulty one often has finding a clued rep, one probably
spends more time educating sales reps than tuning traffic. ;)

Eddy