What frame relay switch is causing MCI/Worldcom such grief?

The traffic-engineering argument for L2 "routing" is only valid for
complex-topology networks. In simple topologies, the penalty for
suboptimal paths effectively cancels the gains from spreading traffic
around. Physical fiber plants do have rather simple topologies (the rich
topologies are usually "optical illusions" created by the SONET layer).

From a customer's point of view, performance of the network is _not_
measured as available bandwidth, but rather as the performance of his
TCP streams, which depends heavily on latency and loss. Increasing
latency while there's a lossy component in the path (which is increasingly
found not in the backbone but at ingress tail circuits, outside of the
ISP's control) degrades performance approximately in inverse proportion
to the latency.
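
That inverse relationship is just the familiar TCP throughput bound
(Mathis et al.): rate ~ MSS / (RTT * sqrt(p)). A minimal sketch, with
made-up loss and RTT figures:

    # Mathis et al. bound: throughput <= (MSS / RTT) * (C / sqrt(p)),
    # where p is the loss probability and C ~ sqrt(3/2) for periodic loss.
    from math import sqrt

    MSS = 1460 * 8          # segment size in bits
    C = sqrt(3.0 / 2.0)
    p = 0.01                # assumed 1% loss at the tail circuit

    for rtt_ms in (20, 40, 80, 160):
        rtt = rtt_ms / 1000.0
        bps = (MSS / rtt) * (C / sqrt(p))
        print("RTT %4d ms -> ~%5.2f Mbit/s" % (rtt_ms, bps / 1e6))
    # With loss fixed at the tail circuit, doubling the backbone
    # latency halves the achievable TCP rate.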

In other words: excluding grossly overloaded circuits, you want the
path with the least latency! This is because your performance is limited
by the tail-circuit (or exchange-point) loss _and_ the backbone latency.
MPLS does nothing to help avoid these lossy places (avoiding IXP loss
would require propagating interior routing information into peer
backbones).

Additionally, suboptimal paths as a rule involve more hops; since
roughly independent per-hop queueing delays add their variances, latency
variance grows in proportion to hop count.
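
A quick simulation of that effect (exponential per-hop delays are an
assumption here, not a measurement):

    # Variance of end-to-end delay vs. hop count, assuming independent
    # per-hop queueing delays (modeled as exponential with mean 1 ms).
    import random

    def path_delay(hops, mean_ms=1.0):
        return sum(random.expovariate(1.0 / mean_ms) for _ in range(hops))

    for hops in (4, 8, 16):
        samples = [path_delay(hops) for _ in range(20000)]
        mean = sum(samples) / len(samples)
        var = sum((s - mean) ** 2 for s in samples) / len(samples)
        print("%2d hops: delay variance ~ %.2f ms^2" % (hops, var))
    # Variances add across hops, so expect roughly 4, 8, 16 ms^2.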

Now, no matter how one jumps, most congestion events last only seconds.
Expecting any traffic-engineering mechanism to take care of these is
unrealistic. A useful time scale for traffic engineering is therefore
at least days, which can be perfectly accommodated by capacity planning in
a fixed topology. At these time scales traffic matrices do not change
rapidly. In fact, as long as there are more than three backbones, one
can safely assume that most traffic goes from customers (in proportion
to the size of their pipes) to the nearest exchange point, and from
exchange points randomly to all customers (again in proportion to their
access pipe sizes).
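
A toy version of that traffic-matrix assumption (the pipe sizes below
are hypothetical):

    # Each customer sends in proportion to its access pipe toward the
    # nearest IXP; the IXP fans traffic back out to all customers,
    # again in proportion to their pipe sizes.
    pipes_mbps = {"cust_a": 45, "cust_b": 155, "cust_c": 10}
    total = float(sum(pipes_mbps.values()))

    for name, size in sorted(pipes_mbps.items()):
        share = 100.0 * size / total
        print("%s: %3d Mbit/s toward nearest IXP, ~%4.1f%% of inbound"
              % (name, size, share))
    # The matrix changes only as pipe sizes change, slowly enough
    # for day-scale capacity planning.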

Backbones which neglect capacity planning because they can "reroute"
traffic at the L2 level simply cheat their customers. If they _do not_
neglect capacity planning, they do not particularly need the L2
traffic-engineering facilities.

Anyway, the simplest solution (having enough capacity, and a physical
topology matching the L3 topology) appears to be the sanest way to
build a stable and manageable network. Raw capacity is getting cheap
fast; engineers aren't. And there is no magic recipe for writing
complex _and_ reliable software. The simpler it is, the better it works.

--vadim

> Now, no matter how one jumps, most congestion events last only seconds.

This isn't the case when you have half your bandwidth to any particular
point down. Excess capacity in other portions of the network may then be
used to carry a portion of the offered load via a suboptimal path.
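
A sketch of that arithmetic (all capacities below are hypothetical):

    # Offered load vs. surviving capacity after a partial failure.
    direct_capacity = 622.0   # Mbit/s; assume half of it just failed
    surviving = direct_capacity / 2
    offered = 500.0           # Mbit/s of offered load (assumed)

    # Spare room on a longer detour is the min residual along its links:
    detour_residuals = [400.0, 250.0, 300.0]
    detour_spare = min(detour_residuals)

    overflow = max(0.0, offered - surviving)
    rerouted = min(overflow, detour_spare)
    print("overflow %.0f Mbit/s, rerouted %.0f, dropped %.0f"
          % (overflow, rerouted, overflow - rerouted))
    # A suboptimal path, but better than queueing drops on the
    # half-dead direct link.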

> Expecting any traffic-engineering mechanism to take care of these is
> unrealistic. A useful time scale for traffic engineering is therefore

Expecting most congestion to last only seconds is also unrealistic. In
most cases there is no congestion, everything is taking the shortest path,
and then there is a loss of capacity and we have a problem. Expecting the
physical circuit never to go down, thanks to SONET protection and diverse
routing, is also a bit optimistic, as regrooming may eventually reduce your
"diverse" routing to a single path. This does not fly with customers, who
want traffic moved, not excuses that the physically diverse pathing was
regroomed by a telco to be non-diverse. The MTTR for backhoe-fade outages
is long enough that TE techniques have proven to be an effective mechanism
for bypassing the outage without operator intervention.

> at least days, which can be perfectly accommodated by capacity planning in
> a fixed topology. At these time scales traffic matrices do not change

This assumes a decent fixed topology. Historically, the market has moved
faster than predictions.

> Backbones which neglect capacity planning because they can "reroute"
> traffic at the L2 level simply cheat their customers. If they _do not_
> neglect capacity planning, they do not particularly need the L2
> traffic-engineering facilities.

Promising local ISPs are not neglecting capacity planning. The problem
is the effectively _random_ delivery of capacity, at points which are
less than optimal.

> Anyway, the simplest solution (having enough capacity, and a physical
> topology matching the L3 topology) appears to be the sanest way to
> build a stable and manageable network. Raw capacity is getting cheap
> fast; engineers aren't. And there is no magic recipe for writing
> complex _and_ reliable software. The simpler it is, the better it works.

There is _no_ disagreement on this topic. This paragraph is correct as it
stands. With the exception of partial capacity loss, this is completely
in line with most people's thinking. No one actually sits down and thinks
"lets effectively route our traffic so that the sum (bit*mile) is the
highest possible." That is just plain wrong. However, given real world
constraints on capacity and delivery, TE is a useful tool today.

/vijay

> > Now, no matter how one jumps, most congestion events last only seconds.

> This isn't the case when you have half your bandwidth to any particular
> point down. Excess capacity in other portions of the network may then be
> used to carry a portion of the offered load via a suboptimal path.

i believe he is talking of congestion under 'normal' circumstances, in
which case the assertion is correct. chronic congestion occurs when you
oversubscribe the link and constantly fill/overflow the queue (and you see
drops inversely related to interarrival times). chronic congestion
(not related to link failure) should (can? seems to be a question for
some) be accounted for in windows longer than the normal TE hack delta.

you can also engineer an ip network to accommodate circuit failures (with
no APS). you just have to understand the problem. helps to be a
consolidated CLEC/IXC too.

> > Expecting any traffic-engineering mechanism to take care of these is
> > unrealistic. A useful time scale for traffic engineering is therefore

> Expecting most congestion to last only seconds is also unrealistic. In
> most cases there is no congestion, everything is taking the shortest path,
> and then there is a loss of capacity and we have a problem.

in a network where you have oversold core capacity at the edges, it is
certainly normal to experience congestion. remember that the data most
people are exposed to are 5-minute averages. examining link
utilization/drops at a shorter delta is most interesting.
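
a sketch of why 5-minute averages mislead (synthetic numbers, not
measurements):

    # A 100 Mbit/s link that microbursts above line rate: per-second
    # samples overflow the queue while the 5-minute average looks fine.
    line_rate = 100.0
    samples = [30.0] * 270 + [140.0] * 30   # 300 one-second offered loads

    carried = [min(s, line_rate) for s in samples]
    dropped = sum(s - line_rate for s in samples if s > line_rate)
    avg_util = sum(carried) / len(carried) / line_rate

    print("5-minute average utilization: %.0f%%" % (100 * avg_util))
    print("traffic dropped during bursts: %.0f Mbit" % dropped)
    # ~37% average utilization, yet every burst second saw drops.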

> Expecting the physical circuit never to go down, thanks to SONET
> protection and diverse routing, is also a bit optimistic, as regrooming
> may eventually reduce your "diverse" routing to a single path.

if you are the telco, then you are shooting yourself. if you are not
a telco and you buy diverse-path APS in your contract and don't inquire
about 'regrooming ticket 12341' then you are just as stupid.

> > at least days, which can be perfectly accommodated by capacity planning in
> > a fixed topology. At these time scales traffic matrices do not change

> This assumes a decent fixed topology. Historically, the market has moved
> faster than predictions.

so no matter what, it is impossible to give O(f(n)) growth ('O' not theta)?
i find that hard to believe.

> highest possible." That is just plain wrong. However, given real-world
> constraints on capacity and delivery, TE is a useful tool today.

s/tool/hack/g

BR