Westnet and Utah outage

I made a private reply to Curtis on his posting earlier this week, and he
gave a nice analysis and cc'd end2end-interest rather than nanog. For
those that don't care to read all this, here's the summary:

Which would you prefer? 140 msec and 0% loss or 70 msec and 5% loss?

So we get to choose between large delay and large lossage. Doesn't sound
wonderful...

I thought you folks in nanog might be interested, so with Curtis'
permission, here's the full exchange (the original posting by Curtis is at
the very end).

  -- Jim

Here's what I wrote:

> Curtis,
>
> I think these days for lots of folks the interesting question is not what
> happens when a single or a few high-rate TCPs get in equilibrium, but rather
> what happens when a DS-3 or higher is filled with 56k or slower flows, each
> of which only lasts for an average of 20 packets or so. Unfortunately,
> these 20 packet TCP flows are what's driving the stats these days, due I
> guess to the silly WWW (TCP per file; file per graphic; many graphics per
> page) that's been so successful.

And Curtis's reply:

The analysis below also applies to just under 800 TCP flows each
getting 1/800th of a DS3 link, or about 56 Kb/s. The loss rate on the
link should be about one packet in 11 if the delay can be increased to
250 msec. If the delay is held at 70 msec, the result will be lots of
timeouts, terrible fairness, and poor overall performance.
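
As a quick sanity check of those figures (assuming a 536-byte MSS and
the simplest form of the steady-state rate ~ MSS / (RTT * sqrt(loss))
rule of thumb, neither of which is stated above):

    # Back-of-envelope check of the "1/800th of a DS3" figures.
    # Assumptions (not from the posting): 536-byte MSS, and the simple
    # steady-state relation  rate ~ MSS / (RTT * sqrt(loss)).
    DS3_BPS  = 44.736e6          # DS3 payload rate, bits/s
    FLOWS    = 800
    MSS_BITS = 536 * 8           # assumed maximum segment size, in bits
    RTT      = 0.250             # the "delay increased to 250 msec" case

    share = DS3_BPS / FLOWS                  # fair share per flow, bits/s
    loss  = (MSS_BITS / (share * RTT)) ** 2  # loss rate implied by the relation

    print(f"per-flow share: {share / 1e3:.0f} Kb/s")      # about 56 Kb/s
    print(f"implied loss  : 1 packet in {1 / loss:.0f}")  # roughly 1 in 11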

Do we need an ISP to prove this to you by exhibiting terrible
performance? If so, please speak to Jon Crowcroft. His case is 400
flows on 4 Mb/s, which is far worse, since the delay would have to be
increased to over 3 seconds or the segment size reduced below 552
bytes. :-(

> I could try to derive the results but I'm sure you or others would do
> better :-). How many of the packets in the 20 packet flow are at
> equilibrium? What's the drop rate? Hmmm, very simple-minded analysis says
> that it will be large: exponential growth (doubling cwnd every RTT) should
> get above best case pretty quickly, certainly within the 20-packet flow.
> Assume it's only above optimum once; then the packet loss rate is 1 in 20.
> Sounds grim. Vegas TCP sounds better for these reasons, since it tracks
> actual bw, but I'm not really qualified to judge.
>
> -- Jim

Jim,

The end2end-interest thread was quite long and I didn't want to repeat
the whole thing. The initial topic was very tiny TCP flows of 3 to 4
packets. That is a really bad problem, but it should no longer be a
realistic one once HTTP is modified to pick up both the HTML page and
all the inline images in one TCP connection.
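
For what it's worth, the fix being talked about looks like this at the
HTTP level; the host and paths below are made up, and this just shows
several fetches reusing a single TCP connection (what later became
persistent connections):

    # Sketch of fetching a page and its inline images over ONE TCP
    # connection instead of one connection per file. Host and paths
    # here are hypothetical.
    import http.client

    conn = http.client.HTTPConnection("www.example.com")
    for path in ("/index.html", "/logo.gif", "/photo1.jpg"):
        conn.request("GET", path)
        data = conn.getresponse().read()   # each fetch reuses the same socket
        print(path, len(data), "bytes")
    conn.close()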

Your example is quite reasonable. At 20 packets per flow, with no
loss you get 1, 2, 4, 8, 5 packets per RTT, or a complete transfer in
about 5 RTTs. On average, each TCP flow will get 20 packets / 5 RTT of
bandwidth until congestion, of 4 packets/RTT (for 552 bytes / 70 msec,
this is about 64 Kb/s). If the connection is temporarily overloaded by
a factor of 2, this must be reduced to 2 packets/RTT. If we drop 1
packet in 20, roughly 35% of the flows go completely untouched
(0.95^20). Some 15% will drop one of the first 3 packets and time out
and slow start, resulting in less than 20 packets / 3 seconds (3
seconds >> 5*RTT). The remaining 50% or so will drop one of the 4th
through 20th packets, resulting in fast retransmit, no timeout, and
linear window growth. If the 4th is dropped, the window is cut to 2,
so over the next few RTTs you get 2, 3, 4, 5, 3 packets, for 8 RTTs in
all (2 initial, 1 drop, 5 more). This is probably not quite enough to
slow things down.
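
A small sketch of that arithmetic (my own, assuming a one-segment
initial window, per-RTT doubling, and independent 1-in-20 drops):

    # Reproduce the 20-packet-flow numbers above.
    FLOW_PKTS = 20
    DROP = 1 / 20

    # Slow-start schedule: packets sent per RTT until the flow is done.
    schedule, sent, cwnd = [], 0, 1
    while sent < FLOW_PKTS:
        burst = min(cwnd, FLOW_PKTS - sent)
        schedule.append(burst)
        sent += burst
        cwnd *= 2
    print("packets per RTT:", schedule)        # [1, 2, 4, 8, 5] -> about 5 RTTs

    # How a uniform 1-in-20 drop rate splits the flows:
    untouched  = (1 - DROP) ** FLOW_PKTS       # no drop at all           (~36%)
    early_drop = 1 - (1 - DROP) ** 3           # drop in first 3: timeout (~14%)
    late_drop  = 1 - untouched - early_drop    # drop in 4th-20th: fast rtx (~50%)
    print(f"untouched {untouched:.0%}, timeout {early_drop:.0%}, "
          f"fast retransmit {late_drop:.0%}")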

On a DS3 with 70 msec RTT and 1500 simultaneous flows of 20 packets
each (steady state, such that the number of active flows stays around
1500, roughly twice what a DS3 could support) you would need a drop
rate on the order of 5% or more. Alternatively, you could queue things
up, doubling the delay to 140 msec, giving every flow the same slower
rate (perfect fairness in your example) and a zero drop rate.
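
The trade-off can be seen with a couple of lines of arithmetic; the
assumption (mine, not Curtis's) is that each of the 1500 flows is
window-limited at roughly one 552-byte packet per RTT, so its rate
scales as 1/RTT:

    # Demand vs. DS3 capacity at 70 msec and at 140 msec of delay.
    DS3_BPS  = 44.736e6
    FLOWS    = 1500
    PKT_BITS = 552 * 8

    for rtt in (0.070, 0.140):
        demand = FLOWS * PKT_BITS / rtt        # ~1 packet per flow per RTT
        print(f"RTT {rtt * 1e3:.0f} msec: demand {demand / 1e6:.1f} Mb/s, "
              f"{demand / DS3_BPS:.1f}x the DS3")
    # At 70 msec the demand is about 2x the link, so roughly half of it has
    # to be held back by drops; at 140 msec the demand roughly matches the
    # link, which is the zero-loss, higher-delay alternative.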

Which would you prefer? 140 msec and 0% loss or 70 msec and 5% loss?
Delay is good. We want delay for elastic traffic! But not for real-time
traffic: use RSVP, admission control, police at the ingress, and stick
it on the front of the queue.
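
As a toy illustration of that split (my reading of it, not a real
implementation): admitted real-time packets get a strict-priority queue
ahead of the FIFO carrying the elastic TCP traffic, so the queueing
delay lands on the traffic that can use it.

    # Two-class queue: policed/admitted real-time traffic at the front,
    # elastic (TCP) traffic absorbs the queueing delay behind it.
    from collections import deque

    realtime, elastic = deque(), deque()

    def enqueue(pkt, admitted_realtime=False):
        (realtime if admitted_realtime else elastic).append(pkt)

    def dequeue():
        if realtime:                 # "stick it on the front of the queue"
            return realtime.popleft()
        return elastic.popleft() if elastic else None

    enqueue("tcp-pkt")
    enqueue("voice-pkt", admitted_realtime=True)
    print(dequeue())                 # -> voice-pkt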

In practice, I'd expect overload to be due to lots of flows, but not
so many little guys that they overload the link by themselves (if they
do, get a bigger pipe; we can say that and put it into practice). The
overload will be due to a high baseline of little guys (20-packet
flows, or a range of fairly small ones), plus some percentage of
longer-duration flows capable of sucking up the better part of a T1,
given half a chance. It is the latter that you want to slow down, and
these are the ones that you *can* slow down with a fairly low drop
rate.

I leave it as an exercise to the reader to determine how RED fits into
this picture (either scenario: my overload case or Jim's, where all the
flows are 20 packets long).
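
For readers taking up the exercise, here is a minimal sketch of the RED
drop decision (simplified from Floyd and Jacobson's scheme; the
parameters are placeholders and the count-based adjustment is omitted).
The drop probability ramps up linearly between two thresholds on an
exponentially weighted average of the queue length, so short bursts
from the little 20-packet flows tend to pass while the long-lived flows
that keep the average queue high see early drops.

    import random

    class Red:
        """Simplified Random Early Detection drop decision."""
        def __init__(self, w_q=0.002, min_th=5, max_th=15, max_p=0.02):
            self.w_q, self.min_th, self.max_th, self.max_p = w_q, min_th, max_th, max_p
            self.avg = 0.0                      # EWMA of the queue length

        def should_drop(self, queue_len):
            self.avg = (1 - self.w_q) * self.avg + self.w_q * queue_len
            if self.avg < self.min_th:
                return False                    # queue short: never drop
            if self.avg >= self.max_th:
                return True                     # queue long: always drop
            frac = (self.avg - self.min_th) / (self.max_th - self.min_th)
            return random.random() < self.max_p * frac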

The 400 flows on 4 Mb/s is an interesting (and difficult) case. I've
suggested both allowing the delay to get very large (i.e., as high as 2
seconds) and hacking the host implementation to reduce the segment size
to as low as 128 bytes when the RTT gets huge or cwnd drops below 4
segments, holding the window to no less than 512 bytes (4 segments) in
hopes that fast retransmit will almost always work even in 15-20% loss
situations.
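
The arithmetic behind those two knobs (my own working, not necessarily
Curtis's): to keep fast retransmit viable the window has to stay at
about 4 segments, and with only 4 Mb/s / 400 = 10 Kb/s per flow the RTT
has to stretch until 4 segments fit in one round trip.

    # RTT needed to hold a 4-segment window at each flow's fair share.
    LINK_BPS = 4e6
    FLOWS    = 400
    share    = LINK_BPS / FLOWS                 # 10 Kb/s per flow

    for seg_bytes in (552, 128):
        window_bits = 4 * seg_bytes * 8         # 4-segment minimum window
        rtt_needed  = window_bits / share       # seconds of delay required
        print(f"{seg_bytes}-byte segments: RTT must be >= {rtt_needed:.2f} s")
    # 552-byte segments need on the order of 1.8 s of delay; dropping to
    # 128-byte segments brings that down to about 0.4 s, which is why
    # shrinking the segment size helps so much here.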

Curtis

Curtis's original posting: