TCP congestion control and large router buffers

https://gettys.wordpress.com/2010/12/06/whose-house-is-of-glasse-must-not-throw-stones-at-another/

I wonder why this hasn't made the rounds here. From what I see, a change
in this area (e.g. smaller buffers in customer routers, or yet another
change to the congestion control algorithms) would work wonders for
end-user perceived performance and should help in some way with the net
neutrality dispute.

I also understand that a lot of the people here operate routers that
are a bit far from the end-users and don't have much to do with this
issue, but the rest probably have a hand in choosing or configuring
these end-user devices, so this should be relevant.

I'd say this is common knowledge and has been for a long time.

In the world of CPEs, lowest price and simplicity is what counts, so nobody cares about buffer depth and AQM, that's why you get ADSL CPEs with 200+ ms of upstream FIFO buffer (no AQM) in most devices.

Personally I have MQC configured on my interface, with assured bandwidth for small packets and ssh packets, and I also run fair-queue so that TCP sessions get a fair share. I don't know of any non-Cisco device that does this.
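
For anyone who hasn't built such a policy, here is a minimal IOS sketch of the kind of MQC configuration being described (the class names, the 128-byte "small packet" cut-off, the bandwidth percentage and the interface are illustrative, and "match packet length" is not available on every platform or IOS train):

 ! identify ssh and other small packets as "interactive"
 ip access-list extended SSH
  permit tcp any any eq 22
  permit tcp any eq 22 any
 !
 class-map match-any INTERACTIVE
  match access-group name SSH
  match packet length max 128
 !
 ! guarantee the interactive class some bandwidth, fair-queue the rest
 policy-map UPSTREAM
  class INTERACTIVE
   bandwidth percent 25
  class class-default
   fair-queue
 !
 interface Dialer0
  service-policy output UPSTREAM

On an ADSL CPE you would typically hang this under a parent shaper (or apply it on the ATM PVC) so that the queue actually forms in the router rather than in the modem.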

Ironically though, wouldn't smaller buffers cost less, thus making the CPEs cheaper still?

1 megabyte of buffer (regular RAM) isn't really expensive.

I believe the argument made in the blog post is that cheap RAM has been causing the CPE manufacturers to mistakenly provision too much buffer space, which in turn apparently means that TCP can't stabilize at a rate below the available bandwidth (i.e. it's the old 1980s congestion collapse problem all over again). Of course, you'll only see this if a single TCP stream is actually capable of saturating the link. Sam

I would guess they're running standard OSes and haven't tuned the buffers at all. Implementing WRED or fair-queue (even if it just means turning it on) requires validation which the CPE manufacturers want to minimize.

Also, it's our fault as a business: how many ISPs have included AQM in their RFPs for CPEs and would actually pay USD 5 more per device for this feature? I'm not very surprised at the lack of it, though; it's hard to explain to the end customer through marketing, both for the ISP and for a CPE vendor selling directly to end customers.

It's one of those "in the black box" things that should just work, but there is little upside in having it because it's hard to charge for.

But there's no need for AQM, just smaller buffers would make a huge
difference.

Well, yes, buffering packets for more than, let's say, 30-50ms on a 1 meg link doesn't make much sense. But doing some basic AQM would make things even better (some packets would see 0 buffering instead of 30ms).
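
To put rough numbers on that (a back-of-the-envelope calculation, taking a 1 Mbit/s link as the example; B is the buffer size in bytes, R the line rate in bit/s):

  \text{delay} = \frac{8B}{R}, \qquad
  \frac{8 \times 10^{6}}{10^{6}} = 8\ \text{s for a 1 MB FIFO}, \qquad
  0.040 \times \frac{10^{6}}{8} = 5000\ \text{B} \approx 3\text{--}4\ \text{full-size packets for a 40 ms target.}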

Surely buffers that can store seconds worth of data are simply too big?

FIFO with seconds worth of data is just silly, yes.

Well, Jim Gettys was reporting seeing "tens of seconds" of buffering
(comments in the original LWN link to his first posting) which is just
ludicrous. No TCP stack is going to respond properly to congestion with
that sort of delay. Some form of AQM is probably a good thing as would
be the wider use of ECN. Finding out that a buffer filled and a packet
(or many packets) was dropped five seconds after the fact, isn't going
to help anyone and you just end up whipsawing the window size (Lawrence
Welk effect? http://www.oeta.onenet.net/welk/PM/images/Lawrence.jpg ).
I would favor seeing more use of ECN so that a sender can be notified to
back off when a buffer is approaching capacity but there is apparently
still a lot of hardware out there that has problems with it.

You need enough buffering to satisfy packets "in flight" for a
connection on the other side of the planet but man, what he has been
reporting is just insane and it would be no wonder performance can be
crap.

that sort of delay. Some form of AQM is probably a good thing as would
be the wider use of ECN. Finding out that a buffer filled and a packet
(or many packets) was dropped five seconds after the fact, isn't going

ECN pretty much needs WRED, and then people need to implement that first. The only routing platforms I know of that support it are the 7200 and the other CPU-based routers from Cisco running fairly recent IOS (it seems to have been introduced in 12.2T).

http://www.cisco.com/en/US/docs/ios/12_2t/12_2t8/feature/guide/ftwrdecn.html
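
From that feature guide, turning it on under MQC looks roughly like this (a sketch only; the interface is illustrative, WRED is left at its default thresholds, and depending on the platform class-default may first need a bandwidth, shape or fair-queue statement before it accepts WRED):

 policy-map WAN-OUT
  class class-default
   ! WRED drops early as the queue builds; "random-detect ecn" marks
   ! ECN-capable traffic instead of dropping it
   random-detect
   random-detect ecn
 !
 interface Serial0/0
  service-policy output WAN-OUT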

You need enough buffering to satisfy packets "in flight" for a
connection on the other side of the planet but man, what he has been
reporting is just insane and it would be no wonder performance can be
crap.

Yeah, 30-60ms of buffering is what I have favoured so far.

With L2 switches you don't get anywhere near that, but on the other hand a few ms of buffering + tail drop has much less impact on interactive applications compared to seconds of buffering.

I wonder why this hasn't made the rounds here. From what I see, a
change in this area (e.g. smaller buffers in customer routers, or yet
another change to the congestion control algorithms) would work wonders
for end-user perceived performance and should help in some way with the
net neutrality dispute.

I'd say this is common knowledge and has been for a long time.

In the world of CPEs, lowest price and simplicity is what counts, so
nobody cares about buffer depth and AQM, that's why you get ADSL CPEs
with 200+ ms of upstream FIFO buffer (no AQM) in most devices.

You're going to see more of it; at a minimum, CPE are going to have to be
able to drain a gig-E into a port that may be only 100 Mb/s. The QoS
options available in a ~$100 CPE router are adequate for the basic purpose.

d-link dir-825 or 665 are examples of such devices

Personally I have MQC configured on my interface, with assured bandwidth
for small packets and ssh packets, and I also run fair-queue so that TCP
sessions get a fair share. I don't know of any non-Cisco device that does
this.

The consumer CPE that care seem to be mostly oriented toward keeping
gaming and VoIP from being interfered with by P2P and file transfers.

I wonder why this hasn't made the rounds here. From what I see, a
change in this area (e.g. smaller buffers in customer routers, or yet
another change to the congestion control algorithms) would work wonders
for end-user perceived performance and should help in some way with the
net neutrality dispute.

It's really hard to replace all the home users' hardware. Trying to "fix" the problem by fixing all of that is much more painful (and expensive) than fixing the network to not have the buffers.

I'd say this is common knowledge and has been for a long time.

Common knowledge among whom? I'm hardly a naive Internet user.

And the statement is wrong: the large router buffers have effectively destroyed TCP's congestion avoidance altogether.

In the world of CPEs, lowest price and simplicity is what counts, so
nobody cares about buffer depth and AQM, that's why you get ADSL CPEs
with 200+ ms of upstream FIFO buffer (no AQM) in most devices.

200ms is good; but it is often up to multiple *seconds*. Resulting latencies on broadband gear are often horrific: see the netalyzr plots that I posted in my blog.

Dave Clark first discovered bufferbloat on his DSLAM: he used the 6 second latency he saw to DDOS his son's excessive WOW playing.

All broadband technologies are affected, as are, it turns out, all operating systems and likely all home routers as well (see other posts I've made recently). DSL, cable and FIOS all have problems.

How many of retail ISP's service calls have been due to this terrible performance?

I know I was harassing Comcast with multiple service calls over a year ago over what I now think was bufferbloat. And periodically for a number of years before that (roughly since DOCSIS 2 deployed, I would guess).

"The Internet is slow today, Daddy" was usually Daddy saturating the home link, and bufferbloat the cause. Every time they would complain, I'd stop what I was doing, and the problem would vanish. A really nice willow the wisp...

You're going to see more of it; at a minimum, CPE are going to have to be
able to drain a gig-E into a port that may be only 100 Mb/s. The QoS
options available in a ~$100 CPE router are adequate for the basic purpose.

But the port may only be 1 Mb/s; 802.11g is 20 Mbps tops, but drops to 1 Mbps in extremis.

So the real dynamic range is at least a factor of 1000 to 1.

d-link dir-825 or 665 are examples of such devices

Yes, and E3000s and others. Some are half measures, with a single knob for shaping both uplink and downlink bandwidth.

The QoS features in home routers can help, but do not solve all problems.

In part because, as broadband bandwidth increases, the bottleneck often shifts to the links between the home router and the edge devices, and there are similar (or even worse) bufferbloat problems in both the home routers and the operating systems.

Personally I have MQC configured on my interface, with assured bandwidth
for small packets and ssh packets, and I also run fair-queue so that TCP
sessions get a fair share. I don't know of any non-Cisco device that does
this.

The consumer CPE that care seem to be mostly oriented toward keeping
gaming and VoIP from being interfered with by P2P and file transfers.

An unappreciated issue is that these buffers have destroyed congestion avoidance for TCP (and all other congestion-avoiding protocols).

Secondly, any modern operating system (anything other than Windows XP), implements window scaling, and will, within about 10 seconds, *fill* the buffers with a single TCP connection, and they stay full until traffic drops enough to allow them to empty (which may take seconds). Since congestion avoidance has been defeated, you get nasty behaviour out of TCP.

Congestion avoidance depends on *timely* notification of congestion to the end points: these buffers have destroyed that timeliness, which is a fundamental presumption of internet protocol design.

If you think that simultaneously:
  1) destroying congestion avoidance,
  2) destroying slow-start, as many major web sites are doing by
     increasing their initial window,
  3) having browsers open many TCP connections simultaneously,
  4) shifting TCP traffic to window scaling, enabling even a single
     TCP connection to fill these buffers, and
  5) increasing the numbers of large uploads/downloads (not just
     bittorrent: HD movie delivery to disk, backup, crash dump
     uploads, etc.)
is a good idea, you aren't old enough to have experienced the NSFnet collapse during the 1980's (as I did). I have post-traumatic stress disorder from that experience; I'm worried about the confluence of these changes, folks.

And there are network neutrality aspects to bufferbloat: since carriers have been provisioning their telephony service separately from their internet service, *and* there are these bloated buffers, *and* there is no classification that end users can perform over their broadband connections, you can't do as well as a carrier for any low latency service such as VoIP, even with fancy home routers. See: Bufferbloat and network neutrality – back to the past… | jg's Ramblings. Personally, I don't think this was done out of malice aforethought, but it's not a good situation.

The best you can do is what Ooma has done; bandwidth shaping along with being closest to the broadband connection (or by fancy home routers with classification and bandwidth shaping). That won't help the downstream direction where a single other user (or yourself), can inject large packet bursts routinely by browsing web sites like YouTube or Google images (unless some miracle occurs, and the broadband head ends are classifying traffic in the downstream direction over those links).
    - Jim

Common knowledge among whom? I'm hardly a naive Internet user.

Anyone actually looking into the matter. The Cisco "fair-queue" command was introduced in IOS 11.0 according to <http://www.cisco.com/en/US/docs/ios/12_2/qos/command/reference/qrfcmd1.html#wp1098249> to somewhat handle the problem. I have no idea when that was, but I guess the early '90s?

And the statement is wrong: the large router buffers have effectively destroyed TCP's congestion avoidance altogether.

Routers have had large buffers since way before residential broadband even came around; the basic premise of TCP is that routers have buffers, and quite a lot of them.

200ms is good; but it is often up to multiple *seconds*. Resulting latencies on broadband gear are often horrific: see the netalyzr plots that I posted in my blog.

I know of the problem, it's no news to me. You don't have to convince me. I've been using Cisco routers as a CPE because of this for a long time.

Dave Clark first discovered bufferbloat on his DSLAM: he used the 6 second latency he saw to DDOS his son's excessive WOW playing.

When I procured a DSLAM around 2003 or so it had 40ms of buffering at the 24 meg ADSL2+ speed; when the speed went down, the buffer length in bytes stayed constant, so the buffering time also went up. It didn't do any AQM either, but at least it did .1p prioritization and had 4 buffers, so there was a little possibility of doing things upstream of it.
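
A rough calculation (just restating the above in numbers) shows how a byte-constant buffer scales: 40 ms at the 24 Mbit/s sync rate is about 120 kB, and the same 120 kB at a 1 Mbit/s sync is nearly a second of queue:

  0.040 \times \frac{24 \times 10^{6}}{8} = 120\,000\ \text{B}, \qquad
  \frac{8 \times 120\,000}{10^{6}} \approx 0.96\ \text{s}.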

All broadband technologies are affected, as are, it turns out, all operating systems and likely all home routers as well (see other posts I've made recently). DSL, cable and FIOS all have problems.

Yes.

How many of retail ISP's service calls have been due to this terrible performance?

A lot, I'm sure.

Secondly, any modern operating system (anything other than Windows XP), implements window scaling, and will, within about 10 seconds, *fill* the buffers with a single TCP connection, and they stay full until traffic drops enough to allow them to empty (which may take seconds). Since congestion avoidance has been defeated, you get nasty behaviour out of TCP.

That is exactly what TCP was designed to do: use as much bandwidth as it can. Congestion is detected by two means: latency goes up and/or there is packet loss. TCP was designed with router buffers in mind.

Anyhow, one thing that might help would be ECN in conjunction with WRED, but already there you're way over most CPE manufacturers' heads.

is a good idea, you aren't old enough to have experienced the NSFnet collapse during the 1980's (as I did). I have post-traumatic stress disorder from that experience; I'm worried about the confluence of these changes, folks.

I'm happy you were there, I was under the impression that routers had large buffers back then as well?

The best you can do is what Ooma has done; bandwidth shaping along with being closest to the broadband connection (or by fancy home routers with classification and bandwidth shaping). That won't help the downstream direction where a single other user (or yourself), can inject large packet bursts routinely by browsing web sites like YouTube or Google images (unless some miracle occurs, and the broadband head ends are classifying traffic in the downstream direction over those links).

There is definitely a lot of improvement to be had. For FTTH, if you use an L2 switch with a few ms of buffering as the ISP handoff device, you don't get this problem. There are even TCP algorithms to handle the case where you have small buffers and just tail drop.

But yes, I agree that we'd all be much helped if manufacturers of both ends of all links had the common decency of introducing a WRED (with ECN marking) AQM that had 0% drop probability at 40ms and 100% drop probability at 200ms (and linear increase between).
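
Spelled out, that proposal is a drop-or-mark probability that ramps linearly with the queueing delay d (note that stock WRED instead ramps to a configured maximum probability and then tail-drops above max-threshold):

  p(d) = \begin{cases}
           0 & d < 40\ \text{ms} \\
           (d - 40\ \text{ms}) / 160\ \text{ms} & 40\ \text{ms} \le d \le 200\ \text{ms} \\
           1 & d > 200\ \text{ms}
         \end{cases}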

Common knowledge among whom? I'm hardly a naive Internet user.

Anyone actually looking into the matter. The Cisco "fair-queue" command was
introduced in IOS 11.0 according to
<http://www.cisco.com/en/US/docs/ios/12_2/qos/command/reference/qrfcmd1.html#wp1098249>
to somewhat handle the problem. I have no idea when that was, but I guess
the early '90s?

200ms is good; but it is often up to multiple *seconds*. Resulting latencies
on broadband gear are often horrific: see the netalyzr plots that I posted
in my blog.

I know of the problem, it's no news to me. You don't have to convince me.
I've been using Cisco routers as a CPE because of this for a long time.

Interestingly I've just tried to enable WRED on a Cisco 877 (advsecurity
15.1) and the random-detect commands are missing. Cisco's feature navigator
says it's supported though. Weird.

Also, there doesn't appear to be a way to enable fair-queue on the wireless
interface. Is fair-queue seen as a bad strategy for wireless and it's
varying throughput/goodput rates?

And finally it doesn't support inbound shaping, so I can't experiment with
trying to build the queues on it rather than on the DSLAM.

I'm a little nonplussed to be honest.

However, I did notice the output queue on the dialler interface defaults to
1000 packets. (Perhaps that's a hangover from when it had to queue packets
whilst dialling? I've come too late to networking to know.) Reducing that
number to 10 (~60ms @ 1500 bytes @ 8Mbps) has noticeably improved the
latency and fairness of the connection under load.
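
For anyone wanting to try the same change, it is just the interface hold-queue being overridden (interface name illustrative; 1000 packets is the default mentioned above):

 interface Dialer0
  ! shrink the output FIFO from the 1000-packet default
  hold-queue 10 out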

Sam

Common knowledge among whom? I'm hardly a naive Internet user.

Anyone actually looking into the matter. The Cisco "fair-queue" command was introduced in IOS 11.0 according to <http://www.cisco.com/en/US/docs/ios/12_2/qos/command/reference/qrfcmd1.html#wp1098249> to somewhat handle the problem. I have no idea when that was, but I guess the early '90s?

1995. I know the guy who wrote the code. Meet me in a bar and we can share war stories. The technology actually helps pretty effectively with the kind of problems that RFC 6057 addresses.

is a good idea, you aren't old enough to have experienced the NSFnet collapse during the 1980's (as I did). I have post-traumatic stress disorder from that experience; I'm worried about the confluence of these changes, folks.

I'm happy you were there, I was under the impression that routers had large buffers back then as well?

Not really. Yup, several of us were there. The common routers on the NSFNET and related networks were fuzzballs, which had 8 (count them, 8) 576 byte buffers, Cisco AGS/AGS+, and Proteon routers. The Cisco routers of the day generally had 40 buffers on each interface by default, and might have had configuration changes; I can't comment on the Proteon routers. For a 56 KBPS line, given 1504 bytes per message (1500 bytes IP+data, and four bytes of HDLC overhead), that's theoretically 8.5 seconds. But given that messages were in fact usually 576 bytes of IP data (cf "fuzzballs" and unix behavior for off-LAN communications) and interspersed with TCP control messages (Acks, SYNs, FINs, RST), real queue depths were more like two seconds at a bottleneck router. The question would be the impact of a sequence of routers all acting as bottlenecks.

IMHO, AQM (RED or whatever) is your friend. The question is what to set min-threshold to. Kathy Nichols (Van's wife) did a lot of simulations. I don't know that the paper was ever published, but as I recall she wound up recommending something like this:

  Line rate (Mbps)    RED min-threshold (ms of queue depth)
                 2                  32
                10                  16
               155                   8
               622                   4
             2,500                   2
            10,000                   1

But yes, I agree that we'd all be much helped if manufacturers of both ends of all links had the common decency of introducing a WRED (with ECN marking) AQM that had 0% drop probability at 40ms and 100% drop probability at 200ms (and linear increase between).

so, min-threshold=40 ms and max-threshold=200 ms. That's good on low speed links; it will actually control queue depths to an average of O(min-threshold) at whatever value you set it to. The problem with 40 ms is that it interacts poorly with some applications, notably voice and video.

It also doesn't match well with published studies like http://www.pittsburgh.intel-research.net/~kpapagia/papers/p2pdelay-analysis.pdf. In that study, a min-threshold of 40 ms would have kicked in only during six few-second events over the course of a five-hour sample. If 40 ms is on the order of magnitude of a typical RTT, it suggests that you could still have multiple retransmissions from the same session in the same queue.

A good photo of buffer bloat is at
      ftp://ftpeng.cisco.com/fred/RTT/Pages/4.html
      ftp://ftpeng.cisco.com/fred/RTT/Pages/5.html

The first is a trace I took overnight in a hotel I stayed in. Never mind the name of the hotel, it's not important. The second is the delay distribution, which is highly unusual - you expect to see delay distributions more like

      ftp://ftpeng.cisco.com/fred/RTT/Pages/8.html

(which actually shows two distributions - the blue one is fairly normal, and the green one is a link that spends much of the day chock-a-block).

My conjecture re 5.html is that the link *never* drops, and at times has as many as nine retransmissions of the same packet in it. The spikes in the graph are about a TCP RTO timeout apart. That's a truly worst case. For N-1 of the N retransmissions, it's a waste of storage space and a waste of bandwidth.

AQM is your friend. Your buffer should be able to temporarily buffer as much as an RTT of traffic, which is to say that it should be large enough to ensure that if you get a big burst followed by a silent period you should be able to use the entire capacity of the link to ride it out. Your min-threshold should be at a value that makes your median queue depth relatively shallow. The numbers above are a reasonable guide, but as in all things, YMMV.
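
As a worked example of that sizing rule (the 10 Mbit/s rate and 100 ms RTT are purely illustrative):

  B = R \times \text{RTT} = 10^{7}\ \text{bit/s} \times 0.1\ \text{s} = 10^{6}\ \text{bit} = 125\ \text{kB of buffer,}

with the AQM min-threshold then keeping the average queue well below that.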

Common knowledge among whom? I'm hardly a naive Internet user.

Anyone actually looking into the matter. The Cisco "fair-queue" command was introduced in IOS 11.0 according to <http://www.cisco.com/en/US/docs/ios/12_2/qos/command/reference/qrfcmd1.html#wp1098249> to somewhat handle the problem. I have no idea when that was, but I guess the early '90s?

1995. I know the guy who wrote the code. Meet me in a bar and we can share war stories. The technology actually helps pretty effectively with the kind of problems that RFC 6057 addresses.

is a good idea, you aren't old enough to have experienced the NSFnet collapse during the 1980's (as I did). I have post-traumatic stress disorder from that experience; I'm worried about the confluence of these changes, folks.

I'm happy you were there, I was under the impression that routers had large buffers back then as well?

Not really. Yup, several of us were there. The common routers on the NSFNET and related networks were fuzzballs, which had 8 (count them, 8) 576 byte buffers, Cisco AGS/AGS+, and Proteon routers. The Cisco routers of the day generally had 40 buffers on each interface by default, and might have had configuration changes; I can't comment on the Proteon routers. For a 56 KBPS line, given 1504 bytes per message (1500 bytes IP+data, and four bytes of HDLC overhead), that's theoretically 8.5 seconds. But given that messages were in fact usually 576 bytes of IP data (cf "fuzzballs" and unix behavior for off-LAN communications) and interspersed with TCP control messages (Acks, SYNs, FINs, RST), real queue depths were more like two seconds at a bottleneck router. The question would be the impact of a sequence of routers all acting as bottlenecks.

IMHO, AQM (RED or whatever) is your friend. The question is what to set min-threshold to. Kathy Nichols (Van's wife) did a lot of simulations. I don't know that the paper was ever published, but as I recall she wound up recommending something like this:

  Line rate (Mbps)    RED min-threshold (ms of queue depth)
                 2                  32
                10                  16
               155                   8
               622                   4
             2,500                   2
            10,000                   1

I don't know if you are referring to the "RED in a different light" paper: that was never published, though an early draft escaped and can be found on the net.

"RED in a different light" identifies two bugs in the RED algorithm, and proposes a better algorithm that only depends on the link output bandwidth. That draft still has a bug.

Van has retrieved the (almost completed) version of the paper that never got published from backup, and I'm trying to pry it out of his hands to get it converted to something we can read today (it's in FrameMaker).

In the meanwhile, turn on (W)RED! For routers run by most people on this list, it's always way better than nothing, even if Van doesn't think classic RED will solve the home router bufferbloat problem (where we have two orders of magnitude of variation in wireless bandwidth along with a highly variable workload). That's not true in the internet core.

But yes, I agree that we'd all be much helped if manufacturers of both ends of all links had the common decency of introducing a WRED (with ECN marking) AQM that had 0% drop probability at 40ms and 100% drop probability at 200ms (and linear increase between).

so, min-threshold=40 ms and max-threshold=200 ms. That's good on low speed links; it will actually control queue depths to an average of O(min-threshold) at whatever value you set it to. The problem with 40 ms is that it interacts poorly with some applications, notably voice and video.

It also doesn't match well with published studies like http://www.pittsburgh.intel-research.net/~kpapagia/papers/p2pdelay-analysis.pdf. In that study, a min-threshold of 40 ms would have kicked in only during six few-second events over the course of a five-hour sample. If 40 ms is on the order of magnitude of a typical RTT, it suggests that you could still have multiple retransmissions from the same session in the same queue.

A good photo of buffer bloat is at
       ftp://ftpeng.cisco.com/fred/RTT/Pages/4.html
       ftp://ftpeng.cisco.com/fred/RTT/Pages/5.html

The first is a trace I took overnight in a hotel I stayed in. Never mind the name of the hotel, it's not important. The second is the delay distribution, which is highly unusual - you expect to see delay distributions more like

       ftp://ftpeng.cisco.com/fred/RTT/Pages/8.html

Thanks, Fred! Can I use these in the general bufferbloat talk I'm working on with attribution? It's a far better example/presentation in a graphic form than I currently have for the internet core case (where I don't even have anything other than memory of probing the hotel's ISP's network).

(which actually shows two distributions - the blue one is fairly normal, and the green one is a link that spends much of the day chock-a-block).

My conjecture re 5.html is that the link *never* drops, and at times has as many as nine retransmissions of the same packet in it. The spikes in the graph are about a TCP RTO timeout apart. That's a truly worst case. For N-1 of the N retransmissions, it's a waste of storage space and a waste of bandwidth.

AQM is your friend. Your buffer should be able to temporarily buffer as much as an RTT of traffic, which is to say that it should be large enough to ensure that if you get a big burst followed by a silent period you should be able to use the entire capacity of the link to ride it out. Your min-threshold should be at a value that makes your median queue depth relatively shallow. The numbers above are a reasonable guide, but as in all things, YMMV.

Yup. AQM is our friend.

And we need it in many places we hadn't realised we did (like our OS's).
                           - Jim

I don't know if you are referring to the "RED in a different light" paper: that was never published, though an early draft escaped and can be found on the net.

Precisely.

"RED in a different light" identifies two bugs in the RED algorithm, and proposes a better algorithm that only depends on the link output bandwidth. That draft still has a bug.

Van has retrieved the (almost completed) version of the paper that never got published from backup, and I'm trying to pry it out of his hands to get it converted to something we can read today (it's in FrameMaker).

In the meanwhile, turn on (W)RED! For routers run by most people on this list, it's always way better than nothing, even if Van doesn't think classic RED will solve the home router bufferbloat problem (where we have two orders of magnitude of variation in wireless bandwidth along with a highly variable workload). That's not true in the internet core.

But yes, I agree that we'd all be much helped if manufacturers of both ends of all links had the common decency of introducing a WRED (with ECN marking) AQM that had 0% drop probability at 40ms and 100% drop probability at 200ms (and linear increase between).

so, min-threshold=40 ms and max-threshold=200 ms. That's good on low speed links; it will actually control queue depths to an average of O(min-threshold) at whatever value you set it to. The problem with 40 ms is that it interacts poorly with some applications, notably voice and video.

It also doesn't match well with published studies like http://www.pittsburgh.intel-research.net/~kpapagia/papers/p2pdelay-analysis.pdf. In that study, a min-threshold of 40 ms would have kicked in only during six few-second events over the course of a five-hour sample. If 40 ms is on the order of magnitude of a typical RTT, it suggests that you could still have multiple retransmissions from the same session in the same queue.

A good photo of buffer bloat is at
      ftp://ftpeng.cisco.com/fred/RTT/Pages/4.html
      ftp://ftpeng.cisco.com/fred/RTT/Pages/5.html

The first is a trace I took overnight in a hotel I stayed in. Never mind the name of the hotel, it's not important. The second is the delay distribution, which is highly unusual - you expect to see delay distributions more like

      ftp://ftpeng.cisco.com/fred/RTT/Pages/8.html

Thanks, Fred! Can I use these in the general bufferbloat talk I'm working on with attribution? It's a far better example/presentation in a graphic form than I currently have for the internet core case (where I don't even have anything other than memory of probing the hotel's ISP's network).

Yes. Do me a favor and remove the name of the hotel. They don't need the bad press.

I don't know if you are referring to the "RED in a different light"
paper: that was never published, though an early draft escaped and can
be found on the net.

"RED in a different light" identifies two bugs in the RED algorithm,
and
proposes a better algorithm that only depends on the link output
bandwidth. That draft still has a bug.

I also noticed another paper published later that references "RED in a
different light":

http://www.icir.org/floyd/adaptivered/

Adaptive RED: An Algorithm for Increasing the Robustness of RED's Active
Queue Management.
Sally Floyd, Ramakrishna Gummadi, and Scott Shenker.
August 1, 2001.

And this one:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.1556&rep=rep1&type=pdf

July 15, 2002

Active Queue Management using Adaptive RED
Rahul Verma, Aravind Iyer and Abhay Karandikar

But it doesn't look like Adaptive RED went anywhere.

Some more historical pointers:

If you want to look at the early history of the latency discussion,
look at Stuart Cheshire's famous rant "It's the Latency, Stupid"
(http://rescomp.stanford.edu/~cheshire/rants/Latency.html). Then look
at Matt Mathis's 1997 TCP equation (and the 1998 Padhye-Firoiu version
of that): The throughput is proportional to the inverse square root of
the packet loss and the inverse RTT -- so as the RTT starts growing
due to increasing buffers, the packet loss must grow to keep
equilibrium!
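
For reference, the Mathis et al. approximation is (MSS is the segment size, p the loss probability, and C a constant of order one that depends on the ack strategy):

  \text{throughput} \approx \frac{MSS}{RTT} \cdot \frac{C}{\sqrt{p}}

so throughput, RTT and loss rate are tied together, which is the equilibrium argument above.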

We started to understand that you have to drop packets in order to
limit queueing pretty well in the late 1990s. E.g., RFC 3819 contains
an explicit warning against keeping packets for too long (section 13).

But, as you notice, for faster networks the bufferbloat effect can be
limited by intelligent window size management, but the dominating
Windows XP was not intelligent, just limited in its widely used default
configuration. So the first ones to fully see the effect were the ones
with many TCP connections, i.e. Bittorrent. The modern window size
"tuning" schemes in Windows 7 and Linux break a lot of things -- you are
just describing the tip of the iceberg here. The IETF working group
LEDBAT (motivated by the Bittorrent observations) has been working on a
scheme to run large transfers without triggering humongous buffer
growth.

Regards, Carsten