Question about propagation and queuing delays

I was wondering about the typical coast-to-coast propagation and
queuing delays observed on today's backbone networks in North America.
Is there any data or study that provides a breakdown of the different
components of such end-to-end delays in today's backbone networks?

Thanks,
David

Well, that all depends on the routes in question, doesn't it? Propagation is
the key factor on longhaul networks, but there are many economic and
technological decisions that tend to work against laying fiber in straight
lines, such as the need to hit as many large cities as possible along the
way, those pesky rights-of-way, and the trade-off between optimized routes
with lower capacity and less optimized routes with more capacity.

Let's take one major example, the "cross country" path between the two most
important Internet locations in the US: the San Jose area in California,
and the northern Virginia area near Washington DC.

The best path out there seems to be a nearly direct route from Sprint,
coming in at around 61-62ms RTT. This is a relatively rare route, however,
not taken by most networks. The "much more common" optimized version is
around 67-68ms, which tends to route via paths like DC or Philadelphia,
via Cleveland, Chicago, (St. Louis or the northern loop through
Minneapolis), Kansas City, Denver, Sacramento, and San Francisco...

The path gets worse when there is no direct route between the mid-Atlantic
and Chicago, usually resulting in DC, New York, Boston, Buffalo,
Cleveland... This tends to kick it up to around 72ms on the good side,
75ms on the bad side. It can also get worse when the direct path between
Denver and California (often Sacramento) does not exist, resulting in
paths via Los Angeles or Seattle, and around 78-80ms RTT.

Above 80ms it stops being acceptable, and usually only happens to a network
which is missing some key route that most people seem to have. Of course it
is also possible to come in with a better southern cross-country path,
via something like DC, Atlanta, (New Orleans and Houston, or paths via
Arkansas), Phoenix, and Los Angeles. A reasonably optimized version of
this route tends to weigh in at around 61-62ms, plus another 9-10ms from
Los Angeles to San Jose.

If you take a detailed look at fiber routes from the big 3 (Level 3, WCG,
and the GX/Qwest builds) you can get a pretty good idea of what paths are
out there, and the differences between them. Of course if someone were to
come along and combine optimal segments from all of them you would end up
with a network far superior to anything currently available, but the
practical cost under current economic conditions would just be absurd.
Ironically, the cost of linking the different carriers who are in
different buildings within the same metro area, in all the cities necessary,
is probably right up there with the cost of the longhaul itself. :)

Bottom line about latency, gamers don't pay the bills.

Richard,

Thanks for the highly informative answer.

Would there be any data out there on what fraction of this 60ms to
80ms RTT is raw propagation delay and what fraction is typical packet
queuing delay at intermediate switches? Does queuing delay play much
of a role at all these days? Or is it all just propagation delay?

Thanks,
- Dave

Measure the distance, figure the signal is moving at 60% of the speed
of light, and anything over that is queueing/switching delay.

This of course assumes that you have some idea how the cable runs....
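
For example, a quick back-of-the-envelope sketch of that rule of thumb in
Python (the route lengths below are my own rough guesses for the SF Bay
Area to northern Virginia, not measured fiber mileage):

```python
# Rule-of-thumb propagation delay: the signal moves at ~60% of c in fiber.
C_KM_PER_MS = 299_792.458 / 1000       # speed of light, km per millisecond
PROP_SPEED = 0.6 * C_KM_PER_MS         # ~180 km/ms (rule of thumb)

def rtt_floor_ms(path_km: float) -> float:
    """Lower bound on RTT (ms) for a fiber path of path_km kilometres."""
    return 2 * path_km / PROP_SPEED

# Assumed numbers for illustration: great-circle SJC<->IAD is ~3,900 km,
# while a realistic fiber route is considerably longer.
for label, km in [("great circle (~3,900 km)", 3900),
                  ("plausible fiber route (~5,500 km)", 5500)]:
    print(f"{label}: RTT floor ~{rtt_floor_ms(km):.1f} ms")

# Anything measured above the floor for the actual cable route is
# switching/queueing delay (or extra route miles you didn't know about).
```

If the assumed route length is anywhere near right, that comes out around
61ms, which lines up with the best cross-country RTTs mentioned earlier in
the thread.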

David Hagel <david.hagel@gmail.com> writes:

Would there be any data out there on what fraction of this 60ms to
80ms RTT is raw propagation delay and what fraction is typical packet
queuing delay at intermediate switches? Does queuing delay play much
of a role at all these days? Or is it all just propagation delay?

With any kind of reasonably fast circuit and modern routers, you may
safely ignore queuing delay. The two following rules of thumb apply:

Queueing delay (time the packet sits in memory waiting to get clocked
out the port) is insignificant when the circuit is not more than "kind
of full". The cutoff point for "kind of full" ranges from 60% to >97%
full as circuit speed increases from DS1 to OC48.

Clocking delay (time it takes to put the packet on the wire) takes 7
milliseconds for a 1500 byte packet on a DS1, 266 microseconds on a
DS3, and correspondingly less on faster circuits. Again,
insignificant in the context of a transcontinental link.
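
If you want to sanity-check those clocking numbers, a quick sketch in
Python (using the standard DS1/DS3/OC-N line rates and a 1500 byte packet):

```python
# Clocking (serialization) delay: time to put one packet on the wire.
LINE_RATE_BPS = {
    "DS1":      1_544_000,
    "DS3":     44_736_000,
    "OC-3":   155_520_000,
    "OC-48": 2_488_320_000,
}

def clocking_delay_ms(packet_bytes: int, rate_bps: int) -> float:
    """Serialization delay in milliseconds for one packet."""
    return packet_bytes * 8 / rate_bps * 1000

for circuit, rate in LINE_RATE_BPS.items():
    print(f"1500-byte packet on {circuit}: {clocking_delay_ms(1500, rate):.3f} ms")

# DS1 comes out around 7.8 ms and DS3 around 0.27 ms -- negligible next
# to 60-70 ms of coast-to-coast propagation delay.
```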

You may find Peter Lothberg's presentation at NANOG 22 enlightening.
Check out the slides at http://www.nanog.org/mtg-0105/lothberg.html
especially the queueing delay graphs starting at 9:30 into the
presentation.

                                        ---Rob

This is interesting. This may sound like a naive question. But if
queuing delays are so insignificant in comparison to other fixed delay
components then what does it say about the usefulness of all the
extensive techniques for queue management and congestion control
(including TCP congestion control, RED and so forth) in the context of
today's backbone networks? Any thoughts? What do the people out there
in the field observe? Are all the congestion control researchers out
of touch with reality?

- Dave

David Hagel wrote:

This is interesting. This may sound like a naive question. But if
queuing delays are so insignificant in comparison to other fixed delay
components then what does it say about the usefulness of all the
extensive techniques for queue management and congestion control
(including TCP congestion control, RED and so forth) in the context of
today's backbone networks? Any thoughts? What do the people out there
in the field observe? Are all the congestion control researchers out
of touch with reality?

Co-operative congestion control is like many other things where you're better off not using it yourself as long as most of "somebody else" is using it. TCP does not give you optimal performance, but it tries to make sure everybody gets along.

Pete

Latency is cumulative. Knocking a little time off Part A will still act to
shorten the total time, regardless of the time occupied by Part B.

Queuing behaviors are also significant when you are suffering congestion,
apart from the delay factors.

David Hagel wrote:

This is interesting. This may sound like a naive question. But if
queuing delays are so insignificant in comparison to other fixed delay
components then what does it say about the usefulness of all the
extensive techniques for queue management and congestion control
(including TCP congestion control, RED and so forth) in the context of
today's backbone networks? Any thoughts? What do the people out there
in the field observe? Are all the congestion control researchers out
of touch with reality?

Queueing only ever comes into play when there is something to queue.
In the optical backbones of today this is seldom the case, and all
operators are busy telling you there is always excess bandwidth
available on theirs. Queueing kicks in whenever a downward speed change
happens: 1Gig -> 100M, for example.

I think the key here is "when you are suffering congestion".

RS said that queueing delay is insignificant until the link is somewhere between 60% and >97% full, depending on the speed of the link. If you have a link which is fuller than that, queueing techniques matter.

Put another way, queueing techniques are irrelevant when the queue size is almost always <= 1.

TCP performs much better if queueing delays are short, because that
means it gets feedback from packet drops more promptly, and its RTT
measurements are more accurate so the retransmission timeout doesn't get
artificially inflated.
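
To put a rough number on that inflation, here's a little Python sketch of
the standard SRTT/RTTVAR retransmission timer (RFC 6298 style, ignoring the
minimum-RTO clamp; the jitter figures are invented purely for illustration):

```python
import random

def rto_after(samples, alpha=1/8, beta=1/4, k=4):
    """Feed RTT samples (seconds) through the standard SRTT/RTTVAR
    estimator and return the resulting retransmission timeout."""
    srtt, rttvar = samples[0], samples[0] / 2
    for r in samples[1:]:
        rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
        srtt = (1 - alpha) * srtt + alpha * r
    return srtt + k * rttvar

random.seed(1)
base = 0.070                                                    # 70 ms propagation RTT
quiet  = [base + random.uniform(0, 0.001) for _ in range(100)]  # ~1 ms of jitter
queued = [base + random.uniform(0, 0.100) for _ in range(100)]  # up to 100 ms of queueing

print(f"RTO with an empty queue:   {rto_after(quiet)*1000:.0f} ms")
print(f"RTO with a bouncing queue: {rto_after(queued)*1000:.0f} ms")
# The inflated RTTVAR from queueing jitter pushes the timeout way up,
# so a lost packet takes much longer to be noticed.
```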

Tony.

Tony Finch wrote:

TCP performs much better if queueing delays are short, because that
means it gets feedback from packet drops more promptly, and its RTT
measurements are more accurate so the retransmission timeout doesn't get
artificially inflated.

Sure, but sending speculative duplicate ACKs if you're competing seriously for transit bandwidth works even better...
Not sure how to set the evil bit on the packets though...

Pete

Tony Finch wrote:

Queueing only matters if you are a) congested, or b) have a really slow
circuit.

On a 33.6k modem, the delay to serialize a 1500 byte packet is something
like 450ms. During the transmission, the pipe is effectively locked,
causing instantaneous congestion. You cannot transmit anything else
until that block of data is completed, not even a small/quick packet from,
say, an interactive SSH session. This makes interactive sessions (or chatty
protocols) painfully slow.

During this time, there are more packets piling up in the queue, including
potentially more large packets which will lock the pipe up for even
longer. Intelligent queueing can transmit the smaller/quicker packets
already in the queue first, optimizing your interactive sessions in the
face of high serialization and queueing delays.

This is still fairly noticeable on a T1, but much above that it becomes
pretty insignificant. If you have a good eye and a good internal timer on
your OS you can spot the difference between FastE and GigE in your
local network ping times (usually around a 0.2 to 0.3ms difference). By the
time you get to GigE and beyond we're talking microseconds.
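
As a toy illustration of that head-of-line blocking effect, here's a Python
sketch (the packet mix is invented and link-layer framing overhead is
ignored, so the numbers are only ballpark):

```python
# Toy model: a burst of packets arrives at once on a 33.6k modem link.
# Compare how long a small interactive packet waits under FIFO versus a
# queue that lets small packets jump ahead.
LINK_BPS = 33_600
BURST = [1500, 1500, 64, 1500]          # bytes; the 64-byte packet is "SSH"

def completion_time(queue_order, target_size):
    """Seconds until the target-sized packet finishes transmitting."""
    t = 0.0
    for size in queue_order:
        t += size * 8 / LINK_BPS        # serialization time of this packet
        if size == target_size:
            return t
    raise ValueError("target packet not in queue")

fifo     = completion_time(BURST, 64)
priority = completion_time(sorted(BURST), 64)   # small packets first

print(f"FIFO: SSH packet delivered after     {fifo*1000:.0f} ms")
print(f"Priority: SSH packet delivered after {priority*1000:.0f} ms")
# Run the same arithmetic at FastE or GigE speeds and the wait drops to
# tens of microseconds, which is why nobody bothers there.
```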

Of course the other reason for queue management technologies, like RED, is
to provide better handling in the face of congestion. On a large circuit
with many thousands of TCP flows going across it, each acting
independently, tail drop in the face of congestion will tend to cause the
TCP flows to synchronize their congestion control attempts, resulting in
periods where they will all detect congestion and dive off, then attempt
to scale back up and all beat the hell out of the circuit simultaneously;
rinse, wash, repeat.

Obviously this is all bad, and a little application of technology can help
you squeeze a bit more life out of a congested pipe (preventing the queue
and thus the latency from shooting skyward and/or bouncing around all over
the place as soon as the pipe starts to get congested), but it really has
nothing to do with what you perceive as "latency" on a normal-state,
modern, non-congested, long-distance Internet backbone network. Of course,
your congested cable modem with 30ms of jitter even during "normal"
operation trying to get to you from down the street doesn't fit the same
model. :)

Most networks I have touched that have seen fit to deploy some kind of "quality of service" mechanism have done so in order to deliberately degrade service in inverse proportion to what people are prepared to spend. This is somewhat contrary to the marketing message, since "pay us more money and we'll wreck your performance less!" is unlikely to win awards as a slogan, but it happens nonetheless.

Examples are DSL users who are rate-limited down to modem speeds after their leeching budget for the month has been exhausted, and gigabit-attached customers whose traffic is squeezed as it is carried over expensive bits of network (e.g. bits that cross oceans). These may be more common in regions with expensive external paths that carry a high proportion of traffic (e.g. small, English-speaking countries on the Pacific rim) than in North America.

In North America, the usual contention I have seen in backbones is a lack of external capacity towards particular peers; the answer there is usually traffic engineering rather than queue management, although I've seen WRED turned on as a short-term measure to make the helpdesk phone ring less while more OC12s are turned up.

One last wave of the hands: just because the backbone is clear and free, and rarely needs to queue a packet, doesn't mean that one edge or another of a flow (or both) isn't competing with other traffic as part of a multi-access wireless network, oversubscribed back-haul from a DSLAM, or a CATV network at 4pm in the winter when the neighbourhood kids come back from school.

Joe

Well, the reality is that there is no such thing as a "50% used" circuit.
A circuit is either 0% used (not transmitting) or 100% used (transmitting)
at any given moment; what we are really measuring is the percentage of
time the circuit was being utilized over a given time period (as in
"bits per second").

If you want to send a packet, and the circuit is being utilized, you get
shoved into a queue. If the circuit is so slow that serialization delays
are massive (aka 500ms until your packet gets transmitted), you're going
to notice it. If the serialization delay is microseconds, you're probably
not going to care, as the packet is going to be on its merry way "soon
enough".

Now say you've got a packet coming in and waiting to be transmitted out an
interface. Below "50% utilized", the odds are pretty low that you're going
to hit much of anything in the queue at all. Between around "60% utilized"
and "97% utilized" the chances of hitting something in the queue start to
creep up there, but on a reasonably fast circuit this is still barely
noticeable (less than a millisecond of jitter), not enough to get noticed in
any actual application. As you start to get above that magic "97%
utilized" number (or whatever it actually is, I forget offhand) the odds
of hitting something in the queue start becoming really, really
good. At that point, the queue starts growing very quickly, and the
latency induced by the queueing delays starts skyrocketing. This continues
until either a) you exhaust your queue and drop packets (called "tail
drop", when you blindly drop whatever there isn't room for in the queue),
or b) you otherwise force the flow control mechanisms of the higher level
protocols (like TCP) to slow down.

Plenty of folks who enjoy math have done lots of research on the subject,
and there are lots of pretty graphs out there. Perhaps someone will come
up with a link to one. :)
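
In lieu of a link, here's a rough Python sketch that generates the shape of
one of those graphs, using the simple u/(1-u) approximation for the average
number of packets waiting (treat the absolute numbers as illustrative only):

```python
# Rough shape of the "pretty graphs": average queueing delay versus
# utilization on an OC-48, assuming 1500-byte packets.
LINK_BPS = 2_488_320_000      # OC-48 line rate
PKT_BITS = 1500 * 8

for u in (0.50, 0.60, 0.80, 0.90, 0.97, 0.99, 0.999, 0.9999):
    avg_pkts_queued = u / (1 - u)                        # simple approximation
    delay_ms = avg_pkts_queued * PKT_BITS / LINK_BPS * 1000
    print(f"{u:7.2%} utilized: ~{delay_ms:8.3f} ms average queueing delay")

# The curve is nearly flat until the high 90s, then shoots skyward --
# exactly the behavior described above.
```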

All QoS implies is some level of intelligence in determining who is going
to get the boot first when the @#$% hits the fan. Pay us more money (or
otherwise be considered more "important", perhaps be one of those lovely
"tracerouute" or "ping" packets :P) and we'll make someone else's packets
drop before yours.

It may not be pretty, but sometimes there are some packets that you are
more willing to sacrifice than others. Of course it is probably better if
you just manage to steer your @#$% clear of the fan (where QoS = Quantity
of Service), and yes networks who spend a significant amount of time
dropping packets rather than carrying them (even if they find creative ways
to do it so that no one notices) are worse than networks who have capacity.
But @#$% does happen even to the best of us; my take is that there is no
point being so macho about it that you won't use a little technology to
reduce the pain when it does happen.

The answer is that delay is only one aspect of performance; another important one is packet loss. As link bandwidth increases, queuing delays decrease proportionally. So if you're using your 10 Mbps link with average 500 byte packets at 98% capacity, you'll generally have a 49-packet queue (queue = utilization / (1 - utilization)). Our 500 byte packets are transmitted at 0.4 ms intervals, so that makes for a 19.6 ms queuing delay.

So now we increase our link speed to 100 Mbps, but for some strange reason this link is also used at 98%. So the average queue size is still 49 packets, but it now only takes 0.04 ms to transmit one packet, so the queuing delay is only 1.96 ms on average.

As you can see, as bandwidth increases, queuing delays become irrelevant. To achieve even 1 ms queuing delay (that's only 120 miles extra fiber) at 10 Gbps you need an average queue size of 833 even with 1500-byte packets. For this, you need a link utilization of almost 99.9%.
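
For anyone who wants to play with those numbers, here's the same arithmetic
as a small Python sketch (same queue = utilization / (1 - utilization)
approximation as above):

```python
def queue_stats(link_bps, packet_bytes, utilization):
    """Average queue length (packets) and queueing delay (seconds) for
    the simple queue = u/(1-u) approximation used above."""
    avg_queue = utilization / (1 - utilization)
    serialization = packet_bytes * 8 / link_bps          # seconds per packet
    return avg_queue, avg_queue * serialization

cases = [
    ("10 Mbps, 500-byte packets, 98% full",        10e6,  500, 0.98),
    ("100 Mbps, 500-byte packets, 98% full",      100e6,  500, 0.98),
    ("10 Gbps, 1500-byte packets, ~99.88% full",   10e9, 1500, 833 / 834),
]
for label, bps, size, u in cases:
    q, d = queue_stats(bps, size, u)
    print(f"{label}: ~{q:.0f} packets queued, ~{d*1000:.2f} ms delay")
# Reproduces the 49 packets / 19.6 ms, 49 packets / 1.96 ms, and
# 833 packets / ~1 ms figures above.
```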

However, due to IP's bursty nature the queue size is quite variable. If there is enough buffer space to accommodate whatever queue size that may be required due to bursts, this means you get a lot of jitter. (And, at 10 Gbps, expensive routers, because this memory needs to be FAST.) On the other hand, if the buffer space fills up but packets keep coming in faster than they can be transmitted, packets will have to be dropped. As explained by others, this leads to undesired behavior such as TCP congestion synchronization when packets from different sessions are dropped and poor TCP performance when several packets from the same session are dropped. So it's important to avoid these "tail drops", hence the need for creative queuing techniques.

However, at high speeds you really don't want to think about this too much. In most cases, your best bet is RED (random early detect/drop) which gradually drops more and more packets as the queue fills up (important: you need to have enough buffer space or you still get excessive tail drops!) so TCP sessions are throttled back gradually rather than traumatically. Also, the most aggressive TCP sessions are the most likely to see dropped packets. With weighted RED some traffic gets a free pass up to a point, so that's nice if you need QoS "guarantees". (W)RED is great because it's not computationally expensive and only needs some enqueuing logic but no dequeuing logic.
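
For the curious, here's a deliberately simplified Python sketch of the RED
drop decision (the thresholds and maximum drop probability are arbitrary
example values, and real RED works on a smoothed average queue length with a
slightly different per-packet probability):

```python
import random

# Simplified RED: drop probability ramps up linearly between a minimum and
# maximum average-queue threshold, and everything is dropped above max.
MIN_TH, MAX_TH = 50, 200      # packets (example thresholds, not a recommendation)
MAX_P = 0.10                  # drop probability at MAX_TH

def red_drop(avg_queue_len: float) -> bool:
    """Return True if RED decides to drop the arriving packet."""
    if avg_queue_len < MIN_TH:
        return False
    if avg_queue_len >= MAX_TH:
        return True
    p = MAX_P * (avg_queue_len - MIN_TH) / (MAX_TH - MIN_TH)
    return random.random() < p

# As the average queue grows, more flows see an early drop and back off
# before the buffer overflows and tail drop synchronizes everyone.
for q in (40, 100, 150, 199, 250):
    drops = sum(red_drop(q) for _ in range(10_000))
    print(f"avg queue {q:>3}: ~{drops / 100:.1f}% of arrivals dropped")
```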