UUNET Routing issues

I'm tempted to call in and see if I can get
a grasp of the scope and nature of the problem.
But maybe it would be best if someone simply
posted a brief summary of what is publicly
known about the issue....to be followed by
reasonable speculation peppered with some
wild speculation.

So far we've received notification of this from Verisign Global Registry,
Verisign Payment Services, Genuity Customer Care, NANOG, MSNBC, F.C., and
it's been mentioned on a few Web sites.

There still seem to be problems. Earlier today CHI->ATL was 2000ms. Now
it's improved to 1000ms.

9 0.so-5-0-0.XL2.CHI13.ALTER.NET (152.63.73.21) 24.466 ms 24.311 ms 24.382 ms
10 0.so-0-0-0.TL2.CHI2.ALTER.NET (152.63.68.89) 24.467 ms 24.349 ms 24.454 ms
11 0.so-3-0-0.TL2.ATL5.ALTER.NET (152.63.101.50) 1029.484 ms 1049.529 ms 1063.692 ms
12 0.so-7-0-0.XL4.ATL5.ALTER.NET (152.63.85.194) 1106.067 ms 1118.102 ms 1132.124 ms

Kevin

Once upon a time, sigma@smx.pair.com <sigma@smx.pair.com> said:

There still seem to be problems. Earlier today CHI->ATL was 2000ms. Now
it's improved to 1000ms.

9 0.so-5-0-0.XL2.CHI13.ALTER.NET (152.63.73.21) 24.466 ms 24.311 ms 24.382 ms
10 0.so-0-0-0.TL2.CHI2.ALTER.NET (152.63.68.89) 24.467 ms 24.349 ms 24.454 ms
11 0.so-3-0-0.TL2.ATL5.ALTER.NET (152.63.101.50) 1029.484 ms 1049.529 ms 1063.692 ms
12 0.so-7-0-0.XL4.ATL5.ALTER.NET (152.63.85.194) 1106.067 ms 1118.102 ms 1132.124 ms

We're a UUNet customer (we also have other connections), and we haven't
really seen any big problem today. We're connected to Atlanta, and I
see:

$ traceroute 152.63.73.21
traceroute to 0.so-5-0-0.XL2.CHI13.ALTER.NET (152.63.73.21): 1-30 hops, 38 byte packets
1 servers.hsvcore.hiwaay.net (208.147.154.33) 0.977 ms 0.977 ms 0.0 ms
2 500.Serial2-6.GW6.ATL5.ALTER.NET (65.208.82.61) 5.85 ms 5.86 ms 6.83 ms
3 178.at-6-0-0.XL4.ATL5.ALTER.NET (152.63.82.178) 6.83 ms (ttl=252!) 7.81 ms (ttl=252!) 7.81 ms (ttl=252!)
4 0.so-2-1-0.TL2.ATL5.ALTER.NET (152.63.85.229) 7.81 ms (ttl=251!) 6.83 ms (ttl=251!) 6.83 ms (ttl=251!)
5 0.so-5-3-0.TL2.CHI2.ALTER.NET (152.63.13.42) 22.4 ms (ttl=250!) 22.4 ms (ttl=250!) 21.4 ms (ttl=250!)
6 0.so-5-0-0.XL2.CHI13.ALTER.NET (152.63.73.21) 25.3 ms 27.3 ms 24.4 ms

Once upon a time, sigma@smx.pair.com <sigma@smx.pair.com> said:

There still seem to be problems. Earlier today CHI->ATL was 2000ms. Now
it's improved to 1000ms.

9 0.so-5-0-0.XL2.CHI13.ALTER.NET (152.63.73.21) 24.466 ms 24.311 ms 24.382 ms
10 0.so-0-0-0.TL2.CHI2.ALTER.NET (152.63.68.89) 24.467 ms 24.349 ms 24.454 ms
11 0.so-3-0-0.TL2.ATL5.ALTER.NET (152.63.101.50) 1029.484 ms 1049.529 ms 1063.692 ms
12 0.so-7-0-0.XL4.ATL5.ALTER.NET (152.63.85.194) 1106.067 ms 1118.102 ms 1132.124 ms

We're a UUNet customer (we also have other connections), and we haven't
really seen any big problem today. We're connected to Atlanta, and I
see:
<snip>

We haven't seen anything unusual on our UU circuit in PHX, either.

The only thing I've noticed is high latency between UUNet and Sprint (around 2 seconds) at at least one traffic exchange point between them, maybe more. Probably because of the diversion of traffic on UUNet's network.


--
Chris Adams <cmadams@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.

--
Matt Levine
@Home: matt@deliver3.com
@Work: matt@eldosales.com
ICQ : 17080004
AIM : exile
GPG : http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x6C0D04CF
"The Trouble with doing anything right the first time is that nobody
appreciates how difficult it was." -BIX

Vinny Abello
Network Engineer
Server Management
vinny@tellurian.com
(973)300-9211 x 125
(973)940-6125 (Direct)
PGP Key Fingerprint: 3BC5 9A48 FC78 03D3 82E0 E935 5325 FBCB 0100 977A

Tellurian Networks - The Ultimate Internet Connection
http://www.tellurian.com (888)TELLURIAN

Where are they diverting it to, the Moon (1.5 light seconds away)?

Really - I have seen some multisecond latencies on network links we were
testing, and I always wondered how these could come to be.

The Juniper routers (it appears they are, based on
the interface naming scheme) tend to have incredible buffering capabilities
compared to their predecessors of the time. This allows a full link
to avoid dropping packets by fully buffering them over a period of time.

  This obviously has ramifications for TCP timing when you go from having
a 20ms RTT for a packet to 1000+ms; TCP will think that there has been
some loss.

  - Jared
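
(As a rough illustration of why a latency jump like that looks like loss to TCP: a sketch of the standard RFC 6298 retransmission-timeout estimator, with an assumed 200 ms minimum RTO as used by some stacks; the RFC itself recommends a 1 second floor. Illustrative only, not anything UUNET-specific.)

    # Sketch of the RFC 6298 RTO estimator (alpha = 1/8, beta = 1/4).
    # Illustrative only; real stacks differ in minimum RTO and clock granularity.
    def rto_after_samples(samples, min_rto=0.2):
        srtt = rttvar = rto = None
        for r in samples:
            if srtt is None:
                srtt, rttvar = r, r / 2.0
            else:
                rttvar = 0.75 * rttvar + 0.25 * abs(srtt - r)
                srtt = 0.875 * srtt + 0.125 * r
            rto = max(min_rto, srtt + 4 * rttvar)
        return rto

    rto = rto_after_samples([0.020] * 50)            # a steady ~20 ms path
    print("RTO ~= %.0f ms" % (rto * 1000))           # ~200 ms (the assumed floor)
    print("spurious retransmit likely:", 1.0 > rto)  # a 1000 ms delay blows past it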

Good question. Cisco routers use a default queue size of 40 packets. That
will give you a ~2 second delay on a 128 kbps line. I seem to remember
that during my tour of duty at UUNET we had slightly faster lines... But
that was back in the good old days when life was good.

At 155 Mbps you need 32 MB worth of buffer space to arrive at a delay like
this. I wouldn't put it past ATM vendors to think of this kind of
over-enthusiastic buffering as a feature rather than a bug.

Does anyone have any thoughts on optimum buffer sizes?
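
(For what it's worth, both of those figures are just buffer size divided by line rate; a quick sketch, where the ~800-byte average packet size is purely an assumption picked to reproduce the ~2 second number.)

    # Queueing delay added by a full buffer: buffered bits / line rate.
    def queue_delay_s(buffer_bytes, rate_bps):
        return buffer_bytes * 8.0 / rate_bps

    # 40-packet output queue on a 128 kbps line, assuming ~800-byte packets:
    print(queue_delay_s(40 * 800, 128e3))      # -> 2.0 seconds

    # 32 MB of buffer in front of a 155 Mbps (OC-3) link:
    print(queue_delay_s(32 * 2**20, 155e6))    # -> ~1.7 seconds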

At 155 Mbps you need 32 MB worth of buffer space to arrive at a delay like
this. I wouldn't put it past ATM vendors to think of this kind of
over-enthusiastic buffering as a feature rather than a bug.

Vendor C sells packet memory up to 256M each way for a line card. Whether
this makes any sense depends obviously on your interfaces. Theoretically
it makes sense to be able to accommodate the number of flows you're carrying
times the window size advertised by TCP. In live networks not too large
a percentage of the flows send data at the maximum, so one would expect to have
a few thousand "full" flows on a link at a time. A 64k window for a thousand flows
would use 64M of buffer memory (not counting memory utilization inefficiencies).

If you go deeper into the equation and start to analyze how fast you'll get
the packets in anyway, the associated mathematics will require a significantly
longer presentation, which you'll probably find easily via Google.

Pete
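
(A quick sanity check of that arithmetic, taking the flow count and window size exactly as stated above.)

    # Buffer needed to absorb one full advertised window per "full" flow.
    flows = 1000
    window = 64 * 1024                       # bytes, no window scaling
    print(flows * window / 2**20, "MB")      # -> 62.5 MB, roughly the 64M cited
    print(256 * 2**20 // window, "flows")    # 256 MB of packet memory covers ~4096 such flows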

Thus spake "Iljitsch van Beijnum" <iljitsch@muada.com>

At 155 Mbps you need 32 MB worth of buffer space to arrive at a delay like
this. I wouldn't put it past ATM vendors to think of this kind of
over-enthusiastic buffering as a feature rather than a bug.

Traditionally, it's ATM switches that have tiny buffers and routers that have
excessive buffers. ATM networks have closed-loop feedback and ingress policing
mechanisms to handle this scenario; IP networks just throw buffers at the
problem and hope it works.

Does anyone have any thoughts on optimum buffer sizes?

The "correct" amount of buffer space for a link is equal to its bandwidth-delay
product. Unfortunately, this requires per-link testing and configuration on the
part of the operator, which is extremely rare.

S
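
(A rough worked example of the bandwidth-delay-product rule; the 70 ms RTT is just an assumed continental round-trip figure.)

    # A buffer sized to the bandwidth-delay product keeps the pipe full during
    # TCP recovery without adding gratuitous queueing delay.
    def bdp_bytes(rate_bps, rtt_s):
        return rate_bps * rtt_s / 8.0

    print("%.1f MB" % (bdp_bytes(155e6, 0.070) / 2**20))   # OC-3, 70 ms RTT -> ~1.3 MB
    print("%.1f MB" % (bdp_bytes(10e9, 0.070) / 2**20))    # 10 Gbps, 70 ms  -> ~83 MB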

Vendor C sells packet memory up to 256M each way for a line card. Whether
this makes any sense depends obviously on your interfaces.

Hm, even at 10 Gbps 256M would add up to a delay of something like 200 ms.
I doubt this is something customers like.

Don't forget TCP can handle either a long round trip time or packet loss
relatively well, but not both at the same time. So if you're doing that
much buffering you should make absolutely sure it's enough to get rid of
tail drops or TCP performance will be extremely poor.
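
(The "long RTT or loss, but not both" point can be put in numbers with the well-known Mathis et al. approximation for TCP throughput under random loss; the MSS, RTT and loss figures below are assumed examples.)

    from math import sqrt

    # Mathis/Semke/Mahdavi/Ott approximation: rate <= (MSS/RTT) * C / sqrt(p),
    # with C ~ 1.22 for random loss.
    def tcp_rate_bps(mss_bytes, rtt_s, loss, c=1.22):
        return (mss_bytes * 8.0 / rtt_s) * c / sqrt(loss)

    print("%.1f Mbps" % (tcp_rate_bps(1460, 0.020, 0.001) / 1e6))  # 20 ms RTT, 0.1% loss -> ~22 Mbps
    print("%.0f kbps" % (tcp_rate_bps(1460, 1.000, 0.001) / 1e3))  # 1000 ms RTT, same loss -> ~450 kbps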

Theoretically
it makes sense to be able to accommodate the number of flows you're carrying
times the window size advertised by TCP.

Curious. Then the objective of buffering would be to absorb the entire
window for each TCP flow. Is this a good thing to do? That will only add
more delay, so TCP will use larger windows and you need more buffering...
Kind of an arms race between the routers and the hosts to see which can
buffer more data.

Also, well-behaved TCP implementations shouldn't send a full window worth
of data back to back. The only way I can see this happening is when the
application at the receiving end stalls and then absorbs all the data
buffered by the receiving TCP at once. But then the sending TCP should
initiate the congestion avoidance algorithm, IMO.

Under normal circumstances, the full window worth of data will be spread
out over the entire path with no more than two packets arriving back to
back at routers along the way (unless one session monopolizes a link).

Iljitsch

http://www.wired.com/news/technology/0,1282,55580,00.html

How and Why the Internet Broke
By Michelle Delio
9:35 a.m. Oct. 4, 2002 PDT

The Internet was very confused on Thursday.

But cyberspace hasn't gone senile. Those massive e-mail delays, slow
Internet connections and downed e-businesses were all caused by a software
upgrade that went horribly wrong at WorldCom's UUNet division, a large
provider of network communications.

[...]

Curious. Then the objective of buffering would be to absorb the entire
window for each TCP flow. Is this a good thing to do? That will only add
more delay, so TCP will use larger windows and you need more buffering...
Kind of an arms race between the routers and the hosts to see which can
buffer more data.

You usually end up with 64k window with modern systems anyway. Hardly
anything uses window scaling bits actively. Obviously by dropping select packets
you can keep the window at a more moderate size. Doing this effectively would
require the box to recognize flows, which is not feasible at high speeds
(unless you're a Caspian salesperson :-)

Also, well-behaved TCP implementations shouldn't send a full window worth
of data back to back. The only way I can see this happening is when the
application at the receiving end stalls and then absorbs all the data
buffered by the receiving TCP at once. But then the sending TCP should
initiate the congestion avoidance algorithm, IMO.

I didn't want to imply that the packets would be back to back in the queue,
but if you have a relatively short path with real latency on the order of a few
tens of milliseconds and introduce an extra 1000ms to the path, you have a full
window of packets in the same queue. They will not be adjacent to each other,
but they will be sitting in the same packet memory.

Under normal circumstances, the full window worth of data will be spread
out over the entire path with no more than two packets arriving back to
back at routers along the way (unless one session monopolizes a link).

This discussion started as a discussion of non-normal circumstances. Not sure
if the consensus is that congestion is non-normal. It's very complicated
to agree on metrics that define a "normal" network. Most people consider
some packet loss normal and some jitter normal. Some people even accept
their DNS to be offline for 60 seconds every hour for a "reload" as normal.

Pete

OK. I'll bite - is it feasible if you're a Caspian engineer? ;-)

OK. I'll bite - is it feasible if you're a Caspian engineer? ;-)

Obviously, as most of the audience knows, it's a function of the speed you want
to achieve, the number of flows you expect to be interested in and what you want
to do with the flows. Getting traffic split up into a few million flows,
maintaining the flow cache and associated state, and doing lookups in the cache
is not too hard. Doing anything more clever than switching packets (like
scheduling which one goes next) across a large dataset has been an unachievable
challenge so far (at least at price points people want to pay).

It would have to be an earlier hour to walk through whether a design which
combined flow classification and CAM-based scheduling would cut it, but I'm
afraid of the aliasing contention killing the actual thing you're trying to
achieve (service guarantees).

Pete

>Kind of an arms race between the routers and the hosts to see which can
>buffer more data.

You usually end up with 64k window with modern systems anyway. Hardly
anything uses window scaling bits actively.

I also see ~17k a lot. I guess most applications don't need the extra
performance offered by the larger windows anyway.

Obviously by dropping select packets
you can keep the window at a more moderate size. Doing this effectively would
require the box to recognize flows, which is not feasible at high speeds.

I think random early detect works reasonably well. Obviously something
that really looks at the sessions would work better, but statistically,
RED should work out fairly well.
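
(For reference, a bare-bones sketch of the RED drop decision in the Floyd/Jacobson style; the thresholds, weight and maximum probability here are arbitrary illustrative values, and the count-based adjustment and idle-time handling of real RED are omitted.)

    import random

    # Minimal RED: keep an EWMA of queue depth and drop arrivals with a
    # probability that ramps from 0 at min_th up to max_p at max_th.
    class Red:
        def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.002):
            self.min_th, self.max_th, self.max_p, self.w = min_th, max_th, max_p, weight
            self.avg = 0.0

        def should_drop(self, queue_len):
            self.avg = (1 - self.w) * self.avg + self.w * queue_len
            if self.avg < self.min_th:
                return False
            if self.avg >= self.max_th:
                return True
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            return random.random() < p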

>Also, well-behaved TCP implementations shouldn't send a full window worth
>of data back to back. The only way I can see this happening is when the
>application at the receiving end stalls and then absorbs all the data
>buffered by the receiving TCP at once.

I didn't want to imply that the packets would be back to back in the queue,
but if you have a relatively short path with real latency on the order of a few
tens of milliseconds and introduce an extra 1000ms to the path, you have a full
window of packets in the same queue. They will not be adjacent to each other,
but they will be sitting in the same packet memory.

The only way this would happen is when the sending TCP sends them out back
to back after the window opens up after having been closed. Under normal
circumstances, the sending TCP sends out two new packets after each ACK.
Obviously ACKs aren't forthcoming if all the traffic is waiting in buffers
somewhere along the way. Only when a packet gets through does an ACK come back
and a new packet (or two) get transmitted.

Hm, but a somewhat large number of packets being released at once by a
sending TCP could also happen as the slow start threshold gets bigger.
This could be half a window at once.

>Under normal circumstances, the full window worth of data will be spread
>out over the entire path with no more than two packets arriving back to
>back at routers along the way (unless one session monopolizes a link).

This discussion started as a discussion of non-normal circumstances. Not sure
if the consensus is that congestion is non-normal. It's very complicated
to agree on metrics that define a "normal" network. Most people consider
some packet loss normal and some jitter normal. Some people even accept
their DNS to be offline for 60 seconds every hour for a "reload" as normal.

Obviously "some" packet loss and jitter are normal. But how much is
normal? Even at a few tenths of a percent packet loss hurts TCP
performance. The only way to keep jitter really low without dropping large
numbers of packets is to severely overengineer the network. That costs
money. So how much are customers prepared to pay to avoid jitter?

In any case, delays of 1000 ms aren't within any accepted definition of
"normal". With these delays, high-bandwidth batch applications will
monopolize the links and interactive traffic suffers. 20 ms worth of
buffer space with RED would keep those high-bandwidth applications in
check and allow a reasonable degree of interactive traffic. Maybe a
different buffer size would be better, but the 20 ms someone mentioned
seems as good a starting point as anything else.

After reading all the stories about what supposedly happened does
anyone know what really happened? Did UUNet US really do an IOS
upgrade on a sizable proportion of their border routers in one go?
This seems like suicide to me. What possible reason could there be for
a network-wide roll out of an untested IOS apart from being in the
mire already?

Tim

## On 2002-10-04 23:50 +0200 Iljitsch van Beijnum typed:

Obviously "some" packet loss and jitter are normal. But how much is
normal? Even at a few tenths of a percent packet loss hurts TCP
performance. The only way to keep jitter really low without dropping large
numbers of packets is to severely overengineer the network. That costs
money. So how much are customers prepared to pay to avoid jitter?

There may be better ways to keep "reasonable" jitter, but that depends on
what "really low" jitter is - care to define numbers?

In any case, delays of 1000 ms aren't within any accepted definition of
"normal".

Ever used a satellite link?
Practical RTT ("normal" - end to end, including the local loops at both
sides) starts at about 600 msec.

>>> With these delays, high-bandwidth batch applications will monopolize the links and interactive traffic suffers.

I'm assuming TCP since you didn't state otherwise, with the TCP extensions
for "fat pipes" (such as window scaling and SACK) disabled (as both sides of
the TCP connection need to support them).

IIRC the maximum (theoretical) TCP session BW under these conditions
is less than 1 Mb/sec (for a 600 msec RTT).

   For a reality check you may want to have a look at the links under
"Satellite links and performance" on
<http://www.internet-2.org.il/documents.html>
(yes, the docs are a bit dated but the principles aren't).

> Obviously "some" packet loss and jitter are normal. But how much is
> normal? Even at a few tenths of a percent packet loss hurts TCP
> performance. The only way to keep jitter really low without dropping large
> numbers of packets is to severely overengineer the network. That costs
> money. So how much are customers prepared to pay to avoid jitter?

There may be better ways to keep "reasonable" jitter, but that depends on
what "really low" jitter is - care to define numbers?

I don't use applications that have jitter requirements, so I'm not in the
best position to comment on this. I'd say that with a line utilization of
50% or less, which leads to an average queue size of one packet or less,
jitter is "really low". If the level of jitter introduced here is too
high, then I don't think the application can successfully run over IP.
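
(The "50% utilization means an average queue of about one packet" rule of thumb falls out of the textbook M/M/1 model; an idealization, of course, since real traffic is burstier.)

    # M/M/1: average number of packets in the system is rho / (1 - rho).
    for rho in (0.3, 0.5, 0.8, 0.95):
        print("utilization %.2f -> average occupancy %.1f packets" % (rho, rho / (1 - rho)))
    # 0.50 -> 1.0; past ~80% utilization the average queue (and jitter) grows quickly.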

> In any case, delays of 1000 ms aren't within any accepted definition of
> "normal".

Ever used a satellite link?
Practical RTT ("normal" - end to end, including the local loops at both
sides) starts at about 600 msec.

So then a satellite link with a 1000 ms delay wouldn't be normal, would
it?

>>> With these delays, high-bandwidth batch applications will monopolize the links and interactive traffic suffers.

I'm assuming TCP since you didn't state otherwise, with the TCP extensions
for "fat pipes" (such as window scaling and SACK) disabled (as both sides of
the TCP connection need to support them).

IIRC the maximum (theoretical) TCP session BW under these conditions
is less than 1 Mb/sec (for a 600 msec RTT).

Ok, so "1 Mbps batch applications" will monopolize the links and
interactive traffic suffers.

IIRC the maximum (theoretical) TCP session BW under these conditions
is less than 1 Mb/sec (for a 600 msec RTT).

873.8 kbps of payload; add headers with an assumed 1500-byte MTU and you'll have 897.8 kbps.

This assumes zero latency on the hosts reacting to the packets.

Pete
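
(That figure is just window size over RTT; a quick sketch reproducing the numbers above, with a 1460-byte MSS assumed for the header-overhead step.)

    # Window-limited TCP throughput: at most one window per round trip.
    window = 65535            # bytes, the maximum without window scaling
    rtt = 0.600               # seconds

    payload_bps = window * 8 / rtt
    print("%.1f kbps payload" % (payload_bps / 1e3))                    # ~873.8 kbps
    # 40 bytes of TCP/IP header per 1460-byte segment (1500-byte MTU):
    print("%.1f kbps on the wire" % (payload_bps * 1500 / 1460 / 1e3))  # ~897.7 kbps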

Corporate culture is the hardest thing to change in a company. You'll need
to talk with your Worldcom account rep about what happened, and what
Worldcom intends to do about it. In the past, Worldcom has not been very
open or transparent when it has had network problems.