Extreme Slowness

Elijah_Savage · October 26, 2006, 8:24pm

Looks like level3 is having issues. Anyone know what is going on?

Brandon_Galbraith · October 26, 2006, 8:30pm

Can you be more specific?

-brandon

Elijah_Savage · October 26, 2006, 8:34pm

It seems anything traversing Level3 has very high latency, along with what seems to be overloaded capacity, as if they are running in a degraded mode. I have connections with Time Warner, AT&T, and MCI. Though I know it is not concrete, it seems as if something is going on according to http://www.internetpulse.net/

Elijah_Savage · October 26, 2006, 8:48pm

Say like this traceroute. This is from TW to a Broadwing DS3.

5 tenge-3-2.car1.Cincinnati1.Level3.net (4.78.216.13) 153.267 ms 207.125 ms
tenge-3-1.car1.Cincinnati1.Level3.net (4.78.216.9) 218.920 ms
6 ae-5-5.ebr2.Chicago1.Level3.net (4.69.132.206) 36.976 ms 26.923 ms 57.770 ms
7 ge-11-0.core2.Chicago1.Level3.net (4.68.101.37) 254.145 ms
ge-11-1.core2.Chicago1.Level3.net (4.68.101.101) 258.522 ms
ge-11-2.core2.Chicago1.Level3.net (4.68.101.165) 227.223 ms
8 broadwing-level3-oc12.Chicago1.Level3.net (209.0.225.10) 231.451 ms
9 so-1-1-0.c1.gnwd.broadwing.net (216.140.15.1) 53.269 ms 35.568 ms 22.511 ms
10 216.140.14.17 (216.140.14.17) 34.751 ms 39.008 ms 46.644 ms
11 p5-0-0.e0.cncn.broadwing.net (216.140.15.78) 32.065 ms 60.797 ms 54.766 ms
12 67.98.17.122 (67.98.17.122) 44.772 ms 27.631 ms 30.655 ms
13 * * *

Elijah_Savage · October 26, 2006, 9:14pm

Here is one from that Broadwing DS3 to MCI (well, Verizon now).

5 tenge-3-1.car1.Cincinnati1.Level3.net (4.78.216.9) 157.795 ms 179.050 ms
tenge-3-2.car1.Cincinnati1.Level3.net (4.78.216.13) 205.087 ms
6 * * ae-5-5.ebr2.Chicago1.Level3.net (4.69.132.206) 50.134 ms
7 * ae-1-100.ebr1.Chicago1.Level3.net (4.69.132.41) 45.873 ms *
8 ae-2.ebr2.NewYork1.Level3.net (4.69.132.66) 66.346 ms 72.509 ms *

Elijah_Savage · October 26, 2006, 10:01pm

Seems to be all cleared up now. I had a couple of my customers even try to pull up their home site and could not get to it.

FYI, I realize that ICMP is not the best way to test and that it is not a true indication of slowness or the presence of a problem.

Aaron_Glenn · October 26, 2006, 11:11pm

Uhh, you do realize the end to end latency there (to hop 12, at least)
is ~30ms...not the 250ms+ you see on intermediate hops, right?

Elijah_Savage · October 26, 2006, 11:28pm

Yes sir I did. This is now resolved. But thank you for noticing.

Jeremy_Chadwick · October 26, 2006, 11:55pm

Which begs the same question I've asked in the recent past: then
what *is* a good diagnostic tool? If ICMP "is not the best way to
test", then what is? What other globally-implemented layer 3 or
below protocols do we have available for troubleshooting?

Sure, UDP-based traceroute still relies on ICMP TTL exceeded
responses to work. I've no idea what TCP traceroute relies on,
as I haven't looked at it.

Jim_Popovitch3 · October 27, 2006, 2:29am

Two questions for everybody...(any and all responses appreciated, even
if the reply mentions botnets or hammers )

1) What value is ICMP if everybody pretty much considers its accuracy
suspect?

2) How does ICMP's suspect nature affect Path MTU?

-Jim P.

Bandy_Rush1 · October 27, 2006, 3:06am

1) What value is ICMP if everybody pretty much considers its accuracy
suspect?

because for some uses, narrow precision is not needed. like is it
pingable? what is the current path?

my eyes are not highly accurate at measuring distance, color, size,
motion, ... accurately. but i'll keep them, thanks.

2) How does ICMP's suspect nature affect Path MTU?

pmtu is hosed for other sicker reasons

randy

Adam_Rothschild · October 27, 2006, 4:22am

Elijah,

[HTML mail stripped]

It seems anything traversing level3 has very high latency along with
what seems overloaded capacity as if they are running in a degraded
mode I have connections with Time Warner, AT&T, and MCI [...]

[HTML mail stripped]

Say like this traceroute. This is from TW to a Broadwing DS3.

5 tenge-3-2.car1.Cincinnati1.Level3.net (4.78.216.13) 153.267 ms 207.125 ms
    tenge-3-1.car1.Cincinnati1.Level3.net (4.78.216.9) 218.920 ms
6 ae-5-5.ebr2.Chicago1.Level3.net (4.69.132.206) 36.976 ms 26.923 ms 57.770 ms
7 ge-11-0.core2.Chicago1.Level3.net (4.68.101.37) 254.145 ms
    ge-11-1.core2.Chicago1.Level3.net (4.68.101.101) 258.522 ms
    ge-11-2.core2.Chicago1.Level3.net (4.68.101.165) 227.223 ms
8 broadwing-level3-oc12.Chicago1.Level3.net (209.0.225.10) 231.451 ms
9 so-1-1-0.c1.gnwd.broadwing.net (216.140.15.1) 53.269 ms 35.568 ms 22.511 ms

Your postings appear to be missing two key pieces of information which
would help with the community diagnosis requested: source and
destination IP addresses. From the information you did provide, one
can deduce that you're behind a TW/RoadRunner cable modem:

  13.216.78.4.IN-ADDR.ARPA domain name pointer tenge-3-2.car1.Cincinnati1.Level3.net
  14.216.78.4.IN-ADDR.ARPA domain name pointer ROADRUNNER.car1.Cincinnati1.Level3.net
  9.216.78.4.IN-ADDR.ARPA domain name pointer tenge-3-1.car1.Cincinnati1.Level3.net
  10.216.78.4.IN-ADDR.ARPA domain name pointer ROADRUNNER.car1.Cincinnati1.Level3.net
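
For readers following the deduction: each PTR query name above is the address with its octets reversed under in-addr.arpa, and each Level3 interface address has a neighbor named ROADRUNNER. A minimal Python sketch of that lookup arithmetic follows. The helper names `ptr_name` and `link_peer` are hypothetical, and treating each link as a /30 is an assumption that happens to fit the four records listed, not something the thread confirms.

```python
import ipaddress

def ptr_name(ip):
    """Return the reverse-DNS (PTR) query name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer

def link_peer(ip, prefixlen=30):
    """Return the other usable address on the same point-to-point subnet.

    ASSUMPTION: the link is numbered out of a /30, which is common for
    router-to-router links but is not stated anywhere in the thread.
    """
    addr = ipaddress.ip_address(ip)
    net = ipaddress.ip_network(f"{ip}/{prefixlen}", strict=False)
    others = [h for h in net.hosts() if h != addr]
    return str(others[0])

for hop_ip in ("4.78.216.13", "4.78.216.9"):
    peer = link_peer(hop_ip)
    print(hop_ip, "->", ptr_name(hop_ip))
    print("  far side of link:", peer, "->", ptr_name(peer))
```

The standard library prints the names in lower case; DNS names are case-insensitive, so they match the records quoted above.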

Now, the jitter and high latency you're seeing could be a result of
one or more factors, including but not limited to RF/plant issues, TWC
running their transport and/or Level(3) transit hot (which seems to be
a common occurrence these days), ECMP across two circuits of uneven
loading, or your neighbor might be jacking wifi and downloading a
bunch of torrents -- we, the readers, just don't know.

Of note when performing armchair troubleshooting across Level(3)'s
network: the 'ebr's (PTR record of ebr*.{pop}.level3.net == Force10
E1200; Experimental Backbone Router?) tend to drop a lot of diagnostic
traffic (such as, say, 'ping' and 'traceroute') as a part of overly
aggressive control-plane policers. This loss is, of course, strictly
cosmetic, and has no bearing on end-to-end performance. Hence, the
old "to it, not through it" rule applies.
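
The "to it, not through it" rule can be applied mechanically to the first traceroute in this thread. The sketch below is only an illustration: the function name `cosmetic_hops` and the 50 ms slack value are arbitrary choices, and the RTT figures are copied from the TW-to-Broadwing trace. It flags intermediate hops whose median RTT is far above that of the last responding hop, which suggests the delay is in how those routers answer probes rather than in how they forward traffic.

```python
from statistics import median

def cosmetic_hops(hops, slack_ms=50.0):
    """Flag hops whose latency does not persist to the end of the path.

    hops: list of (hop_number, [rtt_ms, ...]) in path order; the last
    entry is the farthest hop that answered.
    slack_ms: arbitrary tolerance chosen for this illustration.
    """
    final = median(hops[-1][1])
    return [n for n, rtts in hops[:-1] if median(rtts) > final + slack_ms]

# RTTs from the TW -> Broadwing DS3 traceroute posted earlier in the thread.
trace = [
    (5, [153.267, 207.125, 218.920]),
    (6, [36.976, 26.923, 57.770]),
    (7, [254.145, 258.522, 227.223]),
    (8, [231.451]),
    (9, [53.269, 35.568, 22.511]),
    (10, [34.751, 39.008, 46.644]),
    (11, [32.065, 60.797, 54.766]),
    (12, [44.772, 27.631, 30.655]),
]

print(cosmetic_hops(trace))  # [5, 7, 8]
```

Hops 5, 7, and 8 show 200 ms or more while hop 12 answers in about 30 ms, so those spikes do not reflect the end-to-end path. This check says nothing about real congestion that does persist to the final hop.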

smokeping[1] and iperf[2] (to end hosts) are your friends.

As an aside, I've noticed your string of postings today were all
HTML-tagged. While not expressly forbidden (or even discouraged) by
the current Mailing List AUP, this is generally regarded as bad form;
you might wish to reconfigure your mail client accordingly...

Hope this helps,
-a

[1] <http://oss.oetiker.ch/smokeping/>
[2] NLANR -- National Laboratory for Applied Network Research

michael.dillon1 · October 27, 2006, 10:13am

Which begs the same question I've asked in the recent past: then
what *is* a good diagnostic tool? If ICMP "is not the best way to
test", then what is? What other globally-implemented layer 3 or
below protocols do we have available for troubleshooting?

Sure, UDP-based traceroute still relies on ICMP TTL exceeded
responses to work. I've no idea what TCP traceroute relies on,
as I haven't looked at it.

I love it when people answer their own questions
and tell us that they are lazy, to boot.

For the record, TCP traceroute and similar TCP based
tools rely on the fact that if you send a TCP SYN
packet to a host it will respond with either a
TCP RST (if the port is NOT listening) or a TCP
SYN/ACK. The round trip time of this provides useful
information which is unaffected by any ICMP chicanery
on the part of routers or firewalls. A polite application
such as TCP traceroute will reply to the SYN/ACK with
an RST packet so it is reasonably safe to use this tool
with live services.

Of course, even TCP packets can be blocked or dropped
for various reasons so this is not a 100% solution.
However, if you want to avoid ICMP filtering or low
precedence, then TCP traceroute will help.
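
As a rough illustration of the SYN timing idea described above, the following Python sketch times an ordinary TCP connect. It is not how tcptraceroute itself works: a normal socket completes the three-way handshake and then closes the connection, rather than answering the SYN/ACK with an RST, and no raw packets are crafted. The function name `tcp_rtt_ms` and the state labels are hypothetical.

```python
import socket
import time

def tcp_rtt_ms(host, port, timeout=3.0):
    """Time a TCP connect attempt and classify the answer.

    Returns (state, elapsed_ms) where state is:
      'open'      - the host answered the SYN with SYN/ACK
      'closed'    - the host answered with RST (connection refused)
      'no_answer' - timeout or another error (filtered, unreachable, ...)
    If host is a name, DNS resolution time is included in elapsed_ms.
    """
    sock = None
    start = time.perf_counter()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        state = "open"
    except ConnectionRefusedError:
        state = "closed"
    except OSError:
        state = "no_answer"
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if sock is not None:
        sock.close()
    return state, elapsed_ms

if __name__ == "__main__":
    # Demo against a throwaway local listener so no external host is probed.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    print(tcp_rtt_ms("127.0.0.1", srv.getsockname()[1]))
    srv.close()
```

Either an 'open' or a 'closed' result gives a usable round-trip time, since both the SYN/ACK and the RST come from the end host itself.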

--Michael Dillon

Mikael_Abrahamsson · October 27, 2006, 10:25am

Intermediate nodes are still discovered by "ICMP TTL Exceeded in transit", just like UDP-based traceroute, i.e. the outgoing TCP SYN packet has a low TTL.

So yes, tcptraceroute is good for getting through firewalls in the forward direction, but intermediate routers are discovered in the same way: you get an ICMP message back because the TTL ran out.
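
The TTL mechanism can be shown with a toy simulation. Nothing here sends packets; the router names are shortened from the traceroutes in this thread, and the functions `probe` and `traceroute` are hypothetical stand-ins for what a real tool does with raw sockets.

```python
def probe(path, destination, ttl):
    """Simulate one probe sent with the given initial TTL.

    Each router decrements the TTL; the router that takes it to zero
    drops the packet and reports back with ICMP time-exceeded. If the
    TTL survives every router, the destination itself answers.
    """
    for router in path:
        ttl -= 1
        if ttl == 0:
            return router, "ICMP time-exceeded"
    return destination, "TCP SYN/ACK or RST"

def traceroute(path, destination, max_ttl=30):
    """Send probes with TTL 1, 2, 3, ... until the destination answers."""
    results = []
    for ttl in range(1, max_ttl + 1):
        responder, reply = probe(path, destination, ttl)
        results.append((ttl, responder, reply))
        if responder == destination:
            break
    return results

path = ["car1.Cincinnati1", "ebr2.Chicago1", "core2.Chicago1"]
for ttl, responder, reply in traceroute(path, "end-host"):
    print(ttl, responder, reply)
```

Only the final line of output comes from the end host over TCP; every intermediate line depends on a router choosing to generate an ICMP message.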

Elijah_Savage · October 27, 2006, 11:37am

Adam,

Because of contractual issues, it is very hard for me to participate on this list, hence the vague original post. I was just asking a general question to see if anyone else was having issues. I have peering points with Broadwing (now Level3), Sprint, AT&T, and MCI (now Verizon) that I can test throughput from. This was not just about home cable connectivity, though: when frontline starts to get calls, I often use wget (very low overhead) to test throughput between my sites or to my home box, often simulating the same sort of connectivity that a customer may have. There were customers that could not even get to level3.net yesterday, which is their home page. But it is always nice to get the refresher course on ICMP :).

As for HTML-posted messages, truly my mistake. I know better, and thank you for mentioning it. The new duo core 2 Mac mail client, which I am still trying to get used to, says under preferences that it is set to plain text. Hmmm, something I need to look into.

Thank you

Florian_Weimer · October 27, 2006, 12:05pm

* Jim Popovitch:

Two questions for everybody...(any and all responses appreciated, even
if the reply mentions botnets or hammers )

1) What value is ICMP if everybody pretty much considers its accuracy
suspect?

The problem with ICMP-based traceroutes is that they don't necessarily
test the path you are interested in. Use tcptraceroute or traceproto
instead.

Of course, this doesn't solve the problem that TTL Exceeded messages
might be generated with very low priority, which is in general a
very good idea.

2) How does ICMP's suspect nature affect Path MTU?

In this case, you're interested in the ICMP payload, not the fact
whether an ICMP packet goes through or not. (You lose if someone
filters ICMP, though.)

Kevin_Hunt · October 27, 2006, 12:18pm

We peer with UUnet and Telcove (now L3 and being "assimilated").
Latency across Telcove has been terrible (not just routers on the path with higher-than-normal latency, but latency all the way to the endpoint).
I have personal opinions as to why the latency is so bad, but until I can prove something I'd rather not say anything in public.
Some examples: 72.30.33.194 is 60.8 ms away via UUnet and 109 ms away via Telcove. www.level3.com is 30 ms away via UUnet and 55 ms away via Telcove.

Trace to level3.com via UUNet

Hostname %Loss Rcv Snt Last Best Avg Worst
  1. ndcr3-52.datasync.net 0% 6 6 0 0 0 0
  2. ndcr6-ndcr3.datasync.net 0% 5 5 0 0 0 0
  3. POS1-2.GW4.NOL1.ALTER.NET 0% 5 5 1 0 1 1
  4. 501.at-0-0-0.XL2.NOL1.ALTER.NET 0% 5 5 1 1 1 1
  5. 0.so-6-2-0.XT1.DFW9.ALTER.NET 0% 5 5 14 14 15 15
  6. 0.so-6-0-0.BR6.DFW9.ALTER.NET 0% 5 5 15 14 14 15
  7. so-1-0-0.edge1.Dallas1.Level3.net 0% 5 5 16 15 16 17
  8. so-1-2-0.bbr1.Dallas1.Level3.net 0% 5 5 16 15 16 16
  9. ae-0-0.bbr2.Denver1.Level3.net 0% 5 5 83 29 41 83
10. ge-6-1.hsa1.Denver1.Level3.net 0% 5 5 35 29 31 35
11. 4.68.94.1 0% 5 5 30 30 30 32
12. www.Level3.com 0% 5 5 31 29 30 31

Via Telcove

  1. ndcr3-52.datasync.net 0% 5 5 0 0 0 0
  2. 64.66.101.89 0% 5 5 7 7 7 7
  3. 24.56.107.229 0% 4 4 20 20 20 20
  4. ???
  5. 24.56.107.94 0% 4 4 20 20 20 20
  6. ge-6-23.car1.Atlanta1.Level3.net 0% 4 4 143 20 51 143
  7. ae-1-51.bbr1.Atlanta1.Level3.net 0% 4 4 20 20 30 59
  8. as-0-0.bbr1.Denver1.Level3.net 0% 4 4 54 53 54 54
  9. ge-9-0.hsa1.Denver1.Level3.net 0% 4 4 55 54 54 55
10. 4.68.94.1 0% 4 4 55 54 55 55
11. www.Level3.com 0% 4 4 55 55 55 55
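
One way to compare the two reports above is to look only at the final hop, since that is the end-to-end figure. The Python sketch below parses rows in the column layout shown (hop, host, %Loss, Rcv, Snt, Last, Best, Avg, Worst). The function name `parse_mtr` is hypothetical, only the last rows of each report are reproduced, and other mtr versions or options may print different columns.

```python
def parse_mtr(report):
    """Parse rows shaped like 'N. host loss% rcv snt last best avg worst'."""
    rows = []
    for line in report.splitlines():
        parts = line.split()
        # Skip headers, blank lines, and unresponsive hops such as '4. ???'.
        if len(parts) != 9 or not parts[0].rstrip(".").isdigit():
            continue
        rows.append({
            "hop": int(parts[0].rstrip(".")),
            "host": parts[1],
            "loss_pct": float(parts[2].rstrip("%")),
            "best": float(parts[6]),
            "avg": float(parts[7]),
            "worst": float(parts[8]),
        })
    return rows

# Last rows of each report, copied from the post above.
VIA_UUNET = """
Hostname %Loss Rcv Snt Last Best Avg Worst
 11. 4.68.94.1 0% 5 5 30 30 30 32
 12. www.Level3.com 0% 5 5 31 29 30 31
"""
VIA_TELCOVE = """
  4. ???
 10. 4.68.94.1 0% 4 4 55 54 55 55
 11. www.Level3.com 0% 4 4 55 55 55 55
"""

uunet_final = parse_mtr(VIA_UUNET)[-1]
telcove_final = parse_mtr(VIA_TELCOVE)[-1]
print(uunet_final["avg"], telcove_final["avg"])
print("extra latency via Telcove (ms):", telcove_final["avg"] - uunet_final["avg"])
```

With only four or five samples per hop, the 25 ms gap is indicative rather than conclusive.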

Adam_Rothschild · October 27, 2006, 3:54pm

Because of contractual issues it makes it very hard for me to
participate on this list hence the vague original post.

I can understand you might have various NDAs in place limiting what
you can and can't disclose.

Unfortunately, without full information, it is difficult to provide a
full and proper diagnosis.

I was just asking a general question to see if anyone else was
having issues.

See, therein lies the problem.

As pointed out in recent congressional testimony by the esteemed
Senator from Alaska, the internets are comprised of very many
tangled-up tubes. At any given time, something just isn't working.
Without source and destination IP addresses, it's difficult to
determine whether a problem is global in scope (entirely appropriate
for this list), or an end-user issue (inappropriate for this list,
though some folk may beg to differ :-), as suggested by your snippet
of 'traceroute' output -- and ultimately take corrective action.

I have peering points with Broadwing(now level3), Sprint, AT&T and
MCI (now Verizon) that I can test for throughput from.

This phraseology is also a bit confusing, though sadly, all too common
these days. Unless you're settlement-free, a better idea might be to
word this as "I buy transit from..." or perhaps more appropriately,
"My cable MSO buys transit from..."

This was not just about home cable connectivity though when
frontline starts to get calls I often use wget (very low overhead)
to test throughput between my sites or to my home box often
times simulating the same sort of connectivity that a customer may
have. There were customers that could not even get to level3.net
yesterday which is their home page

Be that as it may, a little information would have helped greatly.
Had you said this sooner, and backed it up with some supporting data
such as IP addresses and perhaps 'wget' output, chances are we
wouldn't be having this discussion.

On the other hand, if you can't trust us, perhaps a better course of
action would be to open trouble tickets with your provider(s)...

-a