Possible explanations for a large hop in latency

Our upstream provider has a connection to AT&T (12.88.71.13) where I
fairly consistently measure an RTT of 15 msec, but the next hop
(12.122.112.22) comes in with an RTT of 85 msec. Unless AT&T is sending that
traffic over a cable modem or to Europe and back, I can't see a reason why
there is a consistent ~70 msec jump in RTT. Hops farther along the route
are just a few msec more each hop, so it doesn't appear that 12.122.112.22
has some kind of ICMP rate-limiting.

Is this a real performance issue, or is there some logical explanation?

Frank

Deep Packet Inspection engine delay. <G>

When I asked ATT about the sudden latency jump I see in traceroutes, they told me it was due to how their MPLS network is set up.

--John

Frank Bulk wrote:

Did that satisfy you? I guess with MPLS they could tag the traffic and send
it around the country twice and I wouldn't see it at L3.

Frank

The explanation I got was that the latency seen at the first hop was actually a reply from the last hop in the path across their MPLS network. Hence, all the following hops had very similar latency.

Personally, I thought it was rather strange for them to do that. And, I've never seen that occur on any other network.

Perhaps someone from ATT would like to chime in.

--John

Frank Bulk - iNAME wrote:

This is standard for MPLS: the ICMP TTL-expired message is sent along the LSP and returned via the router at the end of the LSP.

They probably don't propagate TTL w/in their MPLS core. Depending on how
they have MPLS implemented, you may only see 2 hops on the network; the
ingress and egress routers. If the ingress router was in NYC and the egress
in Seattle, you could understandably expect a large jump in RTT.
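
To illustrate the point about TTL propagation, here is a minimal sketch of which routers a traceroute would list with and without it. The router names and the function visible_hops are made up for the example, and real behaviour varies with vendor and with penultimate-hop popping, so treat it as a rough model only.

```python
def visible_hops(path, ingress, egress, propagate_ttl):
    """Return the routers a traceroute would list along `path`.

    `ingress` and `egress` are the indexes of the LSP edge routers.
    Assumption: without TTL propagation the interior LSRs never see
    the IP TTL reach zero, so only the edges of the LSP show up.
    """
    if propagate_ttl:
        return list(path)
    return [r for i, r in enumerate(path) if i <= ingress or i >= egress]


# Hypothetical path: NYC ingress, two hidden core routers, Seattle egress.
path = ["customer-edge", "nyc-ingress", "core-1", "core-2",
        "seattle-egress", "destination"]

print(visible_hops(path, ingress=1, egress=4, propagate_ttl=False))
print(visible_hops(path, ingress=1, egress=4, propagate_ttl=True))
```

In the first case the trace jumps straight from nyc-ingress to seattle-egress, which is where the large RTT step would appear.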

Not an ATT customer, but I do know other providers run their MPLS cores this
way...

-Robert

Thanks for the added information.

Even if their MPLS path went from the midwest (where I'm located) to San
Francisco and then back to St. Louis (where 12.122.112.22 appears to be), I
don't think that accounts for a 70 msec jump in RTT. And I don't think
they would (intentionally) create such an inefficient MPLS path.
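
A back-of-the-envelope way to check an estimate like this is to turn path length into propagation delay. The sketch below assumes light in fiber covers roughly 200 km per millisecond and uses approximate city coordinates; Des Moines stands in for "the midwest" only because the trace passes through desm.netins.net. Great-circle distance is a lower bound on real fiber distance, so the hypothetical route_factor parameter lets you pad it.

```python
import math

FIBER_KM_PER_MS = 200.0   # assumption: about 2/3 of the speed of light in vacuum
EARTH_RADIUS_KM = 6371.0

# Approximate (latitude, longitude) pairs, assumed for illustration.
DES_MOINES = (41.59, -93.62)
SAN_FRANCISCO = (37.77, -122.42)
ST_LOUIS = (38.63, -90.20)


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def propagation_rtt_ms(points, route_factor=1.0):
    """Round-trip propagation delay along a list of waypoints.

    Ignores queuing and serialization delay; `route_factor` scales the
    great-circle length to account for fiber not running in straight lines.
    """
    one_way_km = sum(great_circle_km(p, q) for p, q in zip(points, points[1:]))
    return 2 * one_way_km * route_factor / FIBER_KM_PER_MS


detour = propagation_rtt_ms([DES_MOINES, SAN_FRANCISCO, ST_LOUIS])
direct = propagation_rtt_ms([DES_MOINES, ST_LOUIS])
print(f"detour: {detour:.0f} ms RTT, direct: {direct:.0f} ms RTT")
```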

Someone off-list told me they tried to trace to 12.88.71.13, but once they
hit an AT&T router their ICMP traffic appeared to be blocked.

Frank

We had a similar situation going from Minneapolis to Kansas City via Chicago. Normal latency from Minneapolis to Chicago via the Level3 MPLS network is about 14 msec RTT. When the circuit from Minneapolis to Chicago went out for one reason or another, our MPLS link went from Minneapolis to Tulsa, to Dallas, and then to Chicago. That added a little latency in the path from Minneapolis to Chicago. We didn't need to exceed the SLA in order to cry foul. They didn't intentionally create an inefficient path. The problem was recognized and fixed the same day.

Latency on an MPLS circuit is the cumulative latency on the Label Switched Path, and a number of the hops are invisible. The latency per hop is still the same... you just can't see that your traffic is travelling to, say, Denver or Dallas.

Tim Peiffer
Network Support Engineer
Networking and Telecommunications Services
University of Minnesota/NorthernLights GigaPOP

Frank Bulk - iNAME wrote:

Interestingly enough, when I trace from my Cisco router it seems to show
some MPLS labels after the hop of interest (12.88.71.13 to 12.122.112.78,
only 24 msec here!). I'm not sure how our Cisco box derives these from a
foreign network.

Router#traceroute 69.28.226.193

Type escape sequence to abort.
Tracing the route to 69.28.226.193

  1 sxct.sxcy.mtcnet.net (167.142.156.197) 0 msec 0 msec 0 msec
  2 siouxcenter.sxcy.137.netins.net (167.142.180.137) 4 msec 4 msec 4 msec
  3 ins-b12-et-4-0-112.desm.netins.net (167.142.57.106) 8 msec 8 msec 8 msec
  4 ins-h2-et-1-10-127.desm.netins.net (167.142.57.129) 8 msec 8 msec 8 msec
  5 ins-c2-et-pc2-0.desm.netins.net (167.142.57.142) 8 msec 8 msec 8 msec
  6 12.88.71.13 28 msec 24 msec 28 msec
  7 tbr2.sl9mo.ip.att.net (12.122.112.78) [MPLS: Label 30663 Exp 0] 52 msec 48 msec 52 msec
  8 cr2.sl9mo.ip.att.net (12.122.18.69) [MPLS: Label 17306 Exp 0] 52 msec 52 msec 52 msec
  9 cr2.cgcil.ip.att.net (12.122.2.21) [MPLS: Label 16558 Exp 0] 52 msec 52 msec 52 msec
 10 cr1.cgcil.ip.att.net (12.122.2.53) [MPLS: Label 17002 Exp 0] 48 msec 52 msec 52 msec
 11 cr1.n54ny.ip.att.net (12.122.1.189) [MPLS: Label 17033 Exp 0] 52 msec 52 msec 48 msec
 12 tbr1.n54ny.ip.att.net (12.122.16.138) [MPLS: Label 32364 Exp 0] 52 msec 52 msec 52 msec
 13 12.122.86.165 48 msec 48 msec 52 msec
 14 12.118.100.58 60 msec 60 msec 64 msec
 15 oc48-po2-0.tor-151f7-cor-2.peer1.net (216.187.115.125) 52 msec 52 msec 68 msec
 16 oc48-po7-0.tor-151f-dis-1.peer1.net (216.187.114.149) 52 msec 52 msec 48 msec
 17 tor-fe3-5a.ne.peer1.net (216.187.68.6) 52 msec 52 msec *
Router#
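
For anyone who wants to compare traces like this one programmatically, the following is a rough sketch of pulling the hop number, address, MPLS label and RTTs out of Cisco-style traceroute lines. It assumes each hop has been joined onto a single line (mail clients tend to wrap them), and the regular expression is written only against the output format shown here; parse_hop is a name made up for the example.

```python
import re

# Matches lines such as:
#   7 tbr2.sl9mo.ip.att.net (12.122.112.78) [MPLS: Label 30663 Exp 0] 52 msec 48 msec 52 msec
#   6 12.88.71.13 28 msec 24 msec 28 msec
HOP_RE = re.compile(
    r"^\s*(?P<hop>\d+)\s+"
    r"(?:(?P<name>\S+)\s+\((?P<ip>[\d.]+)\)|(?P<bare_ip>[\d.]+))"
    r"(?:\s+\[MPLS: Label (?P<label>\d+) Exp (?P<exp>\d+)\])?"
    r"(?P<probes>(?:\s+(?:\d+ msec|\*))+)\s*$"
)


def parse_hop(line):
    """Return a dict describing one hop, or None if the line is not a hop."""
    m = HOP_RE.match(line)
    if m is None:
        return None
    return {
        "hop": int(m.group("hop")),
        "ip": m.group("ip") or m.group("bare_ip"),
        "label": int(m.group("label")) if m.group("label") else None,
        # Timed-out probes ("*") are simply left out of the RTT list.
        "rtts_ms": [int(x) for x in re.findall(r"(\d+) msec", m.group("probes"))],
    }


sample = [
    "  6 12.88.71.13 28 msec 24 msec 28 msec",
    "  7 tbr2.sl9mo.ip.att.net (12.122.112.78) [MPLS: Label 30663 Exp 0] 52 msec 48 msec 52 msec",
    " 17 tor-fe3-5a.ne.peer1.net (216.187.68.6) 52 msec 52 msec *",
]
for line in sample:
    print(parse_hop(line))
```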

Wondering why the RTT dropped to 24 msec for that hop, I entered both
69.28.226.192 and the IP address that my customer has been complaining about
(12.129.255.4) into PingPlotter and I see that those behave very
differently. I'm now guessing that AT&T is routing back traffic sent to
12.129.255.4 in a different way (perhaps asymmetrically) than traffic sent
to 69.28.226.192, but it doesn't show up until it hits 12.122.112.22.
Perhaps it's all those 1's and 2's. ;)

I notice that in the low RTT trace, router 12.88.71.13 goes to
tbr2.sl9mo.ip.att.net (12.122.112.78), but in the high RTT trace, router
12.88.71.13 goes to tbr1.sl9mo.ip.att.net (12.122.112.22). Must be
something about the way AT&T gets to tbr1.sl9mo.ip.att.net (12.122.112.22).
I can't traceroute to either of those networks directly. In fact, I don't
appear to be able to traceroute to any of the 12.122.x.x or 12.129.x.x I see
in my traceroutes, perhaps because AT&T uses some of that space internally
and doesn't advertise it.

Frank

Depending on whether TTL is propagated into MPLS, this could be true.

Though it should also be pointed out that ICMP responses aren't exactly a precise scientific tool... The responding router could just be busy, and the response time could be reflective of load more than link latency etc. Similarly, failure to get any response at all from a router isn't necessarily indicative of packet loss...

Cheers,
-Benson

Just google "tbr1.sl9mo.ip.att.net" and it's clear that high latency through
that point has occurred before. And guess what kind of customer complained
to me about the latency? A gamer.

Frank

you can pay a lot of money for the net propagation anomaly detection
services that gamers give you for free.

randy

The ICMP packet (TTL exceeded in transit) contains a copy of the packet whose TTL expired, including the labels, so label information is available to traceroute.
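
As a small illustration of what that label information looks like on the wire, here is a sketch that decodes a single 32-bit MPLS label stack entry (20-bit label, 3-bit Exp, 1-bit bottom-of-stack flag, 8-bit TTL). The function names are invented for the example, and the S and TTL values in the sample are assumptions, since the traceroute output only shows the label and Exp fields.

```python
import struct


def encode_label_stack_entry(label, exp, bottom_of_stack, ttl):
    """Pack one MPLS label stack entry into 4 bytes, network byte order."""
    word = (label << 12) | (exp << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)


def decode_label_stack_entry(raw):
    """Unpack 4 bytes into the label, Exp, S and TTL fields."""
    (word,) = struct.unpack("!I", raw)
    return {
        "label": word >> 12,                        # top 20 bits
        "exp": (word >> 9) & 0x7,                   # next 3 bits
        "bottom_of_stack": bool((word >> 8) & 0x1), # S bit
        "ttl": word & 0xFF,                         # low 8 bits
    }


# Label 30663, Exp 0 as reported for hop 7; S=1 and TTL=1 are assumed.
raw = encode_label_stack_entry(30663, 0, True, 1)
print(raw.hex(), decode_label_stack_entry(raw))
```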

Just to close this issue on the list: a (top) engineer from AT&T contacted
me offline and helped us out.

Turns out that 12.88.71.13 is located in Kansas City and
tbr1.sl9mo.ip.att.net (12.122.112.22) is in St. Louis. AT&T has two L1
connections to that site for redundancy, but traffic was flowing over the
longer loop. The engineer tweaked route weights so that the traffic prefers
to flow over the shorter link to tbr2.sl9mo.ip.att.net (12.122.112.78),
shaving about 12 msec.

He also explained that the jump of ~70 msec is due to how ICMP traffic
within MPLS tunnels is handled. It wasn't until I ran a traceroute from a
Cisco router that I even saw the MPLS labels (which are included in the ICMP
responses) for each of the hops within the tunnel. Apparently each ICMP
packet within an MPLS tunnel (where TTL decrementing is allowed) is sent to
the *end* of the tunnel and back again, so my next "hop" to
tbr1.sl9mo.ip.att.net (12.122.112.22) was really showing the RTT to the end
of the tunnel, Los Angeles.

Frank

Even if they are decrementing TTL inside of their MPLS core, the TTL expired message still has to traverse the entire MPLS LSP (tunnel), so the latency reported for each "hop" is in fact the latency of the last hop in the MPLS network. Always.
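
A toy model of this effect, for illustration only: the sketch below assumes symmetric paths, made-up one-way link delays, and that any router receiving a labeled packet forwards its TTL-expired message to the LSP egress before it is routed back. The function reported_rtts and all the numbers are hypothetical.

```python
def reported_rtts(link_delays_ms, first_labeled, egress):
    """RTT a traceroute would report for each hop, in msec.

    link_delays_ms[i] is the one-way delay of the link leading to hop i.
    Hops first_labeled..egress-1 receive labeled packets, so their ICMP
    replies are assumed to travel on to the egress before coming back.
    """
    cumulative = []
    total = 0
    for delay in link_delays_ms:
        total += delay
        cumulative.append(total)

    rtts = []
    for hop, one_way in enumerate(cumulative):
        if first_labeled <= hop < egress:
            # Out to this hop, on to the egress, then all the way back:
            # with symmetric paths that equals the egress RTT.
            rtts.append(2 * cumulative[egress])
        else:
            rtts.append(2 * one_way)
    return rtts


# Hypothetical path: hops 2 and 3 are inside the LSP, hop 4 is the egress.
print(reported_rtts([1, 4, 10, 10, 10, 2], first_labeled=2, egress=4))
# -> [2, 10, 70, 70, 70, 74]
```

The first labeled hop shows the full RTT to the end of the tunnel, and later hops add only a few msec, which matches the pattern described in the original question.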

Sam

Robert Richardson wrote:

And who said tunneling protocols aren't fun :)
