Long hops on international paths

Hello,

I am a researcher at the University of Wisconsin. My colleagues at Northwestern University and I are studying international Internet connectivity and would appreciate your perspective on a recent finding.

We’re using traceroute data from CAIDA’s Ark project for our work. We’ve observed that many international links (i.e., a single hop on an end-to-end path that connects two countries, where the end points of the hop are identified via rDNS) tend to originate/terminate at the same routers. Said another way, we observe that a relatively small set of routers in different countries tends to carry a majority of the international connections. This is especially the case for hops that terminate in the US. For example, there is a router operated by Telia (AS1299) in Chicago that has a high concentration of such links. We were a bit surprised by this finding: although it makes sense that the set of providers is relatively small (i.e., those that offer global connectivity), we assumed that the set of routers used for international connectivity within any one country would be more widely distributed (at least with respect to how they appear in traceroute data, MPLS notwithstanding).

We’d like to know whether this is indeed standard practice and, if so, what the costs and benefits are of configuring international connectivity in this way.

Any thoughts or insights you might have would be greatly appreciated - off-list responses are welcome.

Thank you.

Regards, PB

Paul Barford
University of Wisconsin - Madison

Is ttl decrement disabled on the test paths you're measuring?

Broadly speaking, if you have a point-to-point link from one location to another (or parallel set of links with a common failure path, e.g. waves on a specific fibre path), there's a single router at each end.

Nick

Hello Nick,

I’ve added my collaborators to this reply - Esteban can comment on your observation re. Telia.

TTL is decremented on the paths we’re analyzing. What we’re curious about is why we’re seeing a concentration of hops at a small number of routers that appear on international paths. I expected that when we looked at paths between e.g., US-Asia, US-Europe, US-South America (considering measurements from Ark nodes doing traceroutes to/from those locations), we would see the first instances of routers located in the US along the west coast, east coast and south coast respectively. We did not expect to see Chicago as a first hop location in the US and are wondering, e.g., whether large providers do this to simplify their operations. Hopefully that makes sense. Any further thoughts are appreciated.

Regards, PB

I suggest you share a few actual examples (IP addresses, traceroutes).

I don't think discussing your conclusion based on data we don't have
makes sense.

Lukas

Hi Paul,

Just curious. How do you determine they are the same routers? Is it based on IP address or MAC addresses? Or using CAIDA’s router alias database?

Also how do you draw the conclusion that the AS1299 router is indeed in Chicago? IP-geolocation based on rDNS is not always accurate though.

Pengxiong

Carrier class core routers still cost half a million dollars each or (way) more, so it’s not uncommon for there to be 2-4 in a metro.

And there are only a few metros that have undersea cable landing stations.

We deploy a minimum of a pair of core routers everywhere, but with our BGP/OSPF/iBGP core your path through us generally won’t change even though there’s an alternative path with slightly lower route pref. (Absent loss of both physical path and physical alternate)

Ms. Lady Benjamin PD Cannon of Glencoe, ASCE
6x7 Networks & 6x7 Telecom, LLC
CEO
lb@6by7.net
"The only fully end-to-end encrypted global telecommunications company in the world.”

FCC License KJ6FJJ

Dear Pengxiong,

Thanks for your questions:

  1. We are using CAIDA’s Internet Topology Data Kit (ITDK) that uses the MIDAR alias resolution method to infer IP addresses assigned to the same router.
  2. We understand the concerns about IP geolocation. Interfaces of the router in question are assigned similar domain names, e.g., “chi-b2-link.ip.twelve99.net” (62.115.50.61). We also used CAIDA’s ITDK, which provides geolocation information and indicates that this router is located in Chicago. We cross-reference with Maxmind where possible. In this particular case, the “chi” in the domain name is a telltale sign.
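
A minimal sketch of the rDNS hint described in item 2, assuming hostnames follow the "<code>-b<N>-link.ip.twelve99.net" pattern seen in this thread. The lookup table and the function name location_hint are hypothetical, only the codes whose locations are stated in this thread are included, and a hit is a hint to cross-check rather than a confirmed location.

```python
import re

# Hypothetical lookup table built only from codes whose location is stated
# in this thread. Operators use their own conventions, so treat every entry
# as a guess to be cross-checked against other geolocation sources.
CODE_TO_CITY = {
    "chi": "Chicago, US",
    "sea": "Seattle, US",
    "prs": "Paris, FR",
    "hls": "Helsinki, FI",
    "las": "Los Angeles, US",
}

# Matches names such as "chi-b2-link.ip.twelve99.net" (trailing dot optional).
HOSTNAME_RE = re.compile(r"^([a-z]+)-b\d+-link\.ip\.twelve99\.net\.?$")


def location_hint(hostname):
    """Return a city hint for a matching hostname, or None if unknown."""
    match = HOSTNAME_RE.match(hostname.lower())
    if match is None:
        return None  # hostname does not follow the assumed pattern
    return CODE_TO_CITY.get(match.group(1))  # None for codes not in the table


print(location_hint("chi-b2-link.ip.twelve99.net"))
print(location_hint("snge-b5-link.ip.twelve99.net."))
```

Codes that are not in the table (such as "snge" above) deliberately return None rather than a guess.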

Hope that helps.

Regards, PB

I think Nick’s point about TTL expiry and missing context on topology still stands.
I’d bet that the paths between two continents do not actually land in Chicago; rather, you’re missing the hops between the coast(s) and Chicago inside 1299’s network in the US.

What we’re considering specifically are consecutive (layer 3) hops as identified by traceroute. Thus, TTL is decremented by 1 and no more than 1 (i.e., we have to get full information (not *****) from consecutive hops to consider the link). I have asked my colleague to put together a set of examples. We assume that there are multiple layer 1 and 2 links, and possibly layer 3 hops masked from traceroute by MPLS. But the hops that traceroute exposes make it look like a single (TTL decremented by 1) hop.
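
The link criterion above can be sketched as follows. This is an illustration under stated assumptions, not the authors' actual pipeline: the hop representation (a list of (ttl, ip) tuples with None for a "*" response) and the name consecutive_links are hypothetical.

```python
def consecutive_links(hops):
    """Return (near_ip, far_ip) pairs for hops at TTL n and n+1 that both replied.

    hops: list of (ttl, ip) tuples; ip is None when the hop showed '*'.
    """
    by_ttl = {ttl: ip for ttl, ip in hops}
    links = []
    for ttl in sorted(by_ttl):
        near = by_ttl[ttl]
        far = by_ttl.get(ttl + 1)  # None if the next TTL is missing or was '*'
        if near is not None and far is not None:
            links.append((near, far))
    return links


# Hops 2-5 of the jfk-us example posted later in this thread.
trace = [
    (2, "62.115.49.173"),
    (3, None),
    (4, "62.115.137.59"),
    (5, "62.115.117.48"),
]
print(consecutive_links(trace))
```

Only the pair at TTL 4 and 5 qualifies here, because the unresponsive hop at TTL 3 breaks the other candidates. Note that a qualifying pair says nothing about routers that never decrement the IP TTL.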

I’ll post the examples when I get them.

PB

I guess it depends on what you consider a “very few” number of routers, but this seems to be an expected outcome. While there are a large number of wet cable landing stations, they are highly concentrated near a small number of metro areas, and with the exception of capacity owned by the ILECs, the supermajority of routers terminating that capacity in the US are going to live in fewer than ten discrete carrier hotel locations. (It’s worth noting that terrestrial capacity coming in from Mexico and Canada also terminates in a small number of locations, although the overlap between the two lists is fairly small.) In addition, while the links likely terminate in multiple devices at a given location, carriers are more likely to undersubscribe transoceanic core capacity than other areas of the core, which means that for many carriers it’s unlikely you’d see multiple paths show up in a trace unless you catch it during an outage situation. That said, seeing transoceanic links terminate in Chicago is likely an artifact of hops missing in a trace; although I am familiar with a couple of more niche providers that extend transoceanic capacity into non-coastal markets on optical gear in order to meet specific performance needs, this is unlikely to be seen in the network of a Tier 1 or similarly scaled network.

Dave Cohen
craetdave@gmail.com

Please find the examples for the case of Telia below.

FROM jfk-us (jfk-us.team-probing.c008820.20201002.warts.gz)

traceroute from 216.66.30.102 (Ark probe hosted in New York City, NY, US. No AS info found) to 223.114.235.32 (MAXMIND: Turpan, CN)

1 216.66.30.101 0.365 ms

2 62.115.49.173 3.182 ms

3 *

4 62.115.137.59 17.453 ms [x] (chi-b23-link.ip.twelve99.net., CAIDA-GEOLOC → Chicago, IL, US)

5 62.115.117.48 59.921 ms [x] (sea-b2-link.ip.twelve99.net., RIPE-IPMAP → Seattle, WA, US)

6 62.115.171.221 69.993 ms

7 223.120.6.53 69.378 ms

8 223.120.12.34 226.225 ms

9 221.183.55.110 237.475 ms

10 221.183.25.201 238.697 ms

11 221.176.16.213 242.296 ms

12 221.183.36.62 352.695 ms

13 221.183.39.2 300.166 ms

14 117.191.8.118 316.270 ms

15 *

16 *

17 *

18 *

19 *

FROM ord-us (ord-us.team-probing.c008820.20201002.warts.gz)

traceroute from 140.192.218.138 (Ark probe hosted in Chicago, IL, US at DePaul University-AS20120) to 109.25.215.237 (237.215.25.109.rev.sfr.net., MAXMIND: La Crau, FR)

1 140.192.218.129 0.795 ms

2 140.192.9.124 0.603 ms

3 64.124.44.158 1.099 ms

4 64.125.31.172 3.047 ms

5 *

6 64.125.15.65 1.895 ms [x] (zayo.telia.ter1.ord7.us.zip.zayo.com., CAIDA-GEOLOC → Chicago, IL, US)

7 62.115.118.59 99.242 ms [x] (prs-b3-link.ip.twelve99.net., CAIDA-GEOLOC → Paris, FR)

8 62.115.154.23 105.214 ms

9 77.136.10.6 119.021 ms

10 77.136.10.6 118.830 ms

11 80.118.89.202 118.690 ms

12 80.118.89.234 118.986 ms

13 109.24.108.66 119.159 ms

14 109.25.215.237 126.085 ms

traceroute from 140.192.218.138 (Ark probe hosted in Chicago, IL, US at DePaul University-AS20120) to 84.249.89.93 (dsl-tkubng12-54f959-93.dhcp.inet.fi., MAXMIND: Turku, FI)

1 140.192.218.129 0.243 ms

2 140.192.9.124 0.326 ms

3 64.124.44.158 0.600 ms

4 *

5 *

6 64.125.15.65 1.792 ms [x] (zayo.telia.ter1.ord7.us.zip.zayo.com., CAIDA-GEOLOC → Chicago, IL, US)

7 62.115.123.27 121.199 ms [x] (hls-b4-link.ip.twelve99.net., CAIDA-GEOLOC → Helsinki, FI)

8 *

9 141.208.193.190 127.723 ms

10 84.249.89.93 139.051 ms

traceroute from 140.192.218.138 (Ark probe hosted in Chicago, IL, US) to 193.28.231.50 (MAXMIND: None, HU)

1 140.192.218.129 0.240 ms

2 140.192.9.124 0.333 ms

3 64.124.44.158 0.648 ms

4 *

5 64.125.25.75 0.752 ms

6 64.125.15.65 1.877 ms [x] (zayo.telia.ter1.ord7.us.zip.zayo.com., CAIDA-GEOLOC → Chicago, IL, US)

7 62.115.119.39 123.952 ms [x] (bpt-b2-link.ip.twelve99.net., I suspect it is in Budapest, HU)

8 62.115.39.122 117.171 ms

9 88.151.96.148 117.202 ms

10 88.151.96.213 124.787 ms

11 *

12 *

13 *

14 *

15 *

traceroute from 140.192.218.138 (Ark probe hosted in Chicago, IL, US at DePaul University-AS20120) to 152.195.4.11 (MAXMIND: Los Angeles, CA, US)

1 140.192.218.129 0.224 ms

2 140.192.9.124 0.545 ms

3 64.124.44.158 0.640 ms

4 *

5 *

6 64.125.15.65 1.786 ms [x] (zayo.telia.ter1.ord7.us.zip.zayo.com., CAIDA-GEOLOC → Chicago, IL, US)

7 62.115.118.247 54.597 ms [x] (las-b22-link.ip.twelve99.net., CAIDA-GEOLOC → Los Angeles, CA, US)

8 62.115.11.129 55.979 ms

9 *

10 *

11 *

12 *

13 *

traceroute from 140.192.218.138 (Ark probe hosted in Chicago, IL, US at DePaul University-AS20120) to 47.31.143.217 (MAXMIND: Delhi, IN)

1 140.192.218.129 2.277 ms

2 140.192.9.124 0.449 ms

3 64.124.44.158 0.576 ms

4 *

5 *

6 64.125.15.65 1.814 ms [x] (zayo.telia.ter1.ord7.us.zip.zayo.com., CAIDA-GEOLOC → Chicago, IL, US)

7 62.115.114.41 210.056 ms [x] (snge-b5-link.ip.twelve99.net.,)

8 62.115.177.11 200.840 ms

9 103.198.140.16 233.636 ms

10 103.198.140.16 232.871 ms

11 103.198.140.171 232.648 ms

12 *

13 *

14 *

15 *

16 *

Looking at one of your repeated ORD examples:

6 64.125.15.65 1.895 ms [x] (zayo.telia.ter1.ord7.us.zip.zayo.com., CAIDA-GEOLOC → Chicago, IL, US)

7 62.115.118.59 99.242 ms [x] (prs-b3-link.ip.twelve99.net., CAIDA-GEOLOC → Paris, FR)

65.15.125.64.in-addr.arpa name = zayo.telia.ter1.ord7.us.zip.zayo.com.
64.15.125.64.in-addr.arpa name = ae51.zayo.ter1.ord7.us.zip.zayo.com.

it looks like the probes you selected (at least the depaul univ one(s)) are finding the ‘best path’ to whatever destinations via depaul → zayo → telia.
It looks like zayo/telia interconnect at that /31.
Based on:
https://www.teliacarrier.com/dam/jcr:fc260a69-98a2-47d3-8d30-ca7095318413/telia-carrier-map-america-nov-2021.png

i’d guess that:

  1. telia has an mpls core with no-decrement-ttl enabled
  2. the hidden hops include NYC and possibly cleveland/wdc
  3. judging the path information purely on traceroute hops is error prone.
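
One rough way to check whether a single traceroute hop could be hiding a long-haul segment is to turn the RTT jump between its two ends into an upper bound on distance. The sketch below assumes the common rule of thumb that light in fiber covers about 200 km per millisecond; the constant and the function name max_one_way_km are illustrative assumptions. RTT differences between hops are noisy (return paths can differ, routers rate-limit ICMP), so treat the result as a sanity check only.

```python
# Rule-of-thumb assumption: light in fiber travels roughly 200 km per ms.
KM_PER_MS_IN_FIBER = 200.0


def max_one_way_km(rtt_near_ms, rtt_far_ms):
    """Upper bound on the one-way fiber distance between two adjacent hops."""
    delta_ms = max(rtt_far_ms - rtt_near_ms, 0.0)  # clamp negative jitter to zero
    return delta_ms / 2.0 * KM_PER_MS_IN_FIBER


# Hops 6 and 7 of the ORD -> Paris example: 1.895 ms and 99.242 ms.
print(round(max_one_way_km(1.895, 99.242), 1))
```

For the ORD example the bound comes out at several thousand kilometres, which is consistent with a transatlantic segment sitting inside what traceroute reports as one hop.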

1) all your traceroutes (meaning all those hitting the zayo.telia device)
originate from a university in Chicago
2) the zayo.telia device is physically close to the university
3) we should expect a physically close-by backbone device to be present
in a disproportionate amount of traceroutes
4) almost certainly zayo.telia is imposing the MPLS label with TTL 255,
_NOT_ copying the IP TTL, therefore until the MPLS label is popped, TTL is
not expiring. I.e. you are seeing the ingressPE and egressPE of Telia, you
are not seeing any P routers.
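
Point 4 can be illustrated with a toy traceroute simulation. This is a deliberately simplified model: router names and roles are hypothetical, and details such as penultimate-hop popping are ignored. With TTL propagation disabled, P routers decrement only the label TTL, so the probe's IP TTL never expires inside the tunnel.

```python
def visible_hops(path, propagate_ttl):
    """Return the routers a traceroute would reveal along `path`.

    path: list of (name, role) tuples, role is "IP", "PE" or "P".
    propagate_ttl: True if the ingress PE copies the IP TTL into the label.
    """
    seen = []
    for probe_ttl in range(1, len(path) + 1):
        ttl = probe_ttl
        for name, role in path:
            if role == "P" and not propagate_ttl:
                continue  # only the MPLS label TTL is decremented here
            ttl -= 1
            if ttl == 0:  # this router sends the TTL-exceeded reply
                if name not in seen:
                    seen.append(name)
                break
    return seen


# Hypothetical path: one plain IP hop, then an MPLS core with two P routers.
path = [
    ("edge", "IP"),
    ("ingress-pe", "PE"),
    ("p1", "P"),
    ("p2", "P"),
    ("egress-pe", "PE"),
]
print(visible_hops(path, propagate_ttl=False))
print(visible_hops(path, propagate_ttl=True))
```

In this model the ingress and egress PE appear as adjacent hops when propagation is off, however many P routers (and kilometres) sit between them.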

This is not esoteric knowledge, but a fairly basic Internet concept. I
am worried you are missing too much context to produce actionable
output from your work. It might be interesting to see your curriculum
and understand why this confusion arose: why it seemed logical that
almost all waves must be terminated there. That would not seem
logical to people practising in the field who have even a cursory
understanding, which implies problems in the curriculum.

Hello Saku,

Thank you for the summary. We’re clear about the fact that what we’re seeing are MPLS paths - that was not in question. What we are not clear about, and the reason for the post, is why the provider - zayo.telia in this case - would decide to configure MPLS paths between Chicago and distant international locations. We assumed we would see hops in traceroute between Chicago and coastal locations and then hops that transited submarine infrastructure followed by hops to large population centers.

Regards, PB

I think a large part of your problem is that you’re using traceroute to try to determine the full topology of a large, complex network. It won’t show the full topology.

Hey Paul,

MPLS is the default, not exception.

And like any other form of tunnelling, MPLS decouples underlay and overlay.

This means the key value proposition of tunneling is that the devices
between tunnel end points do not know the original sender or final
receiver. So when TTL expires in transit, the P device may
not know how to return the packet to the sender.

There are three cases here

1) MPLS-TTL does not expire in transit => easy
2) MPLS-TTL expires in transit
  2a) generate TTL exceeded and put it back to tunnel, sending it to
egressPE, which is guaranteed to know how to return to sender
  2b) randomly assume that you know how to reach the sender and try to
send the TTL exceeded directly

with 2a) all P hops display egressPE latency, but it works. With 2b)
it might not work, as P might not know how to return. Some devices,
like Cisco, allow you to decide heuristically whether to tunnel ICMP
or not, based on stack depth, but this does not work: the default
table during repair is as deep as a VRF without repair, so we cannot
really discriminate.

So the best solution is to not expire in transit (the norm in
tunneling), i.e. set MPLS-TTL to 255. 2nd best is to tunnel, but the
RTT will confuse uneducated, or as it may be, highly educated, users.
We could implement something like
https://ytti.github.io/icmp-eo-timestamp/draft-ytti-intarea-icmp-eo-timestamp.html
to offer correct forward latencies to P/LSR in 2a scenario.
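
The latency effect of case 2a can be worked through with a small model. It assumes symmetric one-way delays and a return path that retraces the forward path, which real networks do not guarantee; the function name reported_rtt_ms and the numbers are hypothetical.

```python
def reported_rtt_ms(hop_delay_ms, egress_delay_ms, tunnel_icmp):
    """RTT a traceroute would report for a P router under a symmetric-delay model.

    hop_delay_ms: one-way delay from the source to the P router.
    egress_delay_ms: one-way delay from the source to the egress PE.
    tunnel_icmp: True for case 2a (error is tunnelled to the egress PE first).
    """
    if tunnel_icmp:
        # probe reaches P, error continues P -> egress PE, then egress PE -> source
        return hop_delay_ms + (egress_delay_ms - hop_delay_ms) + egress_delay_ms
    # direct return from P to the source (case 2b, when it works at all)
    return 2 * hop_delay_ms


# Two hypothetical P routers at 10 ms and 30 ms, egress PE at 50 ms one way.
for p_delay in (10, 30):
    print(p_delay, reported_rtt_ms(p_delay, 50, True), reported_rtt_ms(p_delay, 50, False))
```

Under 2a every P hop reports twice the egress PE delay regardless of where it sits, which matches the statement that all P hops display egressPE latency.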

Chicago is a fairly major POP that MAY very well have waves right to other major POPs.

Can you retest from a non-major POP? They’re not likely to have a wave from Indy, St. Louis, Des Moines, etc. going to Paris, Singapore, Helsinki, Budapest, etc. Then you could maybe determine if it’s a wave or MPLS.

Hello David,

Understanding the physical topology of the network is not our objective. What we’re trying to understand is the logical topology revealed by traceroute (we are well aware of traceroute limitations) and why a relatively small set of routers in different countries tends to have the majority of the international connections. Our expectation was that the layer 3 connectivity revealed in traceroute would be relatively evenly spread out along coasts and near submarine landing points. We’re not seeing that. So, the question is what is the cost/benefit to providers to configure/maintain routes (that include long MPLS tunnels) that tend to concentrate international connectivity at a relatively small number of routers?
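
The concentration being described can be quantified as the share of international links attributed to the top k routers. The sketch below is only an illustration: the alias table stands in for an alias-resolution dataset such as the ITDK mentioned earlier in the thread, and the function name, input format, and toy data are all hypothetical.

```python
from collections import Counter


def top_router_share(links, ip_to_router, k):
    """Fraction of links whose near-side router is among the k busiest routers.

    links: list of (near_ip, far_ip) tuples for international hops.
    ip_to_router: alias table mapping interface IPs to router identifiers.
    """
    counts = Counter()
    for near_ip, _far_ip in links:
        # Interfaces without a known alias are treated as their own router.
        counts[ip_to_router.get(near_ip, near_ip)] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _router, n in counts.most_common(k)) / total


# Toy data, not measurements: three links leave router R1, one leaves R2.
aliases = {"a1": "R1", "a2": "R1", "b1": "R2"}
links = [("a1", "x"), ("a2", "y"), ("b1", "z"), ("a1", "w")]
print(top_router_share(links, aliases, k=1))
```

A metric like this inherits every traceroute artifact discussed in the thread: if hidden MPLS hops make a nearby PE look like the endpoint of many long links, the share for that router will be inflated.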

Regards, PB

Paul-

You said: "... would decide to configure MPLS paths between Chicago and distant international locations ..."

AS3128 runs MPLS, and it's probable someone might correct me here, but for an IGP backbone area I think it's common for there to be a full mesh of LSPs via LDP, RSVP, SR, etc. AS3128 is a small regional and we operate in that way across 60+ nodes. I don't know if it's common for someone with a global footprint like 1299 to have a contiguous global MPLS backbone, but the point of my reply was to say it's not impossible to think 1299 has a global MPLS mesh between major POPs.

-Michael

Yes, this is the common case, not an exception.