Shortest path to the world

Sean_Donelan · July 15, 2009, 12:07pm

The typical network architecture problem: what are the best (shortest latency, greatest bandwidth, etc.) locations from which to connect to every nation in the world? As you increase the number of locations, how do the choices change?

If you had only a small number (2, 3, 5, 7, 11) of locations, where would they be?

And what data do you have to prove the choices are best?

Jeroen_Massar1 · July 15, 2009, 12:18pm

Sean Donelan wrote:

The typical network architecture problem: what are the best (shortest
latency, greatest bandwidth, etc.) locations from which to connect to
every nation in the world? As you increase the number of locations, how
do the choices change?

If you had only a small number (2, 3, 5, 7, 11) of locations, where would
they be?

Depends completely on what the data is and why you want to send them
from A to B and if A and B are inside your network or not etc etc etc etc.

aka ETOOMANYVARIABLES.

And what data do you have to prove the choices are best?

Depends of course on what you want to 'prove'.

But things that come to mind are possibly:
- Netflow/sFlow and other such data
- latency tests (simple pings from A to B to global services
that check latency, eg RIPE TTM boxes)
- Cost for circuits
- and lots lots more.

It all depends, including on how you combine the above.

Greets,
Jeroen

Bandy_Rush1 · July 15, 2009, 1:03pm

The typical network architecture problem: what are the best (shortest
latency, greatest bandwidth, etc.) locations from which to connect to
every nation in the world? As you increase the number of locations, how
do the choices change?

If you had only a small number (2, 3, 5, 7, 11) of locations, where would
they be?

And what data do you have to prove the choices are best?

it would help if you said how you measure 'best' or 'better'.

if you had a completely free hand, what experiment would you set up to
measure this space?

randy

_Bill_Woodcock · July 15, 2009, 3:49pm

As others have noted, this is a many-variables sort of problem, and to answer it well requires nailing down a few of those variables by combining the statistical output of netflow from your border routers with knowledge of the routing tables available at each potential IXP gleaned from looking-glasses at those IXes. ( http://pch.net/routing-tables being the source of such data that I can offer; the RIPE RIS program does the same thing with a partially-overlapping and partially-unique dataset; the union of the two gives the most complete available picture.)

However, if one wanted the beginnings of an answer, without nailing down any of the specifics, merely looking at the quantity of routes available at each IXP would let you know, on average, how many paths there were on offer to each destination. In all likelihood, different paths available to a given destination will be of different lengths. The more paths available to each destination, the greater the likelihood that one path will be shorter than others or, more to the point, shorter than your current shortest.
( https://prefix.pch.net/applications/ixpdir/?show_active_only=0&sort=prefixes&order=desc or just go to http://pch.net/ixpdir and sort by prefixes. Only a router with a full mesh of peers at an IXP could actually show _all_ of the available routes at an IXP, so nearly all views of this sort will be substantially incomplete; take with a healthy dose of skepticism, and please let me know if you find more complete public sources.)
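
A rough sketch of that counting idea, in Python. The exchange names, prefixes, and AS-path lengths below are made-up illustration data, not output from the PCH or RIPE RIS datasets, and `summarize` is a hypothetical helper. It only shows that an exchange offering more paths per destination gives the minimum more chances to drop.

```python
# Hypothetical data: for each exchange, the AS-path lengths of every path
# on offer to each destination prefix. Real input would come from
# looking-glass or route-collector dumps.
ROUTES = {
    "ixp_small": {
        "192.0.2.0/24": [4],
        "198.51.100.0/24": [5],
    },
    "ixp_large": {
        "192.0.2.0/24": [4, 3, 6],
        "198.51.100.0/24": [5, 2],
        "203.0.113.0/24": [3],
    },
}


def summarize(routes):
    """Return (prefix count, mean paths per prefix, mean shortest path length)."""
    prefixes = len(routes)
    total_paths = sum(len(lengths) for lengths in routes.values())
    mean_shortest = sum(min(lengths) for lengths in routes.values()) / prefixes
    return prefixes, total_paths / prefixes, mean_shortest


if __name__ == "__main__":
    for name, routes in ROUTES.items():
        print(name, summarize(routes))
```

As noted above, any single vantage point sees only part of what an exchange offers, so numbers like these would be lower bounds at best.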

-Bill

Sean_Donelan · July 16, 2009, 2:39am

The typical network architecture problem: what are the best (shortest
latency, greatest bandwidth, etc.) locations from which to connect to
every nation in the world? As you increase the number of locations, how
do the choices change?

If you had only a small number (2, 3, 5, 7, 11) of locations, where would
they be?

And what data do you have to prove the choices are best?

it would help if you said how you measure 'best' or 'better'.

As I said in the original message, combination of minimizing latency (smallest RTT to the most IP endpoints) and maximizing bandwidth (largest number of bits per second successfully received at the most IP endpoints in the smallest amount of time) from the locations identified as best.

Depends completely on what the data is and why you want to send them
from A to B and if A and B are inside your network or not etc etc etc etc.

As I said in the original message, every nation in the world. Or more specifically the largest number of IP endpoints reachable in the most
nations from the locations chosen.

A = the few locations you pick
B = every other IP endpoint reachable from those locations

If every point B in the world is inside your network, awesome. But highly unlikely.

More than likely, maximizing reachability, minimizing latency, and getting the highest goodput and availability will require some combination of starting locations and ISPs.

The data is IP applications in use now and in the future. Why do you want to send them from A to B? Because you never know what is going to happen
in the world, and you want any point B to have the best chance of communicating effectively with the chosen points A.

However, if one wanted the beginnings of an answer, without nailing
down any of the specifics, merely looking at the quantity of routes
available at each IXP would let you know, on average, how many paths
there were on offer to each destination.

The starting locations aren't necessarily IXPs. They could be ISPs
with full transit connections at the chosen locations. But the goal
includes maximizing reachability to the world, which probably means a
full transit connection near many other ISPs would do better than a full transit connection far away from many other ISPs.

As others have noted, this is a many-variables sort of problem, and to
answer it well requires nailing down a few of those variables

True, optimization and constraint solving are easier with fewer variables.
There are also researchers who seem to spend lots of time measuring
the Internet and collecting data for all sorts of reasons. When creating
graphs of the Internet, one of the basic problems every mapper has to
solve is deciding where the "centers" of the map are.

Leo_Bicknell1 · July 16, 2009, 3:06am

In a message written on Wed, Jul 15, 2009 at 10:39:05PM -0400, Sean Donelan wrote:

As I said in the original message, every nation in the world. Or more
specifically the largest number of IP endpoints reachable in the most
nations from the locations chosen.

A = the few locations you pick
B = every other IP endpoint reachable from those locations

If every point B in the world is inside your network, awesome. But
highly unlikely.

I will assert that for all the small numbers (N < 5) the answers
are non-overlapping sets.

That is (and these are not the actual sites), N = 1 may be New York,
N = 2 may be Amsterdam and LA, N = 3 may be Hong Kong, Chicago, and
Frankfurt, and so on.
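
A toy illustration of that assertion, in Python. The site names, regions, and RTT values are invented for the example, and the objective (sum over regions of the RTT from the nearest chosen site) is just one possible reading of "best". The brute-force search shows that the best single site need not appear in the best pair.

```python
from itertools import combinations

# Hypothetical RTTs in ms from each candidate site to each endpoint region.
RTT_MS = {
    "site_a": {"r1": 50, "r2": 50, "r3": 50, "r4": 50},
    "site_b": {"r1": 10, "r2": 10, "r3": 120, "r4": 120},
    "site_c": {"r1": 120, "r2": 120, "r3": 10, "r4": 10},
}


def total_rtt(sites):
    """Sum over regions of the RTT from whichever chosen site is closest."""
    regions = RTT_MS[sites[0]].keys()
    return sum(min(RTT_MS[s][r] for s in sites) for r in regions)


def best_sites(n):
    """Try every combination of n candidate sites and keep the cheapest."""
    return min(combinations(sorted(RTT_MS), n), key=total_rtt)


if __name__ == "__main__":
    for n in (1, 2):
        choice = best_sites(n)
        print(n, choice, total_rtt(choice))
```

With this data the central compromise site wins for N = 1, while N = 2 picks the two regional sites and drops the compromise entirely. Exhaustive search is only workable for small candidate lists.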

More than likely, maximizing reachability, minimizing latency, and
getting the highest goodput and availability will require some
combination of starting locations and ISPs.

Reachability, latency, and goodput cannot all be optimized at the
same time. There is more than one way to create a synthetic metric
combining the three, so it's quite unclear how to answer your
question.
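
To make the synthetic-metric point concrete, a small Python sketch. The two sites, their measurements, the normalization constants, and the weights are all assumptions made up for illustration. The same data produces a different "best" site depending on how the three terms are weighted.

```python
# Hypothetical per-site measurements.
SITES = {
    "site_x": {"reach": 0.95, "rtt_ms": 180.0, "goodput_mbps": 40.0},
    "site_y": {"reach": 0.80, "rtt_ms": 60.0, "goodput_mbps": 90.0},
}

# Assumed normalization bounds, chosen only so every term lands in 0..1.
MAX_RTT_MS = 200.0
MAX_GOODPUT_MBPS = 100.0


def score(metrics, w_reach, w_rtt, w_goodput):
    """Weighted sum where higher is better; latency is inverted."""
    return (
        w_reach * metrics["reach"]
        + w_rtt * (1.0 - metrics["rtt_ms"] / MAX_RTT_MS)
        + w_goodput * (metrics["goodput_mbps"] / MAX_GOODPUT_MBPS)
    )


def best_site(weights):
    """Return the site name with the highest score under the given weights."""
    return max(SITES, key=lambda name: score(SITES[name], *weights))


if __name__ == "__main__":
    print(best_site((0.9, 0.05, 0.05)))  # reachability-heavy weighting
    print(best_site((0.2, 0.4, 0.4)))    # latency/goodput-heavy weighting
```

Neither weighting is more correct than the other, which is the problem.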

True, optimization and constraint solving are easier with fewer variables.
There are also researchers who seem to spend lots of time measuring
the Internet and collecting data for all sorts of reasons. When creating
graphs of the Internet, one of the basic problems every mapper has to
solve is deciding where the "centers" of the map are.

95% of the mapping efforts look only at reachability, and then from
incomplete data. Another 4% look at it from latency. 1% look at
it from goodput. I have never seen a data set that related two,
much less all three in any meaningful way.

Quite frankly, your question reminds me a bit of the geography
question "where is the center of the US".

While nifty trivia, it actually has no useful value for, well,
anything. If it did, there would be more there than a small monument.

If you're going to deploy something, in addition to the criteria
you have listed you will have to consider cost and availability of
colo, transit, exchange ports, equipment, your businesses costs and
time in doing business in multiple jurisdictions, getting people
to these locations to set stuff up, cost and availability of bandwidth
between sites, management overhead of what you're going to deploy,
and so on.

One last wrench in your works. It depends on how much traffic you
want to do. If you want to move 50Mbps total, the answer is entirely
different than if you want to move 500Gbps. Goodput holds until
links fill, at which point it falls off. Plenty of video sites have
great goodput from their set of locations until a flashmob (say, Michael
Jackson) comes along and then the goodput from the same set of sites
crashes and burns.
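
A deliberately crude Python model of that falloff. It assumes a single bottleneck link shared equally among users, which ignores TCP dynamics, loss, and everything else real; the function name and the numbers are made up for illustration.

```python
def per_user_goodput_mbps(link_capacity_mbps, users, demand_per_user_mbps):
    """Each user gets what they ask for until the link fills, then an equal share."""
    if users <= 0:
        raise ValueError("users must be positive")
    fair_share = link_capacity_mbps / users
    return min(demand_per_user_mbps, fair_share)


if __name__ == "__main__":
    # Ordinary day: 1,000 viewers at 5 Mbps each on a 10 Gbps link.
    print(per_user_goodput_mbps(10_000, 1_000, 5))
    # Flash crowd: 20,000 viewers on the same link.
    print(per_user_goodput_mbps(10_000, 20_000, 5))
```

The site, the link, and the per-user demand are identical in both cases; only the crowd size changed.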

So, you have a question that probably can't be answered, but if it could
the answer doesn't matter, and even if it did, the Internet is dynamic
so it will all be different tomorrow.

Sean_Donelan · July 16, 2009, 6:07am

Unless you were Federal Express, and wanted to understand where the "center" of your service area was to help pick better airport hub locations. Add in some offsets for time zones, weather, and even more complexity, and your hub ends up in Memphis. Optimal can sometimes mean it's good enough; even the monument at the center of the United States isn't actually located at the precise center.

http://ardent.mit.edu/airports/ASP_exercises/ASP%20matl%20for%20posting%202007/UPS%20and%20FedEx%20Hub%20Operations%20Cosmas%20Martini.pdf

Operations research is filled with people trying to figure out the optimal number of hubs, hub locations, routes between them for all sorts of stuff.

So where are the operations research people studying the Internet?

Michiel_Klaver1 · July 16, 2009, 8:14am

Sean Donelan wrote:

The typical network architecture problem: what are the best (shortest latency, greatest bandwidth, etc.) locations from which to connect to every nation in the world? As you increase the number of locations, how do the choices change?

If you had only a small number (2, 3, 5, 7, 11) of locations, where would they be?

And what data do you have to prove the choices are best?

Just a quick wikipedia and google search would provide you the answers
to that:

etc...

have fun with all that data!

Kind regards,

Michiel Klaver
IT Professional

Martin_Hannigan9 · July 16, 2009, 1:00pm

It's possibly useful to take overall population into consideration, since
broadband penetration is likely to grow in a population rather than remain
stagnant or decrease. That may suggest that the largest submarine cable
landing point aggregators (Telehouse, 111 8th, etc. NOTA MIA) would be
optimal for shortest reach to multitudes of networks and large amounts of
capacity, and give you "reach" as well as decent performance.

My picks were NOTA facing the Americas, 111 8th/60 Hudson US, and Telehouse
London for Europe. I'm not suggesting that an IX is required. Would be nice
to keep costs down if that's also part of the objective, but not required.
There's a project that is mapping datacenters onto Google Earth globally and
if I could recall the URL I would suggest that a visualization of these
answers may be interesting.

Best Regards,

Martin

Leo_Bicknell1 · July 16, 2009, 1:58pm

In a message written on Thu, Jul 16, 2009 at 02:07:12AM -0400, Sean Donelan wrote:

Unless you were Federal Express, and wanted to understand where the
"center" of your service area was to help pick better airport hub
locations. Add in some offsets for time zones, weather, and even more
complexity, and your hub ends up in Memphis. Optimal can sometimes mean
it's good enough; even the monument at the center of the United States
isn't actually located at the precise center.

The center of FedEx's world has nothing to do with geography, it
has to do with flight times. JFK's perennial 1 hour delays make
that flight an hour longer, even though it is no further away.
Also, if I had 20 flights to the east coast, and 1 flight to the
west coast, I may well "shift my center" east choosing to burn more
fuel and time on one flight to save fuel on 20. Oh yeah, and then
there are the other hubs in Indianapolis, Fort Worth, Oakland,
Newark, Anchorage, Paris, Guangzhou, Toronto and Miami. Guess
Memphis isn't the best, all by itself.

Anchorage, you might say? That's odd. Well, it turns out fully
loaded freight aircraft have trouble making it from many Asian
countries to the US on one tank of fuel. If you have to stop to
refuel you might as well sort some packages while you're waiting for
it to pump into the plane.

Operations research is filled with people trying to figure out the optimal
number of hubs, hub locations, routes between them for all sorts of stuff.

So where are the operations research people studying the Internet?

At every ISP and content provider out there. The answer is different
for every company. FedEx and UPS don't have the same hubs, because
they don't serve the same customer base. Akamai, NTT, and DTAG all
have different points of presence based on their customer bases.
Each one has the "optimal" network for their customer base.

Your question is akin to "tell me the best car, house, boat, airline,
ISP, operating system." Magazines love to crown the king, but we
all know making the right choice has orders of magnitude more to
do with your specific situation than it does with the product or
service in the abstract.

Valdis_Kletnieks · July 16, 2009, 10:08pm

Given that it's Sean asking, I have to conclude he's either dropping a very
interesting thought experiment on us, or he's just trolled us, with a long list
of well-known names replying. Quite possibly both at once.

Well played, Sean.