US-Asia Peering

Rearranged slightly.

> What are the technical issues with extremely long-distance (transoceanic)
> peering?
>
> In particular, what are the issues interconnecting layer 2 switches across
> the ocean for the purposes of providing a global peering cloud using:

In the generic sense, the issues are largely the same as
interconnecting to the L2 switch in the customer's cage (operated by
the customer as an aggregation device) or the L2 switch implementing
another exchange fabric in the same building or metro.

Complex L2 topologies are hard to manage, especially when the devices
that implement that topology are not all managed by the same entity.
L2 has poor policy knobs for implementing administrative boundaries;
if such a boundary is involved, the events that you need to anticipate
are largely the same whether the switches are separated by a cage wall
or an ocean. The auto-configuring properties of L2 fabrics (like MAC
learning, or spanning tree) that make it an attractive edge technology
can be very detrimental to smooth operation when more than one set of
hands operates the controls.

An exchange point is, quite literally, an implementation of an
administrative boundary; the desire of customers to use L2 devices
themselves (for aggregation of cheap copper interfaces toward
expensive optical ones, use of combined L2/L3 gear, or whatever other
reason) means that the L2 fabric of any Ethernet-based exchange has
potentially as many hands on the controls as there are customers.

So, good operational sense for the exchange point operator means
exercising as much control over those auto-configuring properties as
is possible; turning them off or turning automatic functions into
manual ones. Did I mention that L2 has lousy knobs for policy control?
(They're getting a little better, so perhaps whatever is a notch
better than "lousy" is appropriate).

One of the things you have to turn off is spanning tree (read the
thread from a few months back on the hospital network that melted down
for background material). That means that you have to build a
loop-free topology out of reliable links, which you can get with all
three of the technologies you mention, but the topology has to stay
loop-free. In order to use inter-metro connectivity efficiently, you
are limited to building L2 patches that each implement pairwise
connectivity between two metros. That makes this:

0) vanilla circuit transport to interconnect the switches

hard, because your interior connectivity is dedicated to one of those
pairwise combinations (hard, but not impossible, assuming you have
some capex to throw at the problem). The pairwise limitation also,
indirectly, puts the kibosh on using this fabric as a means to pretend
that a network has a backbone in order to qualify for peering that it
wouldn't get otherwise.
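
To make the pairwise limitation concrete, here is a minimal sketch (the
metro names are invented for illustration): every metro pair needs its own
dedicated, loop-free L2 patch, so the circuit count grows as n*(n-1)/2.

    from itertools import combinations

    def pairwise_patches(metros):
        """Return the metro pairs, each of which needs a dedicated L2 patch."""
        return list(combinations(metros, 2))

    metros = ["Tokyo", "San Jose", "Los Angeles", "Seattle", "Hong Kong"]
    patches = pairwise_patches(metros)
    print(f"{len(metros)} metros -> {len(patches)} dedicated circuits")
    for a, b in patches:
        print(f"  {a} <-> {b}")

Five metros already means ten circuits, each sized for one pair's traffic,
which is where the capex goes.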

That leaves these two:

1) MPLS framed ethernet service to interconnect the switches
2) tunnelled transport over transit to interconnect the switches

which will carry the exchange point traffic over an L3 (okay, so MPLS
is "L2.5") network; in addition, you get the benefit of being able to
have all the L3 knobs available in the interior to do traffic
engineering. Both options perform better when the interior MTU can
accommodate the encapsulation protocol plus payload without
fragmenting, so someone is operating routers with SONET interfaces in
this equation.
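
As a rough illustration of the MTU point, here is a small sketch; the
per-encapsulation overheads are assumed, round-number values, not figures
from any particular deployment:

    CUSTOMER_MTU = 1500       # payload size the peering participants expect
    ETH_HEADER = 14           # the carried frame's own Ethernet header

    encap_overhead = {
        "MPLS (two-label stack)": 8,    # assumed 2 x 4-byte labels
        "L2TPv3 over IP": 32,           # assumed 20-byte IP + 12-byte L2TPv3
        "Ethernet over GRE": 24,        # assumed 20-byte IP + 4-byte GRE
    }

    for name, overhead in encap_overhead.items():
        required = CUSTOMER_MTU + ETH_HEADER + overhead
        print(f"{name:24s} needs interior MTU >= {required} bytes")

Every case pushes the required interior MTU past a plain 1500-byte Ethernet
path, hence the routers with SONET/POS interfaces in the interior.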

Cui bono?

- The operator of the L3 network that carries the inter-EP fabric gets
  revenue.

- The people who peer using this L2 fabric get to avoid some transit,
  but I would argue that it is only to reach providers that are
  similarly desirous of avoiding transit, since this won't help the
  upwardly mobile with the geographic diversity component of getting
  to the "next" tier.

Who loses?

- Transit providers who came to the exchange point for the purpose of
  picking up transit sales.

- If the exchange point operator is the one carrying the traffic, they
  lose for competing with their customers in the previous bullet; they
  will have taken the first steps on the path from being an exchange
  point operator to being a network-plus-colo provider (where they'll
  compete with the network-plus-colo providers just coming out of
  bankruptcy with all their debt scraped off).

So far, there has been an assumption that the provider of inter-EP
connectivity is willing to portion it out in a manner that is
usage-insensitive for the participants. I don't believe that the glut
of capacity, or the other expenses that come with operating an L3
network, have driven the costs so low that the resulting product is
"too cheap to meter." Since metering is still necessary, delivering
robust, auditable service is better implemented by connecting the customers up
to the L3 gear and implementing their L2 connections as pairwise
virtual circuits between customers (so you can be sure you're not
paying remote rates to talk to a local customer, or billing local
rates to traffic that a customer exchanges over the wide area). We
effectively just took the exchange point switch fabric out of the
equation and layered the peering fabric as VCs (MPLS or tunneled
Ethernet) on a transit network where each customer is connected on a
separately auditable interface. At that point:

- the people doing the peering might just as well use their transit,
  unless they work together (in pairwise combinations, so this doesn't
  scale well) to get on-net pricing from a common transit provider.

- the people doing the peering might as well just buy a circuit, if
  they're going so cheap, or build out their own network in hopes of
  making the grade with respect to the geographic diversity component
  of other people's peering requirements.

> I understand that there is a real glut of AP transoceanic capacity,
> particularly on the Japan-US cable, where twice as much capacity is idle
> as is in use. This has sent the price point down to historically low
> levels, O($28K/mo for STM-1), or less than $200/Mbps for transport! This
> is approaching an attractive price point for long distance peering so,
> just for grins,...

Consumed at what quantities? Parceling this out at single-digit-Mb/s
quantities increases the cost at the edge of the network that delivers
this.
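
For a rough sense of the numbers (using the quoted $28K/mo STM-1 figure;
the parceling scenario is purely hypothetical):

    STM1_MBPS = 155           # nominal STM-1 capacity, roughly
    MONTHLY_COST = 28_000     # quoted order-of-magnitude price, USD/month

    print(f"Filled STM-1: ${MONTHLY_COST / STM1_MBPS:.0f}/Mbps/month")

    # Hypothetical: sold off in 5 Mb/s parcels, with only 15 parcels sold.
    parcel_mbps, parcels_sold = 5, 15
    effective = MONTHLY_COST / (parcel_mbps * parcels_sold)
    print(f"Half-sold in {parcel_mbps} Mb/s parcels: ${effective:.0f}/Mbps/month")

The ~$180/Mbps headline number only holds if the circuit is full; parceled
out and half-sold, the effective cost roughly doubles before you even count
the edge gear.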

> Are there transport providers that can provide a price point around
> $100/Mbps for transport capacity from Tokyo to the U.S. (LAX/SJO)?

SJO is in Costa Rica, bud. :-)

Stephen
VP Eng., PAIX

> - Transit providers who came to the exchange point for the purpose of
>   picking up transit sales.
>
> - If the exchange point operator is the one carrying the traffic, they
>   lose for competing with their customers in the previous bullet; they
>   will have taken the first steps on the path from being an exchange
>   point operator to being a network-plus-colo provider (where they'll
>   compete with the network-plus-colo providers just coming out of
>   bankruptcy with all their debt scraped off).

  i'm still amazed that nobody has brought up the fact that a couple
  of the larger colo/exchange operators that claimed they wouldn't
  compete with their IP customers are indeed selling IP transit--
  intentionally undercutting the prices of the providers that colo'd
  there to sell transit, partly because the colo/exchange operator
  kept telling the world that they would never compete with their
  customers in the IP transit space.

  clearly, interconnecting their exchange points to create a richly-
  connected Internet 'core' is a natural progression if their
  customers don't complain too loudly.

  not that it's a bad long-term plan-- but I do agree with Stephen
  in that it'll be tough for them to survive against the debt-free
  big boys if they emerge as clear network-plus-colo competitors
  and lose the few remaining bits of their 'neutral' facade.

- jsb

Both Stephen and Jeff are correct.

And I'm not sure it would be in the best interests of the colo company to be a jack of all trades.

Where I do see a benefit is where an exchange point company wants to start a new one in a new city. It's the classic chicken-and-egg problem. While I have promoted allowing a beta group of peers to jump in for little or no charge until, say, peer #6, another solution is to connect that new exchange point to a successful one at another location, allowing new peers at location B to see old peers at location A. This would create a critical mass of peers immediately, and would allow customer #1 to see a benefit.

Some restrictions might have to be in place:

1) Limiting the traffic levels for distance peering; 100 Mb/s or 1 Gb/s would do it.
2) Perhaps a time limit.

Also, instead of competing with carriers at this new location B, you would actually prove there is business there. Most companies are in a wait-and-see mode before deploying, or a wait-and-get-an-order-first mode. By jump-starting the peering with transport, you then have the data the carrier engineers need to justify a build.

This IS one way to get more successful peering points started.

    > clearly, interconnecting their exchange points to create a richly-
    > connected Internet 'core' is a natural progression if their
    > customers don't complain too loudly.
    > not that it's a bad long-term plan...

Actually, it is. It's failed in every prior instance.

It's one of the many, many ways in which exchange points commit suicide.

                                -Bill

who is still connected to mae-w fddi?

  i know there are people there.

  time limits don't work well.

I find it interesting that there were immediate assumptions by
all the follow-up posters that the hypothetical mesh wbn suggested
would be run by an exchange point operator. Perhaps no public
statements were sent by anyone using similar trans-Atlantic
services (that are not run by the affected EP operator[s]). It
isn't a new solution, and there isn't only one company offering
the service.

I think exploring any technical issues/experiences in the
differing existing deployments, and how they would relate to a
trans-Pacific deployment, is quite worthwhile. If anyone using one
of the trans-Atlantic services wanted to send comments but didn't
have enough desire to get a throwaway account subscribed to
nanog-post, I'll happily anonymize and repost for you. Just no
guarantees on timeliness.

Cheers,

Joe

    > > clearly, interconnecting their exchange points to create a richly-
    > > connected Internet 'core' is a natural progression if their
    > > customers don't complain too loudly.
    > > not that it's a bad long-term plan...
    >
    > Actually, it is. It's failed in every prior instance.

I'd like to understand your viewpoint, Bill. The LINX consists of a handful of distributed and interconnected switches such that customers are able to choose which site they want for colo. Likewise for the AMS-IX and a handful of other dominant European exchanges. By most accounts these are successful IXes, with a large and growing population of ISPs benefiting from that large and growing peering population. So I don't see the failure cases.

    > It's one of the many, many ways in which exchange points commit suicide.

I'd love to see a list of the ways IXes commit suicide. Can you rattle off a few?

    > The LINX consists of a handful
    > of distributed and interconnected switches such that customers are able to
    > choose which site they want for colo. Likewise for the AMS-IX and a handful
    > of other dominant European exchanges.

Correct. Within the metro area. That is, as has been documented many
times over, a necessary condition for long-term stability.

    > > It's one of the many, many ways in which exchange points commit suicide.
    >
    > I'd love to see a list of the ways IXes commit suicide. Can you rattle off
    > a few?

1) Cross the trust threshold in the wrong direction.
2) Cross the cost-of-transit threshold in the wrong direction.
3) Increase shared costs until conditions 1 and/or 2 are met.

Those are sort of meta-cases which encompass most of the specific failure
modes. Of course, you can always declare yourself closed or obsolete, à la
MAE-East-FDDI, which I guess would be a fourth case, but rare.

                                -Bill

    > > The LINX consists of a handful
    > > of distributed and interconnected switches such that customers are able to
    > > choose which site they want for colo. Likewise for the AMS-IX and a handful
    > > of other dominant European exchanges.
    >
    > Correct. Within the metro area. That is, as has been documented many
    > times over, a necessary condition for long-term stability.

There's an increasing number of "pseudo-wire" connections though; you could
regard these L2 extensions as an extension of the switch as a whole, making
it international.

Where the same pseudo-wire provider connects to, say, LINX, AMS-IX, and
DECIX, you're only a little way off having an interconnection of multiple
IXs; it's possible this will occur by accident...

Steve

> Where the same pseudo-wire provider connects to, say, LINX, AMS-IX, and
> DECIX, you're only a little way off having an interconnection of multiple
> IXs; it's possible this will occur by accident...

and l2 networks scale soooo well, and are so well known for being
reliable. is no one worried about storms, spanning tree bugs, ...
in a large multi-l2-exchange environment? this is not a prudent
direction.

randy

> > Where the same pseudo-wire provider connects to, say, LINX, AMS-IX, and
> > DECIX, you're only a little way off having an interconnection of multiple
> > IXs; it's possible this will occur by accident...
>
> and l2 networks scale soooo well, and are so well known for being
> reliable. is no one worried about storms, spanning tree bugs, ...
> in a large multi-l2-exchange environment? this is not a prudent
> direction.

Well, first I think we need to agree that there are two different cases here:
1) interconnecting IXes operated by the same party, vs.
2) interconnecting IXes operated by different parties.

In the first case an IX operator can shoot himself in the foot, but there is only one gun and one person, so you can easily figure out why the foot hurts. In the latter case, there are more people with more guns. Without perfect information distributed among the operators, this is clearly a more dangerous situation, and diagnosing/repairing is more difficult and time-intensive. I believe we are really talking about the first case.

Secondly, some of the issues of scaling L2 infrastructure have been addressed by VLANs, allowing the separation of traffic into groups of VLAN participants. This reduces the scope of an L2 problem to the VLAN in use. Since the role of the IX operator is to provide a safe, stable, scalable, etc. interconnection environment, distributed VLANs are a tool that can help extend the peering population while mitigating the risk of any single ISP wrecking the peering infrastructure.
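
As a toy illustration of that scoping argument (port and VLAN names are
invented), a broadcast entering the fabric is flooded only to other ports
in the same VLAN:

    from collections import defaultdict

    class Fabric:
        def __init__(self):
            self.vlan_of = {}                  # port -> VLAN id
            self.members = defaultdict(set)    # VLAN id -> set of ports

        def assign(self, port, vlan):
            self.vlan_of[port] = vlan
            self.members[vlan].add(port)

        def flood(self, ingress):
            """Ports that receive a broadcast entering on `ingress`."""
            return self.members[self.vlan_of[ingress]] - {ingress}

    fabric = Fabric()
    fabric.assign("peer-a", 100)
    fabric.assign("peer-b", 100)
    fabric.assign("peer-c", 200)    # a separate peering group
    print(fabric.flood("peer-a"))   # {'peer-b'}; peer-c never sees the storm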

Bill

> Well, first I think we need to agree that there are two different cases here:
> 1) interconnecting IXes operated by the same party, vs.
> 2) interconnecting IXes operated by different parties.
>
> In the first case an IX operator can shoot himself in the foot, but there
> is only one gun and one person, so you can easily figure out why the foot
> hurts.

well, now we know you have never had to debug a large L2 disaster

randy

> > Well, first I think we need to agree that there are two different cases here:
> > 1) interconnecting IXes operated by the same party, vs.
> > 2) interconnecting IXes operated by different parties.
> >
> > In the first case an IX operator can shoot himself in the foot, but there
> > is only one gun and one person, so you can easily figure out why the foot
> > hurts.

> well, now we know you have never had to debug a large L2 disaster

Randy - You snipped what I said out of context. Below is the complete paragraph (and admittedly I should have said "relatively easily" rather than "easily"). The point is that I don't think we are talking about interconnecting switches operated by different parties, and I think you would agree that if it is difficult diagnosing problems with a single large-scale L2 fabric, it is even more difficult with multiple administrative domains. That was the point.

Original Paragraph:

> In the first case an IX operator can shoot himself in the foot, but there is
> only one gun and one person, so you can easily figure out why the foot hurts.
> In the latter case, there are more people with more guns. Without perfect
> information distributed among the operators, this is clearly a more dangerous
> situation and diagnosing/repairing is more difficult and time intensive. I
> believe we are really talking about the first case.

Woody - I'd still like to hear about the failures "in every prior instance".

Yes, that's an unfortunate accident that's occurred before.

                                -Bill

what a morass of confusion. on the one hand we have a metro peering fabric,
which as linx, exchangepoint, paix, and lots of others have shown, is good.
on another hand we have a metro peering fabric, which as mfs and ames showed,
can be really bad. because we have a lot of hands we also have exchange-level
peering, which as paix and six has shown, can be done safely. there's also
a hand containing multiple instances of exchange level peering which was not
done safely (and i'm not double counting ames and mfs here.) finally we have
intermetro (wide area) peering, which has been shown to be a complete joke
for peering for any number of reasons.

before any of you argue further, please carefully define your terminology so
the rest of us will know how to fill out our scorecards.