IXP

We got to go through all the badness that was the ATM NAPs (AADS,
PacBell NAP, MAE-WEST ATM).

I think they failed for exactly the reason Leo mentions. That is, it
didn't even require people to figure out all the technical reasons they
were bad (there were many); they were fundamentally doomed because they
increased the difficulty of peering, which translated into an economic
scaling problem.

i.e., if you make it hard for people to peer, then you end up with fewer
peers, and shared-VLAN exchanges based on things like Ethernet
outcompete you.

Been there done that.

We've already experienced the result of SecureID cards and the
PeerMaker tool. It was like pulling teeth to get sessions set up, and
most peers, plus the exchange operator, didn't believe in
oversubscription (can you say CBR? I knew you could), so you ended up
with two-year-old bandwidth allocations cast in stone, because it was
such a pain to get the peer to set it up in the first place, and
increasing bandwidth to you meant your peer had to reduce the bandwidth
they had allocated to somebody else.
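
To make the zero-sum nature of that concrete: CBR means every PVC
reserves its full rate on the port, with no statistical sharing to
absorb growth. A toy sketch in Python (all port and circuit numbers
made up) of why resizing one peer's allocation forces someone else to
give ground:

PORT_RATE_MBPS = 155  # e.g. a nominal OC-3 ATM port

# CBR reservations per peer, in Mbps (hypothetical)
pvcs = {"peer_a": 60, "peer_b": 60, "peer_c": 30}

def can_resize(pvcs, peer, new_rate, port_rate=PORT_RATE_MBPS):
    """A CBR resize only fits if the sum of all reservations stays
    within the port rate; nothing is oversubscribed to absorb it."""
    total = sum(r for name, r in pvcs.items() if name != peer) + new_rate
    return total <= port_rate

print(can_resize(pvcs, "peer_a", 80))  # False: someone must give up 15 Mbps
print(can_resize(pvcs, "peer_c", 35))  # True: 60 + 60 + 35 still fits in 155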

I, too, had a SecureID card, whose PIN I promptly forgot. I actually
feel sorry for the poor software developers of that system; who knows
what "requirements" were imposed on them by management fiat versus
researched from the customer (and potential customer) base?

Ethernet != shared VLAN, as I'm sure you know, so equating the two is
a non sequitur. Ethernet has grown enough features that it can be used
effectively in a variety of ways - and knowing which features to
avoid is just as important as knowing which features to expose. "Not
every knob that can be turned, should be turned."

The challenge to a developer of the software infrastructure of a
modern IXP is to take what we learned about the ease of use of shared
VLAN peering and translate it into the world of pseudo-wire
interconnect. Does it have to be as hard as PeerMaker? Clearly not. If
someone is going to jump into that space, though, there's a lot of
homework to do researching what a provisioning system would need to do
to present as low a barrier to peering as possible.
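
As a thought experiment, here is a rough sketch, in Python, of the kind
of low-friction workflow such a system might expose: one party requests
a circuit, the other accepts, and the exchange allocates it without a
human in the loop. Everything here is hypothetical (the names, the
classes, the auto-allocated VC IDs); the hard part in real life is the
back end that drives the switches, not this bookkeeping.

from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional

@dataclass
class PeeringRequest:
    requester: str            # member/ASN asking for the pseudo-wire
    target: str               # member/ASN on the other end
    mbps: int                 # requested bandwidth; resizable later, not cast in stone
    state: str = "pending"
    vc_id: Optional[int] = None

class ProvisioningSystem:
    """Toy front end: request, accept, done. No spreadsheets, no NOC ticket."""
    def __init__(self):
        self.requests: Dict[int, PeeringRequest] = {}
        self._req_ids = count(1)
        self._vc_ids = count(1000)   # hypothetical pseudo-wire/VLAN ID pool

    def request(self, requester: str, target: str, mbps: int) -> int:
        rid = next(self._req_ids)
        self.requests[rid] = PeeringRequest(requester, target, mbps)
        return rid

    def accept(self, rid: int) -> PeeringRequest:
        req = self.requests[rid]
        req.vc_id = next(self._vc_ids)   # allocate the circuit automatically
        req.state = "provisioned"        # a real system would push switch config here
        return req

ix = ProvisioningSystem()
rid = ix.request("AS64500", "AS64501", mbps=1000)
print(ix.accept(rid))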

Your argument, and Leo's, is fundamentally the complacency argument
that I pointed out earlier. You're content with how things are,
despite the failure modes, and despite inefficiencies that the IXP
operator is forced to have in *their* business model because of your
complacency.

In a message written on Fri, Apr 24, 2009 at 05:06:15PM +0000, Stephen Stuart wrote:

Your argument, and Leo's, is fundamentally the complacency argument
that I pointed out earlier. You're content with how things are,
despite the failure modes, and despite inefficiencies that the IXP
operator is forced to have in *their* business model because of your
complacency.

I do not think that is my argument.

I have looked at the failure modes and the cost of fixing them and
decided that it is cheaper and easier to deal with the failure modes
than it is to deal with the fix.

Quite frankly, I think the failure modes have been grossly overblown.
Incidents of shared-network badness that have actually caused problems
are few and far between. I can't attribute any downtime to
shared-network badness at exchanges (note, colos are a different story)
in a good 5-7 years.

On the contrary, I can already attribute downtime to paranoia about
it. When I had an Ethernet interface fail at a colo provider who shall
remain nameless, I was forced to call the NOC, have them put the port
in a "quarantine" VLAN, watch it with tcpdump for an hour, and then
return it to service. Total additional downtime after the bad
interface was replaced: two hours. I have no idea how watching an
interface in a VLAN with tcpdump supposedly protects a shared network.

Remember the 7513s, where adding or removing a dot1q subinterface
might bounce the entire trunk? I know of several providers to this
day that won't add/remove subinterfaces during the day, but turning
up BGP sessions on shared LANs can be done all day long.

The scheme proposed, with private VLANs to every provider, adds a
significant amount of engineering time, documentation, and general
effort to public peering. Public peering barely makes economic
sense when its cost is as close to free as we can get it; virtually
any increase makes it useless. We've already seen many major
networks drop public peering altogether because the internal time
and effort to deal with small peers is not worth the benefit.

Important volumes of traffic will be carried outside a shared
switch. The colo provider cannot provision a switching platform
at a cost-effective rate to handle all cross-connects. So in the
world of PNIs, the public switch and shared segment already select
for small players. You may want to peer with them because you think
it's fair and good, you may do it to qualify up-and-comers for
PNIs, but you're not /public peering/ for profit in 99% of
cases.

All this is not to say private VLANs aren't a service that could be
offered. There may be a niche for networks of a particular size, with
particular-sized flows, to use them for good purposes. Colo providers
should look at providing the service.

A replacement for a shared, multi-access peering LAN? No. No. No.

Leo, your position is: "worse is better". I happen to agree with this
sentiment for a variety of reasons. Stephen Stuart disagrees, for a
number of other carefully considered and well-thought-out reasons.

Richard Gabriel's essay on "worse is better" as it applied to Lisp is
worth reading in this context. The ideas he presents are relevant well
beyond the article's intended scope and are applicable to the shared
L2 domain vs. PI interconnection argument (within reasonable bounds).

Nick

Wait, aren't you on NYIIX and Any2? Those two alone are good for 5-7
incidents a year, like clockwork.

Please allow me to send you a complimentary copy of "The Twelve
Days of NYIIX" for your caroling collection this December:

On the first day of Christmas, NYIIX gave to me,
        A BPDU from someone's spanning tree.

On the second day of Christmas, NYIIX gave to me,
        Two forwarding loops,
        And a BPDU from someone's spanning tree.

On the third day of Christmas, NYIIX gave to me,
        Three routing leaks,
        Two forwarding loops,
        And a BPDU from someone's spanning tree.

On the fourth day of Christmas, NYIIX gave to me,
        Four Foundry crashes,
        Three routing leaks,
        Two forwarding loops,
        And a BPDU from someone's spanning tree.

On the fifth day of Christmas, NYIIX gave to me,
        Five flapping sessions,
        Four Foundry crashes,
        Three routing leaks,
        Two forwarding loops,
        And a BPDU from someone's spanning tree.

On the sixth day of Christmas, NYIIX gave to me,
        Six maintenance notices,
        Five flapping sessions,
        Four Foundry crashes,
        Three routing leaks,
        Two forwarding loops,
        And a BPDU from someone's spanning tree.

On the seventh day of Christmas, NYIIX gave to me,
        Seven broadcast floods,
        Six maintenance notices,
        Five flapping sessions,
        Four Foundry crashes,
        Three routing leaks,
        Two forwarding loops,
        And a BPDU from someone's spanning tree.

On the eighth day of Christmas, NYIIX gave to me,
        Eight defaulting peers,
        Seven broadcast floods,
        Six maintenance notices,
        Five flapping sessions,
        Four Foundry crashes,
        Three routing leaks,
        Two forwarding loops,
        And a BPDU from someone's spanning tree.

On the ninth day of Christmas, NYIIX gave to me,
        Nine CDP neighbors,
        Eight defaulting peers,
        Seven broadcast floods,
        Six maintenance notices,
        Five flapping sessions,
        Four Foundry crashes,
        Three routing leaks,
        Two forwarding loops,
        And a BPDU from someone's spanning tree.

On the tenth day of Christmas, NYIIX gave to me,
        Ten proxy ARPs,
        Nine CDP neighbors,
        Eight defaulting peers,
        Seven broadcast floods,
        Six maintenance notices,
        Five flapping sessions,
        Four Foundry crashes,
        Three routing leaks,
        Two forwarding loops,
        And a BPDU from someone's spanning tree.

On the eleventh day of Christmas, NYIIX gave to me,
        Eleven OSPF hellos,
        Ten proxy ARPs,
        Nine CDP neighbors,
        Eight defaulting peers,
        Seven broadcast floods,
        Six maintenance notices,
        Five flapping sessions,
        Four Foundry crashes,
        Three routing leaks,
        Two forwarding loops,
        And a BPDU from someone's spanning tree.

On the twelfth day of Christmas, NYIIX gave to me,
        Twelve peers in half-duplex,
        Eleven OSPF hellos,
        Ten proxy ARPs,
        Nine CDP neighbors,
        Eight defaulting peers,
        Seven broadcast floods,
        Six maintenance notices,
        Five flapping sessions,
        Four Foundry crashes,
        Three routing leaks,
        Two forwarding loops,
        And a BPDU from someone's spanning tree.

In a message written on Fri, Apr 24, 2009 at 04:22:49PM -0500, Paul Wall wrote:

On the twelfth day of Christmas, NYIIX gave to me,
        Twelve peers in half-duplex,
        Eleven OSPF hellos,
        Ten proxy ARPs,
        Nine CDP neighbors,
        Eight defaulting peers,
        Seven broadcast floods,
        Six maintenance notices,
        Five flapping sessions,
        Four Foundry crashes,
        Three routing leaks,
        Two forwarding loops,
        And a BPDU from someone's spanning tree.

Let's group:

Problems that can/will occur with per-VLAN peering:
          Twelve peers in half-duplex,
          Six maintenance notices,
          Five flapping sessions,
          Four Foundry crashes,
          Three routing leaks,
          Two forwarding loops,

Problems that if they affect your equipment, you're configuring it wrong,
and can/will occur with per-VLAN peering:
          Eleven OSPF hellos,
          Nine CDP neighbors,

Problems that if they affect the exchange, the exchange is configuring
its equipment wrong, and can/will occur with per-VLAN peering:
          Two forwarding loops,
          And a BPDU from someone's spanning tree.

Problems unique to a shared layer 2 network:
          Eight defaulting peers,
          Seven broadcast floods,
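
For what it's worth, most of the frame-level items above (BPDUs, CDP,
OSPF hellos, broadcast floods, stray ARP replies) are trivially
observable on a port facing the peering LAN. A rough sketch of that
kind of watch in Python, assuming Scapy is available and using a
placeholder interface name; defaulting peers and half-duplex obviously
don't show up this way:

from collections import Counter
from scapy.all import sniff, Ether, Dot3, STP, ARP, IP

counts = Counter()

CDP_DST = "01:00:0c:cc:cc:cc"   # Cisco CDP/VTP/PAgP multicast address
BCAST   = "ff:ff:ff:ff:ff:ff"

def classify(pkt):
    # STP BPDUs and CDP ride on 802.3/LLC; the rest on Ethernet II
    link = pkt.getlayer(Ether) or pkt.getlayer(Dot3)
    if link is None:
        return
    dst = link.dst.lower()
    if pkt.haslayer(STP):
        counts["bpdu"] += 1
    elif dst == CDP_DST:
        counts["cdp"] += 1
    elif pkt.haslayer(IP) and pkt[IP].proto == 89:
        counts["ospf"] += 1              # OSPF is IP protocol 89
    elif pkt.haslayer(ARP) and pkt[ARP].op == 2:
        counts["arp_reply"] += 1         # proxy ARP needs the address plan to confirm
    elif dst == BCAST:
        counts["broadcast"] += 1

# "peering0" is a placeholder interface name; watch for a minute and tally
sniff(iface="peering0", prn=classify, store=False, timeout=60)
print(dict(counts))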

Leaving aside the particular exchanges, from the tone of your message
I'm going to guess you are not impressed by the technical talent
operating the exchange switches. Do you believe making the
configuration for the exchange operation 100 times more complex will:
   A) Lead to more mistakes and downtime.
   B) Lead to fewer mistakes and downtime.
   C) Have no effect?

I'm going with A. I also think the downtime from A will be an
order of magnitude more than the downtime from defaulting
peers (which generally result in no downtime, just theft of
service) or broadcast floods.