IXP

> Not sure how switches handle HOL blocking with QinQ traffic across trunks,
> but hey...
> what's the fun of running an IXP without testing some limits?

Indeed. Those with longer memories will remember that I used to
regularly apologize at NANOG meetings for the DEC Gigaswitch/FDDI
head-of-line blocking that all Gigaswitch-based IXPs experienced when
some critical mass of OC3 backbone circuits was reached and the 100
Mb/s fabric rolled over and died, offered here (again) as a cautionary
tale for those who want to test those particular limits (again).
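For anyone who hasn't watched it happen: head-of-line blocking occurs when each input port has a single FIFO, so a packet headed for a congested output holds up everything queued behind it, even packets for idle outputs. A toy sketch in Python, assuming one packet per output per time slot and fixed-order arbitration; the port names are hypothetical and this is not a model of the Gigaswitch/FDDI's actual scheduler.

```python
from collections import deque


def simulate_fifo_inputs(inputs, slots):
    """Toy input-queued switch with one FIFO per input port.

    inputs: dict mapping input port -> list of destination outputs, in arrival order.
    slots:  number of time slots to simulate.
    Each output accepts at most one packet per slot, and only the packet at the
    head of each FIFO is eligible, which is what produces HOL blocking.
    Returns a list of (slot, input, output) deliveries.
    """
    queues = {port: deque(dests) for port, dests in inputs.items()}
    delivered = []
    for slot in range(slots):
        busy_outputs = set()
        # Fixed arbitration order (sorted by input name); real switches are fairer.
        for port in sorted(queues):
            q = queues[port]
            if not q:
                continue
            dest = q[0]
            if dest in busy_outputs:
                # Head packet lost arbitration, so everything behind it waits too.
                continue
            busy_outputs.add(dest)
            q.popleft()
            delivered.append((slot, port, dest))
    return delivered


print(simulate_fifo_inputs({"A": ["X", "X"], "B": ["X", "Y"]}, 4))
```

Here output Y sits idle for three slots while B's packet for Y waits behind B's packet for the congested output X. Virtual output queues (one queue per output at each input) are the usual fix.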

At PAIX, when we "upgraded" to the Gigaswitch/FDDI (from a DELNI; we
loved the DELNI), I actually used a feature of the switch that let you
"black out" certain sections of the crossbar to prevent packets
arriving on one port from exiting certain others, at the request of
some networks that wanted to align L2 connectivity with their peering
agreements. It was fortunate that the scaling meltdown occurred when
it did; otherwise I would have spent more software development
resources trying to turn that capability into something operationally
sustainable, letting networks restrict the visibility of their port
to only those networks with which they had peering agreements. That
software would probably have been thrown away with the Gigaswitches
had it actually been developed, and rewritten to use something
horrendous like MAC-based filtering. If I recall correctly, the
options didn't look feasible at the time; and who wants to have to
use a portal to change registered MAC addresses during a 2am
emergency linecard replacement, anyway? The port-based stuff had a
chance of being operationally feasible.
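The port-based approach amounts to a permit matrix over (ingress, egress) port pairs, derived from who has agreed to peer with whom. A minimal sketch, assuming a hypothetical symmetric peering table keyed by port name; it says nothing about how the Gigaswitch itself was configured.

```python
from itertools import combinations


def build_permit_matrix(ports, peerings):
    """Return a dict mapping (in_port, out_port) -> bool.

    ports:    iterable of port names.
    peerings: iterable of (port_a, port_b) pairs with a peering agreement.
    Only agreed pairs may exchange frames; every other crossbar path is blacked out.
    """
    agreed = {frozenset(pair) for pair in peerings}
    matrix = {}
    for a, b in combinations(sorted(ports), 2):
        allowed = frozenset((a, b)) in agreed
        # Peering is treated as symmetric, so both directions get the same answer.
        matrix[(a, b)] = allowed
        matrix[(b, a)] = allowed
    return matrix


def may_forward(matrix, in_port, out_port):
    # Unknown pairs (including a port to itself) default to blocked.
    return matrix.get((in_port, out_port), False)
```

Because entries are keyed by port rather than by MAC address, swapping a linecard at 2am leaves the matrix untouched, which is the operational advantage over MAC-based filtering.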

The notion of a partial pseudo-wire mesh, with a self-service portal
to request/accept connections like the MAEs had for their ATM-based
fabrics, follows pretty well from that and from everything anyone has
learned about advancing the state of the art. It also extends well:
an IXP with a distributed fabric could benefit from scalable L2.5/L3
traffic management features while looking as much like wires as
possible to the networks using it.
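The request/accept workflow is essentially a small state machine per pair of networks: one side asks, the other side approves, and only then does a pseudo-wire get provisioned. A minimal sketch, assuming hypothetical network identifiers and that either side may initiate; it is not a description of the MAEs' actual portal.

```python
class PseudoWirePortal:
    """Hypothetical request/accept workflow for a partial pseudo-wire mesh."""

    def __init__(self):
        self._state = {}      # frozenset({a, b}) -> "requested" or "active"
        self._requester = {}  # frozenset({a, b}) -> network that made the request

    def request(self, src, dst):
        if src == dst:
            raise ValueError("a network cannot peer with itself")
        key = frozenset((src, dst))
        if key in self._state:
            raise ValueError(f"{src}<->{dst} is already {self._state[key]}")
        self._state[key] = "requested"
        self._requester[key] = src

    def accept(self, acceptor, other):
        key = frozenset((acceptor, other))
        if self._state.get(key) != "requested":
            raise ValueError("no pending request between these networks")
        if self._requester[key] == acceptor:
            raise PermissionError("the requesting side cannot accept its own request")
        # A real system would provision the pseudo-wire at this point.
        self._state[key] = "active"

    def active_peers(self, network):
        """Sorted list of networks that `network` has an active pseudo-wire to."""
        return sorted(
            next(iter(key - {network}))
            for key, state in self._state.items()
            if state == "active" and network in key
        )
```

Keeping the pair as an unordered key means a single record covers both directions, matching a pseudo-wire that looks like one wire to both ends.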

If the gear currently deployed in IXP interconnection fabrics actually
supports the necessary features, maybe someone will be brave enough to
commit the software development resources necessary to try to make it
an operational reality. If it requires capital investment, though, I
suspect it'll be a while.

The real lesson from the last fifteen or so years, though, is that
bear skins and stone knives clearly have a long operational lifetime.

Stephen

stephen, any idea why this hasn't hit the nanog mailing list yet?
it's been hours, and things that others have sent on this thread
have appeared. is it stuck in a mail queue? --paul

re:

> Not sure how switches handle HOL blocking with QinQ traffic across trunks,
> but hey...
> what's the fun of running an IXP without testing some limits?

Indeed. Those with longer memories will remember that I used to
regularly apologize at NANOG meetings for the DEC Gigaswitch/FDDI
head-of-line blocking that all Gigaswitch-based IXPs experienced when
some critical mass of OC3 backbone circuits was reached and the 100
Mb/s fabric rolled over and died, offered here (again) as a cautionary
tale for those who want to test those particular limits (again).

  Ohhh... Scary Stories! :-)

The real lesson from the last fifteen or so years, though, is that
bear skins and stone knives clearly have a long operational lifetime.

  well... while there is a certain childlike obsession with
  the byzantine, rube-goldberg, lots of bells, knobs, whistles
  type machines... for solid, predictable performance, simple
  clean machines work best.

Stephen

--bill

Date: Sat, 18 Apr 2009 10:09:00 +0000
From: bmanning@vacation.karoshi.com

  ... well... while there is a certain childlike obsession with the
  byzantine, rube-goldberg, lots of bells, knobs, whistles type
  machines... for solid, predictable performance, simple clean
  machines work best.

like you i long for the days when a DELNI could do this job. nobody
makes hubs anymore though. but the above text juxtaposes poorly against
the below text:

Date: Sat, 18 Apr 2009 16:35:51 +0100
From: Nick Hilliard <nick@foobar.org>

... These days, we have switches which do multicast and broadcast storm
control, unicast flood control, mac address counting, l2 and l3 acls,
dynamic arp inspection, and they can all be configured to ignore bpdus in
a variety of imaginative ways. We have arp sponges and broadcast
monitors. ...
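Storm control, first on Nick's list, is commonly a per-port rate limit on flooded frames (broadcast, multicast, unknown unicast). A token-bucket sketch, assuming a limit in frames per second and a caller-supplied clock; vendors implement this in hardware and often express the threshold as a percentage of line rate or in packets per second, so the parameters here are illustrative only.

```python
class StormControl:
    """Token-bucket limiter for flooded frames on a single port (illustrative)."""

    def __init__(self, rate_pps, burst):
        self.rate = rate_pps          # tokens added per second
        self.burst = burst            # bucket depth, i.e. maximum tokens
        self.tokens = float(burst)    # start with a full bucket
        self.last = 0.0               # time of the previous call, in seconds

    def allow(self, now):
        """Return True to forward a flooded frame arriving at time `now`, False to drop it."""
        elapsed = max(0.0, now - self.last)
        self.last = now
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # port is over its flood budget
```

A burst of `burst` frames passes immediately; after that, flooded frames are admitted at roughly `rate_pps`.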

in terms of solid and predictable i would take per-peering VLANs with IP
addresses assigned by the peers themselves, over switches that do unicast
flood control or which are configured to ignore bpdus in imaginative ways.

but either way it's not a DELNI any more. what i see is inevitable
complexity and various different ways of layering that complexity in. the
choice of per-peering VLANs represents a minimal response to the problems
of shared IXP fabrics, with maximal impedance matching to the PNIs that
inevitably follow successful shared-port peerings.
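Per-peering VLANs do hit a hard ceiling with a single tag: a full mesh of n participants needs n(n-1)/2 point-to-point VLANs, while the 12-bit 802.1Q VID leaves 4094 usable values (0 and 4095 are reserved). A quick check, assuming every pair peers and no VIDs are set aside for management or other services:

```python
USABLE_VIDS = 4094  # 12-bit 802.1Q VID; 0x000 and 0xFFF are reserved


def vlans_for_full_mesh(n):
    """Point-to-point VLANs needed if all n participants peer with each other."""
    return n * (n - 1) // 2


def max_full_mesh_participants(vid_budget=USABLE_VIDS):
    """Largest n whose full mesh still fits within vid_budget VLAN IDs."""
    n = 1
    while vlans_for_full_mesh(n + 1) <= vid_budget:
        n += 1
    return n


print(max_full_mesh_participants(), vlans_for_full_mesh(90))
```

So a single tag covers a full mesh of at most 90 participants (4005 VLANs); 91 would need 4095. Real meshes are partial, so the practical limit is higher, and stacked tags (QinQ, as in the original question about trunks) are one way to extend the space.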

> Date: Sat, 18 Apr 2009 10:09:00 +0000
> From: bmanning@vacation.karoshi.com
>
> ... well... while there is a certain childlike obsession with the
> byzantine, rube-goldberg, lots of bells, knobs, whistles type
> machines... for solid, predictable performance, simple clean
> machines work best.

like you i long for the days when a DELNI could do this job. nobody
makes hubs anymore though. but the above text juxtaposes poorly against
the below text:

  i never said i longed for DELNIs (although there is a naive
  beauty in such things)

  i make the claim that simple, clean design and execution is best.
  even the security goofs will agree.

but either way it's not a DELNI any more. what i see is inevitable
complexity and various different ways of layering that complexity in. the
choice of per-peering VLANs represents a minimal response to the problems
of shared IXP fabrics, with maximal impedance matching to the PNIs that
inevitably follow successful shared-port peerings.

  complexity invites failure - failure in unusual and unexpected
  ways. small & simple systems are more nimble, faster and more resilient.
  complex is usually big, slow, fraught w/ little used code paths, a veritable
  nesting ground for viruses, worms, half-baked truths, and poorly tested
  assumptions.

  one very good reason folks move to PNIs is that they are simpler to do,
  and more cost-effective -AT THAT performance point-.

  I worry (to the extent that I worry about such things at all these days)
  that the code that drives the Internet is bloated, slow, and
  generally trying to become the "swiss-army-knife" application of critical
  infrastructure joy. witness BGP. more knobs/whistles than you can shake
  a stick at. the distinct lack of restraint by code developers in their
  desire to add every possible feature is arguably the primary reason the
  Internet is so riddled with security vulnerabilities.

  I'll get off my soap-box now and let you resume your observations that
  complexity as a goal in and of itself is the only path forward. What
  a dismal worldview.

--bill

"Even"? *Especially* -- or they're not competent at doing security.

But I hadn't even thought about DELNIs in years.

    --Steve Bellovin, http://www.cs.columbia.edu/~smb

Paul Vixie wrote:

in terms of solid and predictable i would take per-peering VLANs with IP
addresses assigned by the peers themselves, over switches that do unicast
flood control or which are configured to ignore bpdus in imaginative ways.

Simplicity only applies when it doesn't hinder security (the baseline complexity). PE/BRAS systems suffer from a subset of IXP issues, plus a few of their own. It amazes me how much "security" has been pushed from the PE out into switches and DSLAMs; so much so that I've found many vendors whose "security" features break IPv6. 802.1Q (1Q) tagging is about the simplest model I have seen for providing the necessary isolation, mimicking PNI. For PE, it has allowed complete L3 ignorance in the L2 devices while enforcing security policies at the aggregation points. For an IXP, it provides the necessary isolation and security without any expectation about the type of L3 traffic crossing the IXP.
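For reference, the 802.1Q tag is 4 bytes inserted after the source MAC: a TPID of 0x8100 followed by a 16-bit TCI carrying PCP (3 bits), DEI (1 bit) and the 12-bit VLAN ID. 802.1ad (QinQ) puts an outer tag with TPID 0x88A8 in front of it. A minimal sketch with Python's struct module; the frame contents and VIDs used with it are made up.

```python
import struct

TPID_8021Q = 0x8100   # customer tag (C-tag)
TPID_8021AD = 0x88A8  # service tag (S-tag) used for QinQ


def vlan_tag(vid, pcp=0, dei=0, tpid=TPID_8021Q):
    """Return a 4-byte tag: TPID, then TCI = PCP(3 bits) | DEI(1 bit) | VID(12 bits)."""
    if not 1 <= vid <= 4094:
        raise ValueError("VID must be in 1..4094")
    tci = (pcp & 0x7) << 13 | (dei & 0x1) << 12 | vid
    return struct.pack("!HH", tpid, tci)


def tag_frame(frame, vid, outer_vid=None):
    """Insert a C-tag (and optionally an outer S-tag) after the destination and source MACs."""
    tags = vlan_tag(vid)
    if outer_vid is not None:
        tags = vlan_tag(outer_vid, tpid=TPID_8021AD) + tags
    return frame[:12] + tags + frame[12:]


def read_vid(tagged_frame):
    """Return the VID of the outermost tag."""
    _tpid, tci = struct.unpack("!HH", tagged_frame[12:16])
    return tci & 0x0FFF
```

Each participant pair (or PE customer) gets its own VID, and the switch only has to match on those 12 bits; it never needs to understand the L3 payload.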

It's true that 1Q tagging requires a configuration component, but I'd hesitate to call it complex. 10,000-line router configs may be long, but often because of repetition forced by configuration limitations rather than complexity. HE's IPv6 tunnel servers are moderately more complex and have handled provisioning well in my experience.

Multicast was brought up as an issue, but it's no less efficient than it would have been over PNIs, and a structure could be designed to handle multicast when it's needed.

Jack

Date: Sat, 18 Apr 2009 13:17:11 -0400
From: "Steven M. Bellovin" <smb@cs.columbia.edu>

> i make the claim that simple, clean design and execution is
> best. even the security goofs will agree.

"Even"? *Especially* -- or they're not competent at doing security.

wouldn't a security person also know about

  ARP spoofing (Wikipedia article)

and know that many colo facilities now use one customer per vlan due
to this concern? (i remember florian weimer being surprised that we
didn't have such a policy on the ISC guest network.)
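Dynamic ARP inspection, also on Nick's list, is the switch-side alternative to a VLAN per customer: each ARP packet's sender IP and MAC are checked against bindings for the port it arrived on. A minimal sketch, assuming a hypothetical static binding table of (port, IP, MAC); in practice bindings typically come from DHCP snooping, and at an IXP they would presumably come from the operator's own records of who holds which address.

```python
from ipaddress import ip_address


def make_bindings(entries):
    """entries: iterable of (port, ip, mac). Returns {(port, ip_address): lowercase mac}."""
    return {(port, ip_address(ip)): mac.lower() for port, ip, mac in entries}


def arp_permitted(bindings, port, sender_ip, sender_mac):
    """Permit an ARP packet only if its sender IP is bound to this port with this MAC."""
    expected = bindings.get((port, ip_address(sender_ip)))
    return expected is not None and expected == sender_mac.lower()
```

A spoofer claiming a neighbour's IP fails the check because that IP is bound to a different port or MAC.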

if we maximize for simplicity we get a DELNI. oops that's not fast
enough we need a switch not a hub and it has to go 10Gbit/sec/port.
looks like we traded away some simplicity in order to reach our goals.

er... 10G is old hat... try 100G.

  i'm not arguing for a return to smoke signals. i'm arguing that
  simplicity is oftentimes gratuitously abandoned in favor of the
  near-term, quick buck.

  if i may paraphrase Albert, "Things should be as simple as possible,
  but no simpler"

  and ARP... well there's a dirt-simple hack that the ethernet-based
  folks have never been able to shake. :-)

--bill

Paul Vixie wrote:

if we maximize for simplicity we get a DELNI. oops that's not fast
enough we need a switch not a hub and it has to go 10Gbit/sec/port.
looks like we traded away some simplicity in order to reach our goals.

Agreed.

Security + Efficiency = base complexity

1Q has great benefits in security while maintaining a reasonable base complexity compared to "1 MAC per port/MAC ACL + broadcast storm control + <insert common L2/3 security/performance tweaks used in a flat multi-point topology>". Things grow more complex as you reach up into MPLS.

I'll show my ignorance and ask if it's possible to handle multicast on a separate shared tag and maintain security and simplicity while handling unicast on p2p tags?

Standard methods of multicast on the Internet are foreign to me, and seem to behave differently from the multicast feeds typically used for video over IP on local segments (from what little I have read). Primarily, I believe multicast relies on unicast routing, which separate L2 paths might break.
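The unicast dependency is most likely the reverse-path forwarding (RPF) check: PIM accepts a multicast packet only if it arrived on the interface the unicast routing table would use to reach the packet's source. A minimal sketch with a hypothetical longest-prefix-match table; prefixes and interface names are made up.

```python
from ipaddress import ip_address, ip_network


def rpf_interface(unicast_routes, source):
    """Longest-prefix match over {prefix: interface}; returns None if nothing matches."""
    src = ip_address(source)
    best = None
    for prefix, iface in unicast_routes.items():
        net = ip_network(prefix)
        if src in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


def rpf_check(unicast_routes, source, arrival_iface):
    """Accept a multicast packet only if it arrived on the RPF interface toward its source."""
    return rpf_interface(unicast_routes, source) == arrival_iface
```

If the unicast route to a source points at a per-peering VLAN while that source's multicast arrives on a separate shared multicast VLAN, a check like this fails unless the multicast RPF lookup is given its own view (for example, MBGP routes for the multicast address family).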

Jack

Thanks for talking about your PNIs. Let's see:

Permit Next Increase
Private Network Interface
Private Network Interconnection
Primary Network Interface

and it goes on and on . . .

I'm taking no position on the underlying argument; I'm simply stating
that simplicity is an essential element for security. I like a
philosophy I've seen attributed to Einstein: "everything should be as
simple as possible, and no simpler".

And yes, I know about ARP spoofing...

    --Steve Bellovin, http://www.cs.columbia.edu/~smb

Agreed -- and that reminds me of the Dr. Who maxim: "The more sophisticated
the technology, the more vulnerable it is to primitive attack. People often
overlook the obvious."

Also, Voltaire: "Common sense is not so common."

- - ferg

Haven't most major vendors for years offered features in their switches which mitigate ARP-spoofing, provide per-port layer-2 isolation on a sub-VLAN basis, as well as implementing layer-3 anti-spoofing on a per-switchport basis (i.e., BCP38 on a per-switchport basis)?
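Per-switchport BCP38 boils down to checking each packet's source address against the prefixes assigned to the port it entered on. A minimal sketch, assuming a hypothetical port-to-prefix table; switches do this with hardware ACLs or source-guard features configured differently per vendor. It fits colo or access ports; an IXP member port legitimately carries traffic sourced from the member's whole customer cone, so the equivalent list there would be far larger.

```python
from ipaddress import ip_address, ip_network


def make_port_filter(assignments):
    """assignments: {port: [prefix, ...]} -> {port: [ip_network, ...]}"""
    return {port: [ip_network(p) for p in prefixes] for port, prefixes in assignments.items()}


def source_permitted(port_filter, port, src):
    """BCP38-style ingress check: the source must fall inside a prefix assigned to this port."""
    addr = ip_address(src)
    # Ports with no assignments permit nothing; mixed IPv4/IPv6 comparisons simply don't match.
    return any(addr in net for net in port_filter.get(port, []))
```

Unassigned ports default to dropping everything, which errs on the side of the "participant does the wrong thing" case.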

I tend to believe there is almost always more than one way to solve any problem, and if you can't think of more than one way you probably don't understand the problem fully.

IXPs are a subset of the colo problem, so there may be some issues in the colo case that IXPs can handle differently than general-purpose colos.

Why use "complex" DELNIs when you could just have passive coax and a real RF broadcast medium for your IXP?

If all the IXP participants always did the right thing, you wouldn't need the IXP operator to do anything. The problem is that sometimes an IXP participant does the wrong thing and the other IXP participants want the IXP operator to do something about it, which is probably why most IXP operators use stuff more complex than passive coax.

Other than Nick's list, are there any other things someone interested in checking IXP critical infrastructure might add to the checklist?