Points of Failure (was Re: National infrastructure asset)

Sean Donelan wrote:
>
> > But there was a point in time when taking out a certain parking garage
> > in Va could have caused us a very great deal of difficulty. But I'd say
> > we are past that, for the most part.
>
> Are we?
>
> When 25 Broadway failed, approximately 1% of the global Internet
> routing table also disappeared. Which I would guess qualifies it
> as a "major" hub.

But does that mean that X number of sites were unreachable, or that
there were simply Y number fewer routes to X sites? (Excluding those
*directly* affected, i.e., those *in* 25 Broadway.)

  From what point did 1% of the routing table disappear?
  Was the same visible from multiple, diverse points?

  I expect that from some perspectives, 100% of the routing
  table disappeared and some places didn't even see a blip.

--bill

The Internet as we know it is just a collective illusion.

You are correct: from one side of the partition, 99% of the routes
disappeared, and on the other side 1% of the routes disappeared.
I checked four different BGP feeds from a mix of providers, and
they were fairly consistent.
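
For anyone who wants to repeat that comparison, here is a minimal sketch of the arithmetic, assuming you already have before/after sets of prefixes from each feed. The function name `route_loss`, the vantage names, and the prefixes (documentation ranges only) are all made up for illustration; it says nothing about how the feeds are actually collected.

```python
def route_loss(before, after):
    """Fraction of prefixes seen in `before` that are missing from `after`."""
    before, after = set(before), set(after)
    if not before:
        return 0.0
    return len(before - after) / len(before)


# Hypothetical snapshots from two vantage points: (before, after) prefix sets.
snapshots = {
    "vantage_a": (
        {"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24", "2001:db8::/32"},
        {"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"},
    ),
    "vantage_b": (
        {"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24", "2001:db8::/32"},
        {"2001:db8::/32"},
    ),
}

for name, (before, after) in snapshots.items():
    print(f"{name}: {route_loss(before, after):.0%} of prefixes withdrawn")
```

Note this only counts prefixes that vanished entirely from a feed. It does not distinguish "site unreachable" from "one fewer path to the site", which would need per-prefix path counts.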

But percentage of routes is just one way to measure "importance."
It may not be the best way. Other methods include

   1. Number of stock options owned by Very Important People
   2. CAIDA skitter traces of routers of confluence
   3. Number of OC-192 links in a building
   4. Number of "Tier 1" providers in a building
   5. Government fiat
   6. Wait for the building to fall down and see what happens

Assuming there are locations more important than others, should
we do anything? Or should we just hope no one else figures out
where they are?

:But percentage of routes is just one way to measure "importance."
:It may not be the best way. Other methods include
:
: 1. Number of stock options owned by Very Important People
: 2. CAIDA skitter traces of routers of confluence
: 3. Number of OC-192 links in a building
: 4. Number of "Tier 1" providers in a building
: 5. Government fiat
: 6. Wait for the building to fall down and see what happens

Is there a geometric method of measuring the 'meshedness' of a
given set? If you take all the as-paths from a sampling of
peers across the Internet, and show the relative density of
where the respective paths converge, you can get a good picture
of who's transiting the most routes.

Now this doesn't show physical connections, as an AS can represent
an area that spans continents, but it shows who is responsible, which
in any DRP/BCP is among the first things established.
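
A rough sketch of that counting, assuming the as-paths have already been pulled from the peers as lists of AS numbers. `transit_counts` is a name made up here, and the AS numbers are from the documentation range, not real networks. "Transit" is taken to mean any AS in the middle of a path, i.e. neither the peer you got the path from nor the origin.

```python
from collections import Counter


def transit_counts(as_paths):
    """Count, per AS, how many paths it appears in as a middle (transit) hop."""
    counts = Counter()
    for path in as_paths:
        # Collapse AS-path prepending so a repeated AS is treated as one hop.
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        # Drop the first hop (the peer) and the last hop (the origin).
        for asn in set(hops[1:-1]):
            counts[asn] += 1
    return counts


# Hypothetical paths as seen from three different peers.
paths = [
    [64496, 64500, 64510],
    [64497, 64500, 64511],
    [64498, 64501, 64500, 64500, 64510],
]

for asn, n in transit_counts(paths).most_common():
    print(f"AS{asn} transits {n} of {len(paths)} sampled paths")
```

The ASes at the top of that list are the dense convergence points. As said above, this is a picture of responsibility, not of buildings.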

:Assuming there are locations more important than others, should
:we do anything? Or should we just hope no one else figures out
:where they are?

Well, the gov or an industry consortium can find these
dense transit areas and require that organizations with (e.g.) 2 or more peers
have some semblance of a DRP/BCP that can be audited. The plan doesn't
necessarily have to guarantee connectivity, but it should establish whether they
can be trusted to route packets if the DRP/BCP has to be invoked.

After all, we are talking about the Internet, and though many orgs control
lots of different parts of the infrastructure, maybe a plan just for layer 3
might be worth pursuing.

So maybe a large percentage of traffic gets routed through 60 Hudson.
The providers located there would each need infrastructure and routing
policies diverse enough to cope with the unavailability of their
equipment at that facility in order to qualify as an Infrastructure Carrier.

In short, I think that a plan like this should start on ground we all know
and have the power to negotiate on, which is layer 3.

note that richer meshes may increase forwarding reliability but they
exacerbate routing convergence problems. see abha's nanog presentation.

randy

[snip]

> Is there a geometric method of measuring the 'meshedness' of a
> given set? If you take all the as-paths from a sampling of
> peers across the Internet, and show the relative density of
> where the respective paths converge, you can get a good picture
> of who's transiting the most routes.

The mathematical term 'connectivity' measures the least number of
vertices that have to be removed to stop a network from being fully
connected.

Any network that contains a SPoF (even if it only causes one small bit
to get lost) has a connectivity of '1'. Any network in which you need to
hit at least 2 vertices (routers and switches would be vertices, lines
would be edges) to partition it has a connectivity of '2'.

There are very nice mathematical methods for determining the
connectivity and connectedness of a graph (network).
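
As a small illustration (not one of the nice methods, just brute force that only works for toy graphs), here is a sketch that tries removing every set of k vertices for growing k until the network falls apart. The function names and the example topologies are invented for this note; real tools use max-flow based algorithms instead.

```python
from itertools import combinations


def is_connected(adj, removed=frozenset()):
    """True if the graph, minus the `removed` vertices, is in one piece."""
    nodes = [v for v in adj if v not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:  # plain depth-first search
        v = stack.pop()
        for w in adj[v]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


def vertex_connectivity(adj):
    """Least number of vertices whose removal disconnects the graph.

    Brute force, exponential in the worst case. A complete graph on n
    vertices cannot be disconnected this way and is given n - 1 by convention.
    """
    n = len(adj)
    if not is_connected(adj):
        return 0
    for k in range(1, n - 1):
        for removed in combinations(adj, k):
            if not is_connected(adj, frozenset(removed)):
                return k
    return n - 1


# Hypothetical topologies: a star has a SPoF (the hub), a ring does not.
star = {"hub": ["r1", "r2", "r3"], "r1": ["hub"], "r2": ["hub"], "r3": ["hub"]}
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}

print("star:", vertex_connectivity(star))
print("ring:", vertex_connectivity(ring))
```

The star comes out at 1 and the ring at 2, matching the definition above: one dead hub partitions the star, while the ring needs two hits.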

I can recommend Skiena's "The Algorithm Design Manual" for anybody
interested. It is supposedly available online in HTML (I bought the
dead tree version :-)

Greetz, Peter

&, particularly where such meshes are formed in part from multiple
providers, they increase the probability of the types of critical errors
caused by the failure of any one of the providers (as opposed to those
which require all of the providers to go down). [trivial example: most
people don't filter their upstreams at all. if you have n upstreams, then
if any one gets hacked and decides to send 100,000,000 routes
at you, you die. Probability increases with n]
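
To put a rough number on "increases with n": a back-of-envelope sketch, assuming each upstream misbehaves independently with the same probability p over some period. Both the independence and the value of p are assumptions for illustration, not measurements, and `p_any_upstream_fails` is a made-up name.

```python
def p_any_upstream_fails(p, n):
    """Probability that at least one of n independent upstreams misbehaves,
    when each does so with probability p: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n


# Hypothetical per-upstream probability of 1%.
for n in (1, 2, 4, 8):
    print(f"n={n}: {p_any_upstream_fails(0.01, n):.4f}")
```

The "all of the providers go down" case goes the other way, p^n, which is why the two kinds of failure pull in opposite directions as the mesh gets richer.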

Alex Bligh
Personal Capacity

Some cities have peering and colocation diversity, some don't.

InfoMart & Westin Building come to mind. Those should rank high
by your list.