Vulnerabilities of Interconnection

Hi,

batz wrote:

:would be difficult to reach. I'd have to run a model to be sure, but
:every one of the major seven have rerouting methodologies that would
:recover from the loss. And I don't think they exclusively peer at

Even if we were to model it, the best data we could get for
the "Internet" would be BGP routing tables. These are also
subjective views of the rest of the net. We could take a full
table, map all the ASN adjacencies, and then pick arbitrary
ASNs to "fail", then see who is still connected, but we are
still dealing with connectivity relative to us and our peers,
even 5+ AS-hops away.
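
For what it's worth, the mechanical part of that experiment is only a
few lines of code. A sketch, assuming the table has already been
reduced to a list of ASN adjacency pairs (the pairs and ASNs below are
invented purely for illustration):

from collections import defaultdict, deque

def build_graph(adjacencies):
    """Map each ASN to the set of ASNs it is adjacent to."""
    graph = defaultdict(set)
    for a, b in adjacencies:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def reachable(graph, source, failed):
    """BFS outward from our own ASN, skipping 'failed' ASNs entirely."""
    seen = {source}
    queue = deque([source])
    while queue:
        asn = queue.popleft()
        for neighbor in graph[asn]:
            if neighbor not in failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# "Fail" AS 701 and AS 1239 and see who we can still reach from our
# own vantage point (AS 64512 here).
adjacencies = [(64512, 701), (701, 1239), (1239, 64513), (701, 64513)]
graph = build_graph(adjacencies)
print(sorted(reachable(graph, 64512, failed={701, 1239})))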

I want to make sure I understand this. As I understand it, this would
work regarding routing only. It would be a model whose result is ones
and zeros, so to speak, meaning either you're connected or you're not.
What this doesn't take into consideration, I believe, is the effect of
congestion from increased news traffic and from the rerouting that
takes place whenever a site is lost.

I would imagine this is one of the tasks CAIDA.org is probably
working on, as it seems to fall within their mission.

So even if we all agreed upon a common disaster to hypothesize
on, there would be little common ground to be had, as our
interpretations could only be political arguments over what is
most important, because there is no technically objective view
of the network to forge agreement on.

I totally agree. I think what I envision as not a huge impact would be
devastating to others. That's mostly because I'm looking at it globally:
if you take all routes as the denominator and the lost routes as the
numerator, four colo sites, even the big ones, wouldn't have *that*
much effect. Proportionally. At first.

Of course, if you're a smallish ISP operator and your one peering site
happens to be at one of the four sites, you're done.

Jane

"Jane" == Pawlukiewicz Jane <pawlukiewicz_jane@bah.com> writes:

    >> Even if we were to model it, the best data we could get for
    >> the "Internet" would be BGP routing tables. These are also
    >> subjective views of the rest of the net. We could take a full
    >> table, map all the ASN adjacencies, and then pick arbitrary
    >> ASNs to "fail", then see who is still connected, but we are
    >> still dealing with connectivity relative to us and our peers,
    >> even 5+ AS-hops away.

    > I want to make sure I understand this. As I understand it,
    > this would work regarding routing only. It would be a model
    > whose result is ones and zeros, so to speak, meaning either
    > you're connected or you're not. What this doesn't take into
    > consideration, I believe, is the effect of congestion from
    > increased news traffic and from the rerouting that takes
    > place whenever a site is lost.

I believe you are correct. Modelling the connectivity matrix [1] is
good as a first approximation. The next thing to do would be to
estimate the transition probabilities between ASi and ASj (you could
do this by looking at the adjacencies one step out [2], for example).
There are other methods of estimating the transition probabilities,
but most are foiled by a lack of available data. You can get a pretty
good adjacency map by doing table dumps from all of the route servers,
looking glasses, etc.

Once you have the TP matrix, construct a vector of initial conditions
to represent likely traffic sources -- e.g. the ASs containing CNN and
the BBC -- and look at how the traffic dissipates through
an n-step Markov process [3].

This will tell you something about how heavily loaded with traffic
certain ASs (the accumulation points) become, at least as a ratio to
"normal", but since we have no information about channel capacity
available within each AS, it doesn't say much about actual
congestion. It will, however, suggest where congestion is likely to
occur if links have not been overprovisioned by some ratio. I think. ;-)

The trick is in estimating the transition probabilities. I'm not sure
this is a good method. Using adjacencies from one hop out assumes
transit to two hops out. Using n hops out implies transit to n+1 hops,
and the bigger n gets, the less accurately it will represent the real
mesh, since it starts implicitly assuming symmetric transit everywhere
once n is greater than, say, 4 or 5, or whatever the average AS path
length is these days.

-w

[1] C_ij = 1 if i and j connected or i == j, 0 otherwise
[2] If A_i is the number of adjacencies for ASi, then set transition
probabilities P_ij proportional to (C_ij * A_j) / Sum_k (C_jk * A_k),
normalized so that Sum_j P_ij = 1.
[3] n-th step transition probabilities are (P^n)_ij, i.e. the n-th
matrix power of P
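
For concreteness, a toy-sized numerical sketch of [1]-[3] (numpy; the
four-AS topology and the choice of AS 0 as the traffic source are
invented purely for illustration, and I am using the row-normalized
reading of [2]):

import numpy as np

# [1] Connectivity matrix: C_ij = 1 if i and j connected or i == j.
C = np.array([[1, 1, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 1, 1]], dtype=float)

# [2] Weight each neighbor j by its adjacency count A_j, then
# normalize each row so that Sum_j P_ij = 1.
A = C.sum(axis=1) - 1            # adjacencies per AS (excluding self)
W = C * A                        # W_ij = C_ij * A_j
P = W / W.sum(axis=1, keepdims=True)

# [3] n-step transition probabilities come from the matrix power P^n.
# Start all traffic in AS 0 (the AS containing CNN, say) and watch
# where it accumulates as n grows.
v = np.array([1.0, 0.0, 0.0, 0.0])
for n in (1, 2, 8):
    print(n, v @ np.linalg.matrix_power(P, n))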

Coming up with the AS interconnection data is actually fairly
easy if you parse route-views data. This obviously doesn't cover
every possible interconnection that exists, but it does provide
a large swath of data to review for the interconnection
postulation.

  Looking at that data (this is an old snapshot), the top ten
networks are, in #10 -> #1 order:

conn ASN
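
The parse itself is roughly this. A sketch, assuming the table dumps
have already been flattened to one AS path per line (the file name and
input format are assumptions; real route-views data needs rather more
massaging). It prints the same "conn ASN" pairs as the snapshot above:

from collections import defaultdict

neighbors = defaultdict(set)

# One AS path per line, e.g. "701 1239 3356". The isdigit() filter
# drops AS-set braces and origin codes; a != b skips path prepending.
with open("aspaths.txt") as f:
    for line in f:
        path = [token for token in line.split() if token.isdigit()]
        for a, b in zip(path, path[1:]):
            if a != b:
                neighbors[a].add(b)
                neighbors[b].add(a)

# Top ten ASNs by observed interconnection count.
top = sorted(neighbors, key=lambda asn: len(neighbors[asn]), reverse=True)
for asn in reversed(top[:10]):   # in #10 -> #1 order, as above
    print(len(neighbors[asn]), asn)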

alex@yuriev.com wrote:

Let's bring this discussion to some common ground -

What kind of impact on the global internet would we see should we observe
the nearly simultaneous detonation of 500 kilograms of high explosives at N of
the major known interconnect facilities?

OK, what if 60 Hudson, 25 Broadway, LINX and AMS-IX were all put out of
commission?

What about the major sites terminating undersea cables in an effort to
isolate the US?

Or major satellite uplink points?

Or all of them?

Okay,

If we're going to go off the deep end here, how about the effect of a
small yield air burst over $importantplace? Not designed to maximize
casualties/damage but rather EMP? A large number of senior military
officials got that 'deer-in-the-headlights' look a few decades back when
a deserter-supplied "Soviet state of the art" fighter turned out to have
tube-based electronics. :-)

It's not much of a stretch from crashing civilian airliners into high
rises to "firing for effect" with nuclear weapons. Look at what's going
on with Iraq right now.

I know, but you're saying that's why the Internet was invented: to
provide diverse communications even in a nuclear war. The Internet and
its electronics and equipment were a much different animal when that flag
was first run up the pole. I wonder if anyone has checked to see if
anyone would salute today.

Oh, wait, that's what this whole discussion is about, isn't it. ;-)

Best regards,

*********** REPLY SEPARATOR ***********

Okay,

If we're going to go off the deep end here, how about the effect of a
small yield air burst over $importantplace? Not designed to maximize
casualties/damage but rather EMP? A large number of senior military
officials got that 'deer-in-the-headlights' look a few decades back when
a deserter-supplied "Soviet state of the art" fighter turned out to have
tube-based electronics. :-)

Said tube electronics were apparently more survivable against EMP
effects. Or was that the point you were making? I think the real
surprise was a toggle switch that Belenko said was supposed to be
flipped only when told over the radio by higher headquarters. It
changed the characteristics of the radar.... sort of a "go to war" mode
vs. the standard training mode.

An interesting, if not totally professional, evaluation of something
like this is in Stephen Coonts's book "America" where terrorists take over
an American nuclear submarine armed with a new type of Tomahawk warhead
- an EMP warhead. One of the early targets is AOL HQ in Reston, VA (I
almost cheered).

Coonts has an inflated idea of what an outage there would do to the
internet... but there is a lot of other stuff fairly nearby, isn't
there?

Said tube electronics were apparently more survivable against EMP
effects. Or was that the point you were making? I think the real
surprise was a toggle switch that Belenko said was supposed to be
flipped only when told over the radio by higher headquarters. It
changed the characteristics of the radar.... sort of a "go to war" mode
vs. the standard training mode.

  I wouldn't be too surprised. The Patriot has a clock problem, and
can't be left turned on for an extended period of time. There are
plenty of military systems everywhere in the world that have various
operational issues that may not materially reduce their effectiveness
in their official role, but which may make them less suitable for
other roles.

An interesting, if not totally professional, evaluation of something
like this is in Stephen Coonts's book "America" where terrorists take over
an American nuclear submarine armed with a new type of Tomahawk warhead
- an EMP warhead. One of the early targets is AOL HQ in Reston, VA (I
almost cheered).

  These things exist. I would be more concerned about drive-by attacks
with HERF (High Energy Radio Frequency) guns, capable of generating an
EMP field that can wipe out RAM on any computer device that is not
suitably protected (Tempest shielding or being in a SCIF?). These
things can be made relatively portable and undetectable until such
time as they are turned on -- unlike nuclear devices that can be
detected by Geiger counters, etc.... A drive-by with a van would be a
lot easier to organize than hi-jacking a nuclear-equipped submarine.

  BTW, AOL headquarters is in Sterling, not Reston. It's not that far
away, so I can understand why people not from that area would not be
aware of the difference.

Coonts has an inflated idea of what an outage there would do to the
internet... but there is a lot of other stuff fairly nearby, isn't
there?

  What do you mean by "nearby"? Do you count the "TerraPOP"? Do you
count Langley?

*********** REPLY SEPARATOR ***********

Said tube electronics were apparently more survivable against EMP
effects. Or was that the point you were making? I think the real
surprise was a toggle switch that Belenko said was supposed to be
flipped only when told over the radio by higher headquarters. It
changed the characteristics of the radar.... sort of a "go to war" mode
vs. the standard training mode.

I wouldn't be too surprised. The Patriot has a clock problem,
and can't be left turned on for an extended period of time. There
are plenty of military systems everywhere in the world that have
various operational issues that may not materially reduce their
effectiveness in their official role, but which may make them less
suitable for other roles.

Actually I suspect it was an anti-jamming feature. Think about it....
the jammers would all be programmed based on the training mode, which
presumably we would have heard before. All of a sudden this thing is
broadcasting an entirely new signal...

<snip>

Coonts has an inflated idea of what an outage there would do to the
internet... but there is a lot of other stuff fairly nearby, isn't
there?

What do you mean by "nearby"? Do you count the "TerraPOP"? Do
you count Langley?

I thought that MAE-East was somewhere around there? I know that there
is a fair amount of high-tech in that particular area. I don't know how
far away Langley itself is.... another target was basically "The Mall"
where it took out a couple of fly-by-wire Airbuses. Interesting book
from a techno-thriller standpoint. Just don't confuse it with
reality.<G>

Coonts has an inflated idea of what an outage there would do to the
internet... but there is a lot of other stuff fairly nearby, isn't
there?

*You* know that a hit on 60 Hudson would probably be worse (especially
considering all the OTHER stuff that would be in blast range). *I* know that.
The rest of the NANOG readership knows that.

However, the organization based in Reston probably has on the order of
1,500 times more subscribers than the NANOG list does... ;-)

...most of us have as our claim to fame the ability to talk to
inanimate objects and convince them they want to listen to us.
    -- Valdis Kletnieks in a.s.r

Wow - I'm famous. ;-)

> Coonts has an inflated idea of what an outage there would do to the
> internet... but there is a lot of other stuff fairly nearby, isn't
> there?

*You* know that a hit on 60 Hudson would probably be worse (especially
considering all the OTHER stuff that would be in blast range). *I* know that.
The rest of the NANOG readership knows that.

We had examples of that on Sep 11th and it wasn't -that- bad...

However, the organization based in Reston probably has on the order of
1,500 times more subscribers than the NANOG list does... ;-)

> ...most of us have as our claim to fame the ability to talk to
> inanimate objects and convince them they want to listen to us.
> -- Valdis Kletnieks in a.s.r

Wow - I'm famous. ;-)

famous or infamous? :-)

Steve

Actually damage to the "net" could be done with relative ease.

If you wanted to do some planning and a little staging work, you could
affect large amounts of traffic.

Given recent press about large carriers moving their interconnects to
a well-known IX-type company, all you would have to do is place
some 7206VXRs (VXR == Very eXplosive Routers) in these co-los.
Or servers; a 5U server.... Nice and big.

I wonder how much damage a couple of slots in a router full of Semtex
would do. Then do that in multiple co-los and use an IP packet as
a trigger to pop them all.....

Don't forget the spare parts depots as well.

PS: All the money people spend on physical security to keep those
on the outside out would only serve to contain the overpressure and
other effects of a device on the inside.

The issue is that free-market players are going to do only as much as
is needed to turn a profit, and not a penny more.

This isn't the 60's or the 70's, when AT&T ran the infrastructure
and had bunkers around the nation....

To some extent - nothing, for the above... if designed right. The major
networks should have designed their networks to route around this. If
not, they have done a poor job. For others, the exchange points should
merely be a way to off-load their transit connections.

However, there is a point in what you are saying: from a national point
of view, the exchange points should independently take care of traffic
in case a nation is isolated. But I don't think any of the above are
designed for that in the first place...

- kurtis -

Yet, it is reasonable that people expect x % of their traffic to
use IXs. If those IXs are gone then they will need to find another
path, and may need to upgrade alternate paths.

I guess the question is:

At what point does one build redundancy into the network?

I suspect it's a balancing act between redundancy, survival (of the
network) and costs vs. revenues.

I'm not sure I'd call it a "poor job" for not planning for all possible
failure modes, or for not having links in place for them.

In 1982 AT&T was still a monopoly, could spend whatever it took and the
primary threat was missiles from the Soviet Union. AT&T had ten Class 1
Regional Centers in the country. Regional Centers were the "top" of the
telephone network routing hierarchy fully connected to other regional
centers.

http://www.rand.org/publications/RM/RM3097/RM3097.appb.html

I don't know how AT&T came to the conclusion that 10 was the perfect
compromise between cost, reliability and survivability. They had
lots of smart people who knew networks working on the problem, so
I'm assuming they had a decent justification to back up the choice.

Yet, it is reasonable that people expect x % of their traffic to
use IXs. If those IXs are gone then they will need to find another
path, and may need to upgrade alternate paths.

I guess the question is:

At what point does one build redundancy into the network?

No, it doesn't necessarily use IXs. In the event of there being no peered path
across an IX, traffic will flow from the originator to their upstream
"tier1" over a private transit link; then that "tier1" will peer with the
destination's upstream "tier1" over a private fat pipe; then that will go to the
destination via their private transit link.

I'm only aware of a few providers who transit across IXs, and I think the
consensus is that it's a bad thing, so it tends to be just small people for whom
the cost of the private link is relatively high.

I suspect the catch would be that in the event of major switching nodes being
taken out there would be considerable congestion on the transit links and most
likely on the private peering of the tier1's also.

I suspect it's a balancing act between redundancy, survival (of the
network) and costs vs. revenues.

I imagine in today's capitalist world it's not so much balanced as weighted
heavily toward economics and how best not to spend the cash!

I'm not sure I'd call it a "poor job" for not planning for all possible
failure modes, or for not having links in place for them.

Well, the trouble is in the real world we can't have the budgets we'd like
to implement our plans, and we end up compromising... there's the catch.

Note, however, that the email below is a mix of IXs and data centres, and the
two are not the same. Here we are discussing IXs.

I think it's a different matter if we lose a data centre, as you then risk losing
the aforementioned private transit/peer links, which will probably go through
that location. Then you'd see more disruption.

With that in mind, consider last year's outage at 60 Hudson.. the main areas it
affected were IP/call switching in New York (but that was hosed anyway) and
probably the next area was Europe, with lots of the East Coast cable landings
going through there. I know most people I spoke to were seeing congestion and
outages going to US locations. But hey, things still worked!

Steve

> At what point does one build redundancy into the network?

No, it doesn't necessarily use IXs. In the event of there being no peered path
across an IX, traffic will flow from the originator to their upstream
"tier1" over a private transit link; then that "tier1" will peer with the
destination's upstream "tier1" over a private fat pipe; then that will go to the
destination via their private transit link.

But will these links have enough spare capacity so congestion doesn't
happen?

I'm only aware of a few providers who transit across IXs, and I think the
consensus is that it's a bad thing, so it tends to be just small people for whom
the cost of the private link is relatively high.

I apologize in advance for naming names here, but I think it is important
for making my point.

A while back (I think last year, but I'm not sure) the AMS-IX had a huge
outage because the power failed in two of the main locations. One of the
locations didn't at that time have battery- or generator-backed power
(although they used three diversely routed inputs from the power company)
and the other location only had batteries, which didn't last long.

Nearly everything was still reachable over transit rather than peering
with only minor congestion. However, some networks got their transit in
the same buildings as where they connect to the AMS-IX, so both their
peering and transit was gone and they were unreachable. If you think this
was only true for small networks: think again. Surfnet suffered the same
problem. Surfnet is one of the largest (if not _the_ largest) Dutch networks,
connecting all the universities in the country at multi-gigabit speeds.
However, they only connected to other networks in a single building at
that time. I don't know if this is still the case.

Now this is only one big network and a few small ones that suffered.
However, things could have been much worse for people in the rest of the
Netherlands, because even with all the rerouting going on almost all
traffic still flowed through Amsterdam. So any outage in Amsterdam that
takes down more than a single building would cripple the majority of Dutch
networks. Obviously, something like this doesn't happen all the time, but
luck has a tendency to run out from time to time. A plane crash (a 747
went down in an Amsterdam suburb 10 years ago) or a good sized flood (lots
of stuff is below sea level in NL) will do it.

I suspect the catch would be that in the event of major switching nodes being
taken out there would be considerable congestion on the transit links and most
likely on the private peering of the tier1's also.

I'm more worried about long distance fiber running through rural areas.
Much more bang for your backhoe renting buck.

> I'm not sure I'd call it a "poor job" for not planning for all possible
> failure modes, or for not having links in place for them.

Well, the trouble is in the real world we can't have the budgets we'd like
to implement our plans, and we end up compromising... there's the catch.

I don't think it's just a matter of money. In 1999, I helped roll out a
completely new network. EVERYTHING in it, except the ports customers
connect to, had a backup. Management originally wanted to connect every
location to at least three others. (We got this requirement dropped
because it essentially means you're buying a third circuit that doesn't do
anything useful until the two others are down; traffic engineering for
both regular operation and the different failure modes is too complex.)
Still, I couldn't convince them to move the second transit connection to
another city where both our network and the transit network were also
present in the same building.

A year or so after I left, I was in the building where that entire network
connects to its transit network over two independent routers at both ends,
and the power went down and they couldn't get the generators online...
Eventually the utility power came back online before the batteries were
empty. All of this is on the ground floor in a place that's below sea
level only a block or so from a river.

>
> Yet, it is reasonable that people expect x % of their traffic to
> use IXs. If those IXs are gone then they will need to find another
> path, and may need to upgrade alternate paths.
>
> I guess the question is:
>
> At what point does one build redundancy into the network?

No, it doesn't necessarily use IXs. In the event of there being no peered path
across an IX, traffic will flow from the originator to their upstream
"tier1" over a private transit link; then that "tier1" will peer with the
destination's upstream "tier1" over a private fat pipe; then that will go to the
destination via their private transit link.

I'm only aware of a few providers who transit across IXs, and I think the
consensus is that it's a bad thing, so it tends to be just small people for whom
the cost of the private link is relatively high.

I think you are missing one critical point - the IX in this case is not an
exchange. It is a point where lots of providers have lots of gear in a
highly congested area. How they connect to each other in that area does
not matter.

Now presume those areas are gone (as in completely gone). What is the
possible impact?

Alex

They're all completely gone? Then we have a bigger issue than the
Internet not working, because lots of us are dead. A lot of the
exchange areas are city-wide, in a literal sense. Take DC, for
example. Lots of folks connect in DC, not just at MAE-East, but also
via direct cross-connects between providers, following a large variety
of fiber paths owned by a variety of carriers. A single event that
removed all the connectivity from DC would either have to devastate
the city and surrounding suburbs, or at a minimum, disrupt
electronics (EMP airburst) or hit every power plant in the area (and
yeah, that kills folks, too, especially in winter.)

Now, having destroyed civilization in DC (so to speak), we have
removed a major exchange point, but also all traffic generated in DC.
The rest of the Internet is fine. To break the rest of the exchanges,
we'd have to do the same to New York, Dallas, Boston, Chicago,
Atlanta, San Francisco, San Jose... And that's just in the States.

If you were to hit a telco hotel (usually a hard target, but we'll
grant you the necessary firepower), you would inconvenience the
Internet in that area until another well-connected site could be
chosen and filled with equipment. Internet infrastructure is
logically mapped to telco infrastructure, and telco infrastructure is
ubiquitous. You're looking for a weakness where it isn't. If you
wanted to hurt the Internet, you wouldn't hit a city. You'd hit the
cross country fiber paths, out in the middle of nowhere.

-Dave

> > At what point does one build redundancy into the network?

> No, it doesn't necessarily use IXs. In the event of there being no peered path
> across an IX, traffic will flow from the originator to their upstream
> "tier1" over a private transit link; then that "tier1" will peer with the
> destination's upstream "tier1" over a private fat pipe; then that will go to the
> destination via their private transit link.

But will these links have enough spare capacity so congestion doesn't
happen?

Well, the policy among major ISPs tends to be around 50% max utilisation per
circuit, so they should have capacity to reroute. You're most likely to hit
issues on the local ISP's transit connection, which is unlikely to have the
capacity to absorb a large amount of their peered traffic, although medium ISPs
can probably reroute a large amount to another IXP anyway.
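
As a sanity check on that 50% rule of thumb, the arithmetic is simple
(all numbers below are invented for illustration):

# Does a circuit held to the ~50% policy ceiling absorb a given slug
# of rerouted peered traffic? Numbers are invented.
capacity_gbps = 10.0
normal_load_gbps = 5.0      # the ~50% policy ceiling
rerouted_gbps = 3.0         # peered traffic shifted onto this circuit

post_failover = normal_load_gbps + rerouted_gbps
print("utilisation after failover: %.0f%%"
      % (100 * post_failover / capacity_gbps))   # -> 80%: it fits

A small ISP whose displaced peered traffic exceeds that headroom
congests, which is exactly the catch mentioned above.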

> I'm only aware of a few providers who transit across IXs, and I think the
> consensus is that it's a bad thing, so it tends to be just small people for whom
> the cost of the private link is relatively high.

I apologize in advance for naming names here, but I think it is important
for making my point.

A while back (I think last year, but I'm not sure) the AMS-IX had a huge
outage because the power failed in two of the main locations. One of the
locations didn't at that time have battery- or generator-backed power
(although they used three diversely routed inputs from the power company)
and the other location only had batteries, which didn't last long.

Nearly everything was still reachable over transit rather than peering
with only minor congestion. However, some networks got their transit in
the same buildings as where they connect to the AMS-IX, so both their
peering and transit was gone and they were unreachable. If you think this
was only true for small networks: think again. Surfnet suffered the same
problem. Surfnet is one of the largest (if not _the_ largest) Dutch networks,
connecting all the universities in the country at multi-gigabit speeds.
However, they only connected to other networks in a single building at
that time. I don't know if this is still the case.

Yes, there is a large amount of that happening in London, where I'm more familiar
with individual ISPs' networks.. they tend to exist in one or two locations and
pass traffic through a single location because of economies of scale on
bandwidth. Although I don't know of any medium/large ones like that..

I personally have always maintained multiple sites with sufficient capacity to
handle the failure of another site since day one; however, perhaps I was lucky
enough to be able to draw on a company with enough cash to be willing to do
that. I regularly (every month or two) see something major happen at a site, and
on the whole things continue working just fine around it!

Steve

I suspect it's a balancing act between redundancy, survival (of the
network) and costs vs. revenues.

I'm not sure I'd call it a "poor job" for not planning for all possible
failure modes, or for not having links in place for them.

It depends on your perspective, what you expect from the net, and what
you see it doing for you in the future. As we move more advanced services
to the net, we will also expect much more from it - also in times of
crisis, just as the net was one of the prime sources of information
during 9-11. In the event of an emergency, I would very much like to be
as able to reach my bank via the net as by walking into their offices.

I do agree that it is a balance, but I am not so sure that everyone has
realised this. I am not even sure that all the carriers that you would
expect to have this planning have it...

- kurtis -