Is it time for a disruption analysis working group for the Internet?

Have there been enough multi-provider disruptions yet to reach the
critical mass needed to do something?

Most networked industries have some group that collects and analyzes
information about disruptions. What's interesting is how often similar
disruptions had precursor events at several different service providers.
For example, in the last few months there have been several cases of
root and gTLD servers failing to transfer zone files, and several cases
of routers not withdrawing routes after an erroneous announcement. Only
after the major disruption occurs does the information get shared,
usually via the public news media. Once upon a time, the IETF had a
group called 'netstat,' and NANOG had presentations about the 'State of
the Internet.' For a variety of reasons, neither has appeared on those
organizations' agendas recently.

If there were a process for providers to submit initial and final reports
about significant service disruptions, and a group to compile a regular
report of common root causes across multiple providers (not a report card
on any single provider), would any provider voluntarily participate? I'm
not thinking about a real-time shared trouble ticket system, but something
on the same scale as the outage reports other industries submit to their
industry working groups.

I suspect I know the answer to that question.

Craig, Randy, Jhawk stop reading here-----------------------------------

On the other hand, suppose instead of being very hard to reach I suddenly
started returning reporters' phone calls promptly and telling them about
this great idea I have to improve the reliability of the Internet.
Eventually one will write a story about it, and maybe even get some
decent coverage. How high up do I have to shoot in order to get your
CEO's attention? Does it have to be the front page of the New York Times?

Would that change the answer to the question above?

[...]

How is this handled in other networked industries? I'm sure that the same
issues of proprietary information and public humiliation exist there; how
do they deal with it?

Pete.

Not precisely the networking industry, but the airline industry has
recently been revising its procedures for crash notification, not just on
its own initiative but under governmental pressure. I suspect we could get
information from the Air Transport Association or possibly the US National
Transportation Safety Board.

A point from aviation: incidents such as near-misses can be reported
without fear of liability, because the consensus is that it's more
important to recognize potential safety problems than to create
opportunities for action against individuals or for lawsuits.

In other industries, the Electric Power Research Institute would be a good
starting point, since they have responsibility for data network architecture
in the electrical power industry. Anyone from EPRI reading NANOG?

Medicine, unfortunately, isn't generally the best place to look for
examples of doing things in the open, though there are examples in the
specialty of public health. I subscribe to a well-respected email
newsletter called Pro-Med, which came out of the Federation of American
Scientists and has a rather star-studded advisory board of public health
experts. I suspect their staff and board would be open to serving as a
model, if the model fits.

Howard

Seems like you probably want an ongoing group, akin to the developing
network of CERT teams, that focuses on operational anomalies rather than
security incidents.

A third party that is funded by the industry but separate from any
particular provider.

d/

Sean,

Do we actually need the cooperation of the organizations in question to
effect this? For large enough failures, the results are obvious and the data
is fairly clear. Perhaps a first stage of a Disruption Analysis Working
Group would simply be for a coordinated group to gather the facts, sort
through the impact, analyze the failure and report recommendations in a
public forum. A sponsoring organization that could provide a legal
liability shield would be desirable, since anyone who declines to
cooperate may turn passive non-cooperation into active opposition.

Regards,

Eric Carroll
Tekton Internet Associates

Craig, Randy, Jhawk stop reading here-----------------------------------

<chuckle>

On the other hand, suppose instead of being very hard to reach I suddenly
started returning reporters' phone calls promptly and telling them about
this great idea I have to improve the reliability of the Internet.
Eventually one will write a story about it, and maybe even get some
decent coverage. How high up do I have to shoot in order to get your
CEO's attention? Does it have to be the front page of the New York Times?

Would that change the answer to the question above?

Yup.

Battle by press is a dicey game at best; you have to make _certain_
your press contact is knowledgeable enough to make the right points the
right way. We _do_, however, have several participants here who seem
to have a clue, who also have ink. I'm thinking of one in particular
who writes a column for InternetWorld (week?). Although it's not
"mainstream" business press, those folks _do_ read the trades, too...

We're becoming a utility, folks; it's time to act that way. It seems
to me that there's a niche market here for anyone with the capital and
inclination. _All_ of the net doesn't have to be high-availability, as
long as the HA section replicates the right things.

Decentralization doesn't require anyone's permission. And if we can't
survive a backhoe, how in _hell_ will we survive a pissed off Saddam
tossing nuclear SCUDs?

Cheers,
-- jra