How reliable does the Internet need to be? (Was: Re: Converged Network Threat)

I think the Internet is doing pretty well, save some IOS code problems
from time to time and the typical root server hiccups.

I'm interested to know what you mean by "typical root server hiccups".
I'm trying to think of an incident which left the Internet generally
unable to receive answers to queries on the root zone, but I can't
think of one.

There have been several incidents in which some root servers
have hiccuped, sometimes being down for several days. But since
the service they provide has N+12 resiliency, the service itself
has never been unavailable.
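As a rough illustration (not from the thread): with N+k redundancy, the service is unavailable only if all N+k instances fail at once. Assuming the 13 root servers fail independently with probability p each -- a strong simplification, since real outages are often correlated -- the chance of total service loss is p to the 13th power. A minimal sketch:

```python
def service_down_probability(p: float, instances: int = 13) -> float:
    """Probability that every one of `instances` independent replicas
    is down at the same moment, each with failure probability `p`."""
    return p ** instances

# Even if each server were unreachable 10% of the time, the odds of
# the whole service being dark at once are roughly 1e-13.
print(service_down_probability(0.10))
```

The independence assumption is the weak point: a software fault shared by every instance (the single-vendor scenario discussed below) correlates the failures and destroys this math, which is the argument for vendor diversity.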

Similarly, the Internet has always had N+1 or better vendor resiliency
so IOS can have problems while the non-IOS vendor (or vendors) keep on
running. In fact, I would argue that N+1 vendor resiliency is a good
thing for you to implement in your network and N+2 vendor resiliency is
a good thing for the Internet as a whole. Let's hope that vendor P manages
to get some critical mass in the market along with J and C.

--Michael Dillon

Unfortunately, while this sounds excellent in theory, what really
happens is that you have a large chunk of equipment in the network
belonging to vendor X, and then you introduce vendor Y. Most people
I know don't suddenly throw out vendor X (assuming that this was
a somewhat competent choice in the first place; jumped-up L2 boxes
that punt the first packet of a flow through a slow path to set up
TCAMs for subsequent flows don't count as somewhat competent).
People don't do that because it costs a lot of capital and opex.
So now we have a partial-X, partial-Y network; X goes down, and
chances are your network gets hammered like an ice cube in a
blender set to Frappe.

You could theoretically have a multiplane network with each plane
built from a different vendor's gear (and we do that on some of our
DWDM rings), but that is a luxury most people can ill afford.

/vijay

Thus spake "vijay gill" <vgill@vijaygill.com>

Unfortunately, while this sounds excellent in theory, what really
happens is that you have a large chunk of equipment in the network
belonging to vendor X, and then you introduce vendor Y. Most people
I know don't suddenly throw out vendor X ... . People don't do that
because it costs a lot of capital and opex. So now we have a partial X
and partial Y network, X goes down, and chances are your network
gets hammered like an ice cube in a blender set to Frappe.

I think an important factor in this is that multiple vendors are rarely
deployed within redundant pairs, an arrangement that at least has a hope
of surviving one vendor's cascading software faults.

More often, each vendor's products are used universally in particular
tier(s) in the network, such that a failure of one vendor may leave you with
no access but a working core, or vice versa.

S

Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin