RE: What is the limit? (was RE: multi-homing fixes)

From: smd@clock.org [mailto:smd@clock.org]
Sent: Wednesday, August 29, 2001 8:06 AM

> Draw two curves, the first y=x/2, the second y=x^2
> Move the value of x for y=1 for the first curve left by 2, 5 or 10
> and it will still be surpassed by the second curve.
> You will even see this for a second curve of y=x*2 or y=x.

Prove it.

> The global routing table size HAS grown exponentially
> in the past. Rationalize it any way you want, blame whatever
> you like, but there is no known way to construct a router that
> can handle that kind of growth in anything but a short term,

Sorry, Leo is correct. The technologies he outlined are only the tip of the
iceberg of what *isn't* being exploited by the router vendors. Yes, my
2-year-old Dell laptop has more horsepower than your average Cat 3524XL.
But, sadly, it doesn't have the I/O capacity. However, one of my
tight-fisted dot-com clients (which is why they're still in business) had me
build three 1U 8-port router/switches from COTS parts. They worked at line
speed. Code was courtesy of the Linux Router Project (open source). At
retail prices, the entire parts cost was less than $6K US each. Specs were:
dual PIII-866, 2GB RAM, 1 CDROM boot drive (no HDD), three 4-port 100baseTX
PCI cards, 1U rackmount case. One week for the build and two for the
software. It still cost less than an equivalent Cisco purchase. Yes, I tried
to sell them Cisco Catalysts instead. To some extent, Cisco deserves what's
happening to their stock prices right now. [Side note: one of the biggest
benefits of the non-proprietary, open source Linux movement is that it lets
me do *real* computer engineering again.]

Yes, the router vendors' largest advantage is proprietary back-plane
design, optimized for I/O. But, with ASIC technology today, and most
importantly, modern macro architectures, one works around that easily. I had
plenty of capacity to add DNS zone server software and web-based
administration with server-side Java. I had seriously considered
productizing it, but in the current economic climate, no one is funding such
projects, yet. BTW, this was built almost a year ago. They're still online.
Note that PCI bus technology is more than 10 years old. We can do better if
we want to and if there's money in it.

> and the trend for the components in the router growth curve
> is simply not going to increase to a long term superlinear rate.

That trend is set very much by the router vendors. They have deliberately
held growth down in order to keep prices inflated. They can do this because
there aren't a whole lot of real computer engineers around anymore.
Everyone's a specialist these days, and very few do engineering at the macro
level; those who do are called Computer Architects. Many of us *don't* do the
IETF thing because we'd rather get paid for our work. In the real world,
prestige don't pay the rent.

The router vendors have done a real good job convincing everyone that they
are at the state-of-the-computer-arts. I've got a clue for you, they're far
behind it.

> A 10x system performance boost today just moves the x point for
> y=1 of fundamental curve claimed by Moore's Law to the left
> a few notches. Or are you claiming that routing equipment
> will have a fundamentally different, and larger, growth curve
> than other computing systems? (I think there is a basis for
> claiming that there are some reasons which would support a
> _shallower_ growth curve for routing equipment, actually).

As said before, we could see a one-time step up, by more than an order of
magnitude. That alone should get us to the foreseeable end of the IPv4 cycle.
Yes, I agree that we need to do something for IPv6...starting NOW!
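
For scale, here is a rough sketch of how long a one-time capacity boost lasts
against exponential growth. The function name and the doubling times are
illustrative assumptions, not figures from this thread. A boost of factor k
absorbs log2(k) doublings of demand; if capacity keeps doubling too, but more
slowly than demand, the gap closes at the difference of the two rates.

```python
import math

def years_bought(boost, demand_doubling_years, capacity_doubling_years=None):
    """Years until exponentially growing demand eats a one-time capacity boost.

    boost: one-time multiplier on capacity (10 for "an order of magnitude").
    demand_doubling_years: hypothetical doubling time of demand (table size).
    capacity_doubling_years: hypothetical doubling time of capacity after the
        boost; None means capacity stays flat.
    """
    if boost < 1 or demand_doubling_years <= 0:
        raise ValueError("need boost >= 1 and a positive demand doubling time")
    if capacity_doubling_years is not None and capacity_doubling_years <= 0:
        raise ValueError("capacity doubling time must be positive")
    # Net growth of demand relative to capacity, in doublings per year.
    net_rate = 1.0 / demand_doubling_years
    if capacity_doubling_years is not None:
        net_rate -= 1.0 / capacity_doubling_years
    if net_rate <= 0:
        return math.inf  # capacity keeps pace, so the boost never runs out
    return math.log2(boost) / net_rate

if __name__ == "__main__":
    # Hypothetical doubling times, chosen only to show the shape of the answer.
    for demand in (1.0, 1.5, 2.0):
        print(f"demand doubles every {demand} y, flat capacity: "
              f"10x lasts {years_bought(10, demand):.1f} y")
```

With demand doubling every year and flat capacity, a 10x boost lasts about
3.3 years, which is the "few notches" both sides are arguing over.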

> In short: are you claiming that the caeteris paribus assumption
> in comparing Moore's Law to global routing table size is
> clearly false?

That depends on how you apply *what* technology to the curve. It's only
proven by corollary; no hard evidence is provided. I'm not convinced that
the given Moore's law curve is accurate. In the secular world of the router
vendor, they may think that they're on it. Simple observation shows that
they aren't. Why do you think that warranty terms prohibit peeling open the
$10K US box?

Roeland,

> Draw two curves, the first y=x/2, the second y=x^2
> Move the value of x for y=1 for the first curve left by 2, 5 or 10
> and it will still be surpassed by the second curve.
> You will even see this for a second curve of y=x*2 or y=x.

> Prove it.

Prove that y1=A(x^2)+Bx+C always exceeds y0=Dx+E
for positive A and D, for all x>x0 for
some value x0?

Um, y1-y0 = A(x^2) + (B-D)x + (C-E) [1]

Since A is positive, this is an upward-opening parabola with
the standard quadratic roots. To the right of its higher root,
it's always positive, so y1>y0. (If it has no real roots, it is
positive everywhere.)

Now, I take it you don't want proof of
the roots to quadratic equations?
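
As a numeric check of the argument above, here is a minimal sketch that finds
the crossover point x0 from the higher root of equation [1]. The function name
is made up for illustration; the coefficients A, B, C, D, E are the ones used
in the proof.

```python
import math

def crossover(A, B, C, D, E):
    """Return x0 such that A*x^2 + B*x + C > D*x + E for every x > x0.

    Requires A > 0. Uses the higher root of A*x^2 + (B-D)*x + (C-E) = 0.
    """
    if A <= 0:
        raise ValueError("A must be positive")
    b = B - D
    c = C - E
    disc = b * b - 4 * A * c
    if disc < 0:
        # No real roots: the difference is positive for all x.
        return -math.inf
    return (-b + math.sqrt(disc)) / (2 * A)

if __name__ == "__main__":
    # y = x^2 against y = x/2, then against the same line shifted left by 10,
    # i.e. y = (x + 10)/2 = x/2 + 5.
    print(crossover(1, 0, 0, 0.5, 0))
    print(crossover(1, 0, 0, 0.5, 5))
```

Shifting the line left only pushes x0 to the right; it never removes it.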

Your average PC doesn't have to be NEBS-compliant, doesn't have to work
more than 24 hours w/o crashing, and doesn't have quite strict constraints
on power & heat dissipation. It doesn't have to have redundant power, and
its components are readily available and cheap (those are produced in
_large_ batches).

Using the "latest and greatest" in routers is not as easy as it seems.
First of all, when you get a new CPU you typically get a pre-packaged set
of peripheral chips (memory controllers, I/O bridges, etc) which are OK
for building a PC but patently useless for building a router with its
special needs for I/O performance.

So then you have to build custom chips around the CPUs; and you just
cannot get any useful advance information from CPU manufacturers because
they do not want to undercut their business in peripheral chips (as will
happen if their CPU interface specs leaked). You have to wait until the
actual chip is released (or close to release). PC manufacturers do not
worry about those things - they get ready-to-use reference motherboard
designs, together with chip bundles; initial prices are high, and then
companies in Taiwan start to reverse-engineer the stuff and drive prices
lower.

And don't get me started on heat and airflow issues :-) Reason #1 why
Pluris abandoned the original idea of using commodity CPUs was heat, not
the switching speed.

--vadim

I'm going to poke Vadim a bit. :-)

If you're building a multi-bay router (a la a number of new designs)
why not use a bay for the general purpose functions? Specifically
something like a Sun E10000, or HP V-Class (to illustrate top of
the line but off the shelf) connected into the fabric? Why even try
to build the processing on a board (with all the power and heat
constraints) for a system that large (say 16 bays)?

Of course, this doesn't work too well if you have to take a full bay
for a "routing engine" for a quarter rack forwarding chassis, so the
approach doesn't work on the smaller side, but that said there are
lots of N-Way servers available.

Bottom line, why doesn't a router vendor partner with a host builder,
and let them do what they do best (build a host), while the router
vendor does what they do best (build forwarding hardware)? I guess
you could argue Juniper did this, although I find it hard to consider
it a partnership when one side is free software.

For the record, big, multi-rack but "single management" routers make
me nervous.

> Your average PC doesn't have to be NEBS-compliant, doesn't have to work
> more than 24 hours w/o crashing, and doesn't have quite strict constraints
> on power & heat dissipation. It doesn't have to have redundant power, and
> its components are readily available and cheap (those are produced in
> _large_ batches).

> I'm going to poke Vadim a bit. :-)

You're welcome :-)

> If you're building a multi-bay router (a la a number of new designs)
> why not use a bay for the general purpose functions? Specifically
> something like a Sun E10000, or HP V-Class (to illustrate top of
> the line but off the shelf) connected into the fabric? Why even try
> to build the processing on a board (with all the power and heat
> constraints) for a system that large (say 16 bays)?

:-) That was in the original Pluris presentations. Then the race against
competition forced a move to a specialized design. Density & power
parameters were simply uncompetitive for off-the-shelf parts-only designs,
as compared to large Cisco and Juniper boxes.

However, the way Pluris optical fabric is designed, it is easy to add
lower-capacity bays to existing routers (i.e. not 12Gbps/card slot,
but, say, 12Gbps/bay); and I think we'll see hybrid router/server farms
in the future.

> For the record, big, multi-rack but "single management" routers make
> me nervous.

Cannot say about other designs, but Pluris has distributed redundant
control (i.e. each bay has its own control cards). This is no different
from the redundancy point of view from clustered routers. A lot more
manageable, though, since all those controller cards are synchronized
configuration-wise.

--vadim

> :-) That was in the original Pluris presentations. Then the race against
> competition forced a move to a specialized design. Density & power
> parameters were simply uncompetitive for off-the-shelf parts-only designs,
> as compared to large Cisco and Juniper boxes.

This is too good, I get to use an amazing analogy.

Router power and heat are growing like the routing table.

POP power and AC are growing like the routers.

Ha! Clearly the largest of the new routers have huge space/power/heat
issues. Consider a box with 256-1024 OC-48 interfaces in a single
rack. It's going to be tall, and deep, and suck power like nobody's
business and turn it all into heat. Sure, big hosts are power and
space monsters (anyone have a water cooled IBM mainframe in their
basement?), but I fear the routers are going to zoom right past them.
In fact, I predict power may be a huge limiting factor to the growth
of the Internet in the not too distant future.

> Cannot say about other designs, but Pluris has distributed redundant
> control (i.e. each bay has its own control cards). This is no different
> from the redundancy point of view from clustered routers. A lot more
> manageable, though, since all those controller cards are synchronized
> configuration-wise.

I'm not saying I wouldn't buy one, or don't think they are safe,
but they have to live up to a different standard. Think about
accident investigation for a car crash vs. an airplane crash. If
you want a provider to trust a large, tightly coupled system you
need to provide airplane-accident-like support when it breaks,
that's all.

> Sorry, Leo is correct. Technologies he outlined are only the tip of the
> ice-berg of what *isn't* being exploited by the router vendors.

> Your average PC doesn't have to be NEBS-compliant, doesn't have to work
> more than 24 hours w/o crashing, and doesn't have quite strict constraints
> on power & heat dissipation. It doesn't have to have redundant power, and
> its components are readily available and cheap (those are produced in
> _large_ batches).

i think mo said something like "can we not discuss building global
infrastructure using home appliances?"

randy

Consider a single point of failure....

Note that the magic of an E10K isn't the processors (which are pretty
similar to the E6500), it's the partitioning and backplane magic.

An E10K backplane is pretty deep voodoo. I'm told that it's not
a Sun design, nor is it manufactured by Sun. They're pretty pricey
too - I'm guessing that you DID have a price point for this router with
less than 7 digits in it?

/Valdis

cool features like having 4096 or so registers.

But I don't think people are going to be interested in liquid
cooled routers, even if they do have a nice couch :-)

David.

Teraplex 20 in full configuration (19.2 Tbps switching capacity, 128
racks) consumes 0.8 megawatt, not counting A/C.
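
Breaking those figures down per rack, as a back-of-the-envelope sketch. The
function name is made up; the inputs are the numbers quoted above, and A/C is
excluded as stated.

```python
def per_rack(total_tbps, racks, total_megawatts):
    """Return (Gbps per rack, kW per rack, watts per Gbps) for a full system."""
    if racks <= 0 or total_tbps <= 0:
        raise ValueError("racks and capacity must be positive")
    total_gbps = total_tbps * 1000.0
    total_watts = total_megawatts * 1_000_000.0
    gbps_per_rack = total_gbps / racks
    kw_per_rack = total_watts / racks / 1000.0
    watts_per_gbps = total_watts / total_gbps
    return gbps_per_rack, kw_per_rack, watts_per_gbps

if __name__ == "__main__":
    gbps, kw, w_per_gbps = per_rack(19.2, 128, 0.8)
    print(f"{gbps:.0f} Gbps/rack, {kw:.2f} kW/rack, {w_per_gbps:.1f} W/Gbps")
```

That works out to roughly 150 Gbps and 6.25 kW per rack, or about 42 W per
Gbps of switching capacity.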

Reminds me of the good ol' times when the main Relcom POP in Moscow had an
on-site nuclear reactor as a source of back-up electricity (well, it was
located in the Kurchatov Institute of Atomic Energy) :-)

--vadim

It may be time to start thinking about other means of powering &
cooling this equipment. The current method of big DC cables and
lots of fans may not work for all that much longer.

I have proposed to various router vendors the possibility of giving
them a chilled water feed instead of lots of cool air. At the
moment they seem to not need it, but I would not be surprised to
find something like this needed at some point.
  --asp@partan.com (Andrew Partan)

Err. Water and electricity make a dangerous mix.

--vadim

And this was not a problem in IBM Mainframe computers because?

I'm not registering an opinion one way or the other at this point on whether routers should consider other forms of cooling, but using water or other liquids to cool electronics is not a new concept. Properly engineered, there is no particular danger.

Ya. My air handling systems seem to deal with it OK.

These big routers all seem to want more & more power - and thus
generate more & more heat. It's time to think of other ways of
cooling them other than trying to cool more & more air. I suspect
that sooner or later air cooling isn't going to handle it. Thus
the question of what you replace cold air with.

I suppose you could put routers in wind tunnels, but then getting
a technician in there to do OIR is going to be a bit tricky.

  --asp@partan.com (Andrew Partan)

I recall something about liquid (something, could be nitrogen, or perhaps
even mercury) that was chilled; PC Boards were then submersed in the
liquid to keep cool.

Maybe on Crays?

> Err. Water and electricity make a dangerous mix.

> Ya. My air handling systems seem to deal with it OK.

They are kept in different places. Now think about how you are going to
combine hot-swappability and water cooling, and not let moisture get out :-)

> These big routers all seem to want more & more power - and thus
> generate more & more heat. It's time to think of other ways of
> cooling them other than trying to cool more & more air. I suspect
> that sooner or later air cooling isn't going to handle it. Thus
> the question of what you replace cold air with.
>
> I suppose you could put routers in wind tunnels, but then getting
> a technician in there to do OIR is going to be a bit tricky.

As soon as you do liquid cooling the price goes up an order of magnitude.

This is as simple as that.

My take is that the real answer is to move from heavy, sticky electrons to
photons; the trick is to get them to interact in useful ways :-)

--vadim

> I recall something about liquid (something, could be nitrogen, or perhaps
> even mercury) that was chilled; PC Boards were then submersed in the
> liquid to keep cool.
>
> Maybe on Crays?

dunno about Crays..... but definitely on Control Data's ETA 10s, designed to replace the CYBER 205

in 1988-1990 at the von Neumann supercomputer center in Princeton we had 2 ETA 10s.... each with large CPU boards that were immersed in tanks of liquid nitrogen. If we had an AC problem the operators were trained to do the quickest possible shutdown on the machines..... even so, within five minutes the temp inside the machine room went up by 10 degrees

The Crays were liquid nitrogen cooled; using mercury would be a really bad
idea, since it's a pretty good conductor.
ak

Alex Rubenstein wrote:

> I recall something about liquid (something, could be nitrogen, or perhaps
> even mercury) that was chilled; PC Boards were then submersed in the
> liquid to keep cool.
>
> Maybe on Crays?

Maybe in someone's garage? :-)

http://www.octools.com/articles/submersion/submersion12.html

C