gigabit router (was Re: Getting a "portable" /19 or /20)

Craig_Partridge · April 10, 2001, 11:35pm

OK, so your bus has 4.2 Gb/s of bandwidth. But, alas, you're in a PC
so you have to copy each packet from the line card, into main memory,
examine it, and push it back out to a line card. So each packet consumes
twice its size in bus bandwidth. So two 1 Gb/s line cards will consume
4 Gb/s of backplane. Assuming you can run the PCI at full rate (which in
my experience is a big big if), you can connect two Ethernets.
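A back-of-the-envelope sketch of the bus arithmetic above. It assumes a 64-bit, 66 MHz PCI bus (64 × 66 MHz ≈ 4.2 Gb/s theoretical peak), two ports each receiving traffic at 1 Gb/s, and that every forwarded byte crosses the bus twice (card to main memory, then memory back out to a card). The function and constant names are illustrative, not taken from any real driver.

```python
# Theoretical peak of a 64-bit, 66 MHz PCI bus, in bits per second (~4.22e9).
PCI_64_66_BPS = 64 * 66e6


def bus_load_bps(line_rate_bps: float, ports: int, crossings: int = 2) -> float:
    """Bus bandwidth consumed when every port receives at line rate and each
    packet crosses the bus `crossings` times.

    crossings=2 models store-and-forward through main memory: the ingress card
    DMAs the packet into RAM, then the egress card DMAs it back out.
    """
    return line_rate_bps * ports * crossings


load = bus_load_bps(1e9, ports=2)
print(f"bus load: {load / 1e9:.1f} Gb/s of {PCI_64_66_BPS / 1e9:.2f} Gb/s peak")
print(f"headroom: {(PCI_64_66_BPS - load) / 1e9:.2f} Gb/s")
```

Under these assumptions two GigE ports already use about 95% of the theoretical peak, before any PCI arbitration or protocol overhead.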

Incidentally, this isn't the full story either. You have to do a route
lookup on each of those packets. That's typically 5 to 10 memory
accesses... 5 memory accesses times 1 Mpps per gigabit times 2 gigabits
is 10 million memory accesses per second, or 100 ns per access. Allowing for
time spent getting through the chip to the pins, you probably need 60 or
70ns DRAM, which is doable. Except, oops!, that completely consumes
your memory bandwidth... where are you going to find the cycles to
get the packets in and out?
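The same budget can be computed directly. This sketch reproduces the round numbers above (1 Mpps per gigabit, two gigabits, five memory accesses per lookup) and, as an added assumption, also shows the worst case of minimum-size 64-byte Ethernet frames, counting the 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap each frame occupies on the wire. The helper names are hypothetical.

```python
# Per-frame wire overhead: 7-byte preamble + 1-byte SFD + 12-byte inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def packets_per_second(line_rate_bps: float, frame_bytes: int) -> float:
    """Packet rate at full line rate for a given frame size, overhead included."""
    return line_rate_bps / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


def ns_per_memory_access(pps: float, accesses_per_lookup: int) -> float:
    """Time available per memory access if lookups run back to back."""
    return 1e9 / (pps * accesses_per_lookup)


# The post's round numbers: 1 Mpps per gigabit, 2 gigabits, 5 accesses per lookup.
print(ns_per_memory_access(2 * 1e6, 5))  # 100.0

# Worst case: two GigE ports full of minimum-size 64-byte frames.
worst_pps = 2 * packets_per_second(1e9, 64)
print(round(worst_pps), round(ns_per_memory_access(worst_pps, 5), 1))
```

At minimum frame size the budget shrinks from 100 ns to roughly 67 ns per access, and that is still only the lookup, with nothing left over for moving packets.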

Craig

PS: Side note, this illustrates where router vendors earn their bucks.
Find a way to move data over each bus only once (double your bandwidth!).
Design your memory subsystems to keep packets and routing data separate
(increase your memory bandwidth!). Find a processor that doesn't waste
cycles doing virtual memory (improve your memory access times!). Oh yes,
and then add hot board swap, a working BGP implementation (quick, where's
Tony Li working these days:-)), a CLI, and a power subsystem for a CO,
and you're in business.

Richard_A_Steenbegen · April 10, 2001, 11:49pm

> >Don't be absurd, I can walk into Fry's and pick up a motherboard with
> >64-bit/66MHz PCI, some Netgear GA620's, and all the other components for a
> >1GHz computer for under $1000.

> OK, so your bus has 4.2 Gb/s of bandwidth. But, alas, you're in a PC
> so you have to copy each packet from the line card, into main memory,
> examine it, and push it back out to a line card. So each packet consumes
> twice its size in bus bandwidth. So two 1 Gb/s line cards will consume
> 4 Gb/s of backplane. Assuming you can run the PCI at full rate (which in
> my experience is a big big if), you can connect two Ethernets.

Bad assumptions. Like I said, the Alteon Tigon 2 firmware is open source,
and you don't HAVE to DMA the entire packet into main memory. You can
easily coalesce and preprocess packets on the card, transfer only packet
headers or smaller across the PCI bus, and then DMA between cards for the
rest of the payload. The limitation is the switch fabric, and poor
assumptions.
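To see what the header-only approach buys, here is a rough comparison of bytes crossing the PCI bus per forwarded packet under two hypothetical models: the whole packet bounced through main memory, versus a 64-byte header sent to the host for the lookup plus one peer-to-peer, card-to-card DMA of the packet. The 64-byte header size, the egress card's ability to accept peer DMA, and ignoring the small descriptor the host writes back are all assumptions.

```python
def bus_bytes_via_memory(packet_bytes: int) -> int:
    """Whole packet: ingress card -> main memory -> egress card (two crossings)."""
    return 2 * packet_bytes


def bus_bytes_header_only(packet_bytes: int, header_bytes: int = 64) -> int:
    """Header to the host for the lookup, then one card-to-card DMA of the packet.

    Assumes peer-to-peer DMA between cards and ignores the descriptor the host
    writes back to tell the ingress card where to send the packet.
    """
    return min(header_bytes, packet_bytes) + packet_bytes


for size in (64, 576, 1500):
    via_mem = bus_bytes_via_memory(size)
    hdr = bus_bytes_header_only(size)
    print(f"{size:5d}-byte packet: {via_mem:5d} vs {hdr:5d} bus bytes "
          f"({hdr / via_mem:.0%})")
```

Under these assumptions the saving approaches 2x only for large packets; a minimum-size packet is all header, so nothing is saved.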

> Incidentally, this isn't the full story either. You have to do a route
> lookup on each of those packets. That's typically 5 to 10 memory
> accesses... 5 memory accesses times 1 Mpps per gigabit times 2 gigabits
> is 10 million memory accesses per second, or 100 ns per access. Allowing for
> time spent getting through the chip to the pins, you probably need 60 or
> 70ns DRAM, which is doable. Except, oops!, that completely consumes
> your memory bandwidth... where are you going to find the cycles to
> get the packets in and out?

Try ~10 ns RAM, which you can buy 256 MB of for ~$60-80
(www.pricewatch.com). At any rate, a raw 3- or 4-level mtrie FIB fully
populated with the real 100k+ routes on the Internet consumes less than
900kb, and all the interesting parts fit in the L2 cache of a Celeron A,
where you can do about 22,000 lookups per MHz. Hardly excessive memory
bandwidth. The packet RAM on the GigE cards is also very fast, and could
easily accommodate a dCEF approach.
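A minimal sketch of what a 4-level multibit trie (mtrie) lookup looks like, assuming IPv4 and fixed 8-bit strides (8-8-8-8). Prefixes that end mid-stride are expanded into every slot they cover at that level (controlled prefix expansion), and the lookup remembers the best match seen while walking down. This is an illustration of the data structure, not Cisco's actual mtrie code; the class and method names are hypothetical.

```python
import ipaddress
from typing import List, Optional

STRIDE = 8  # bits consumed per level; 4 levels cover a 32-bit IPv4 address


class Node:
    """One trie level: 256 slots, each with an optional next hop and child."""

    def __init__(self) -> None:
        self.next_hop: List[Optional[str]] = [None] * 256
        self.plen: List[int] = [-1] * 256  # length of the prefix stored in each slot
        self.child: List[Optional["Node"]] = [None] * 256


def _index(addr: int, level: int) -> int:
    """The 8-bit chunk of the address used at this level (level 0 = first octet)."""
    return (addr >> (24 - level * STRIDE)) & 0xFF


class MTrie:
    def __init__(self) -> None:
        self.root = Node()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)  # raises ValueError if host bits are set
        addr, plen = int(net.network_address), net.prefixlen
        node, level = self.root, 0
        # Descend through every full 8-bit stride the prefix covers.
        while plen > (level + 1) * STRIDE:
            idx = _index(addr, level)
            if node.child[idx] is None:
                node.child[idx] = Node()
            node = node.child[idx]
            level += 1
        # Controlled prefix expansion: a prefix ending mid-stride fills every
        # slot it covers at this level, unless a longer prefix already owns it.
        bits_here = plen - level * STRIDE
        base = _index(addr, level)
        for idx in range(base, base + (1 << (STRIDE - bits_here))):
            if node.plen[idx] <= plen:
                node.next_hop[idx] = next_hop
                node.plen[idx] = plen

    def lookup(self, address: str) -> Optional[str]:
        """Longest-prefix match: one slot read per level, at most four levels."""
        addr = int(ipaddress.IPv4Address(address))
        node: Optional[Node] = self.root
        best: Optional[str] = None
        level = 0
        while node is not None and level < 4:
            idx = _index(addr, level)
            if node.next_hop[idx] is not None:
                best = node.next_hop[idx]  # deeper levels hold longer prefixes
            node = node.child[idx]
            level += 1
        return best


fib = MTrie()
fib.insert("0.0.0.0/0", "upstream")
fib.insert("10.0.0.0/8", "core")
fib.insert("10.16.0.0/12", "east")
fib.insert("10.20.30.0/24", "customer")
for dst in ("192.0.2.1", "10.1.2.3", "10.20.1.1", "10.20.30.40"):
    print(dst, "->", fib.lookup(dst))
```

Because each lookup touches at most four nodes, a compact C layout of the same structure needs at most four memory reads per packet, which is why the hot top levels fitting in cache matters.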

> PS: Side note, this illustrates where router vendors earn their bucks.
> Find a way to move data over each bus only once (double your
> bandwidth!). Design your memory subsystems to keep packets and routing
> data separate (increase your memory bandwidth!). Find a processor
> that doesn't waste cycles doing virtual memory (improve your memory
> access times!). Oh yes, and then add hot board swap, a working BGP
> implementation (quick, where's Tony Li working these days:-)), a CLI,
> and a power subsystem for a CO, and you're in business.

The last remaining area of vendor dominance is the switch fabric, and with
people like Broadcom churning out 12 Mpps switch fabrics on a single chip
for a few hundred dollars, you are a fool if you believe that money is
going anywhere other than to the 50,000 people supporting the 50 ancient
protocols, and straight to the bank. And don't even get me started on BGP
(Procket, btw).
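A quick sizing sketch for the fabric figure, assuming 12 Mpps is the chip's aggregate forwarding rate and that "line rate" on a 1 Gb/s port is measured in minimum-size 64-byte frames plus 20 bytes of preamble and inter-frame gap. The function name is hypothetical.

```python
# Per-frame wire overhead: preamble + SFD (8 bytes) + inter-frame gap (12 bytes).
ETHERNET_OVERHEAD_BYTES = 20


def gige_ports_at_line_rate(fabric_pps: float, frame_bytes: int = 64) -> float:
    """How many 1 Gb/s ports a fabric can serve at full line rate for one frame size."""
    per_port_pps = 1e9 / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)
    return fabric_pps / per_port_pps


print(f"{gige_ports_at_line_rate(12e6):.2f} ports")        # minimum-size frames
print(f"{gige_ports_at_line_rate(12e6, 1518):.1f} ports")  # full-size frames
```

So under these assumptions a 12 Mpps part covers about eight GigE ports at worst-case packet sizes, and far more with large frames.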

Alex_Yuriev2 · April 11, 2001, 12:03am

> Try ~10 ns RAM, which you can buy 256 MB of for ~$60-80
> (www.pricewatch.com). At any rate, a raw 3- or 4-level mtrie FIB fully
> populated with the real 100k+ routes on the Internet consumes less than
> 900kb, and all the interesting parts fit in the L2 cache of a Celeron A,
> where you can do about 22,000 lookups per MHz. Hardly excessive memory
> bandwidth. The packet RAM on the GigE cards is also very fast, and could
> easily accommodate a dCEF approach.

CEF should be called Customer Enragement Feature. It is a very very very
bad idea to have linecards be anything other than forwarders. They should not
make any intelligent routing decisions. There should not be tons of copies
of the routing table on line cards. That is what creates problems.

Alex

David_Schwartz · April 11, 2001, 1:48am

> CEF should be called Customer Enragement Feature. It is a very very very
> bad idea to have linecards be anything other than forwarders. They should not
> make any intelligent routing decisions. There should not be tons of copies
> of the routing table on line cards. That is what creates problems.
>
> Alex

CEF allows linecards to be forwarders. They don't make any routing
decisions, they just forward packets according to a routing table. (Routing
= deciding where packets should go, i.e., building a routing table. Forwarding
= sending packets to their destination, i.e., using a routing table.)
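To make the routing/forwarding distinction concrete, a hypothetical sketch: the RIB holds every candidate route learned for a prefix, and the FIB keeps only the selected next hop, which is all a forwarding linecard needs. Selection here is by lowest administrative distance only; the values used (static 1, eBGP 20, OSPF 110) are Cisco's usual defaults, chosen for illustration, and the names are made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Route:
    protocol: str
    admin_distance: int  # lower is preferred
    next_hop: str


def build_fib(rib: Dict[str, List[Route]]) -> Dict[str, str]:
    """Routing keeps every candidate; forwarding keeps one answer per prefix.

    The FIB is derived from the RIB by picking, per prefix, the candidate with
    the lowest administrative distance.
    """
    return {
        prefix: min(candidates, key=lambda r: r.admin_distance).next_hop
        for prefix, candidates in rib.items()
        if candidates
    }


rib = {
    "192.0.2.0/24": [Route("ebgp", 20, "198.51.100.1"), Route("ospf", 110, "10.0.0.2")],
    "203.0.113.0/24": [Route("static", 1, "10.0.0.9")],
}
print(build_fib(rib))
```

Only the output of `build_fib` would be pushed to linecards; the competing candidates stay with the route processor.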

The reality is that having only one copy of the routing table creates an
inevitable bottleneck. For the same reasons this won't work on a regional
network, it won't work on a single router if the router is sufficiently
complex. The same techniques that work to scale the Internet as a whole work
inside a box.

Why do you think central forwarding is superior to distributed forwarding?

DS

Alex_Yuriev2 · April 11, 2001, 12:56am

> > CEF should be called Customer Enragement Feature. It is a very very very
> > bad idea to have linecards be anything other than forwarders. They should
> > not make any intelligent routing decisions. There should not be tons of
> > copies of the routing table on line cards. That is what creates problems.

> CEF allows linecards to be forwarders. They don't make any routing
> decisions, they just forward packets according to a routing table. (Routing
> = deciding where packets should go, i.e., building a routing table. Forwarding
> = sending packets to their destination, i.e., using a routing table.)

Excellent idea. Why, pray tell, then, are there such things as
"show cef drop" and "show cef not-cef-switched"?

> The reality is that having only one copy of the routing table
> creates an inevitable bottleneck.

Wrong answer. Routing table != forwarding table

> For the same reasons this won't work on a regional network, it won't work
> on a single router if the router is sufficiently complex.

Wrong answer again. Routing view != forwarding table

> The same techniques that work to scale the Internet as a whole work inside
> a box.

Wrong answer again.

> Why do you think central forwarding is superior to distributed forwarding?

Because you will have consistency problems. You are nearly 100% guaranteed
to have them.

DS

Alex

Richard_A_Steenbegen · April 11, 2001, 2:14am

They don't make intelligent routing decisions; the route processor does, and
then pushes the FORWARDING table down to the individual cards. There is
nothing wrong with distributed copies of the forwarding table in easy
access of the hardware doing the forwarding, but you should make those
copies over a bus which actually copies things correctly
*coughmbussuckscough*.
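One way to make sure pushed copies arrive intact is to checksum each push and have the line card refuse anything that fails verification. A hypothetical sketch: the route processor serializes a versioned FIB with `json` and protects it with `zlib.crc32`; the message format, version scheme, and class names are assumptions for illustration, not how any particular router or backplane bus does it.

```python
import json
import zlib
from typing import Dict, Tuple


def pack_fib(version: int, fib: Dict[str, str]) -> Tuple[bytes, int]:
    """Route processor side: serialize a versioned FIB and CRC32 the bytes."""
    payload = json.dumps({"version": version, "fib": fib}, sort_keys=True).encode()
    return payload, zlib.crc32(payload)


class LineCard:
    def __init__(self) -> None:
        self.version = -1
        self.fib: Dict[str, str] = {}

    def install(self, payload: bytes, crc: int) -> bool:
        """Install a pushed FIB only if it arrived intact and is newer."""
        if zlib.crc32(payload) != crc:
            return False  # corrupted in transit: keep the old table, request a resend
        update = json.loads(payload)
        if update["version"] <= self.version:
            return False  # stale or duplicate push
        self.version, self.fib = update["version"], update["fib"]
        return True


payload, crc = pack_fib(1, {"192.0.2.0/24": "198.51.100.1"})
card = LineCard()
corrupted = payload.replace(b"198", b"199", 1)  # simulate a bad copy over the bus
print(card.install(corrupted, crc))  # False: CRC mismatch
print(card.install(payload, crc))    # True
```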

Bora_Akyol · April 11, 2001, 3:59am

If your packets are memory mapped and you are using DMA, you don't
necessarily need to copy each packet into main memory and back out; you can
look at the headers and then ship the packet out. For lookups, one can use a
ternary CAM, where you get one lookup per memory access.
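A software model of the ternary CAM idea, for illustration only: each entry stores a value and a mask, and entries are ordered longest prefix first so that the first match is also the longest match. A real TCAM compares the key against all entries in parallel in a single access; the loop below is just the sequential equivalent, and the function names are hypothetical.

```python
import ipaddress
from typing import List, Optional, Tuple

# (value, mask, result): a key matches when (key & mask) == value.
TcamEntry = Tuple[int, int, str]


def build_tcam(routes: List[Tuple[str, str]]) -> List[TcamEntry]:
    """Order entries longest prefix first, so first match == longest match."""
    entries = []
    for prefix, next_hop in routes:
        net = ipaddress.IPv4Network(prefix)
        entries.append((net.prefixlen, int(net.network_address), int(net.netmask), next_hop))
    entries.sort(key=lambda e: e[0], reverse=True)
    return [(value, mask, nh) for _, value, mask, nh in entries]


def tcam_lookup(tcam: List[TcamEntry], address: str) -> Optional[str]:
    key = int(ipaddress.IPv4Address(address))
    # Sequential here; in hardware every entry is compared in the same cycle.
    for value, mask, next_hop in tcam:
        if (key & mask) == value:
            return next_hop
    return None


tcam = build_tcam([
    ("0.0.0.0/0", "upstream"),
    ("10.0.0.0/8", "core"),
    ("10.20.30.0/24", "customer"),
])
for dst in ("10.20.30.40", "10.1.1.1", "192.0.2.1"):
    print(dst, "->", tcam_lookup(tcam, dst))
```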

Nevertheless, there are much better ways to build a fast switch or a
router than using a general-purpose CPU.

At the low end of the spectrum, quite a few companies supply network
processors with advanced lookup capabilities and at the high end, one can
always use ASICs, or FPGAs combined with CAMs.

Bora

David_Schwartz · April 11, 2001, 7:26am

> > Why do you think central forwarding is superior to
> > distributed forwarding?
>
> Because you will have consistency problems. You are nearly 100%
> guaranteed to have them.
>
> Alex

Ahh, so that's what you're thinking.

If you have forwarding table F(X) at time X and forwarding table F(X+1) at
time X+1, a packet that arrives between times X and X+2 can reasonably be
forwarded by any of the tables. There is no special sequencing present or
required between the packets that involve routing protocols and the data
packets.

Suppose a router received a packet that causes it to modify its routing
table in some way. If another packet is received in close time proximity to
the first packet, it can be reasonably routed by either policy. Even a
router with a central table could still route it either way, depending upon
when the routing process gets scheduled in relation to when the interface
interrupt is serviced. (Or for other reasons, depending upon the hardware
you are dealing with.)

The only way to ensure this type of consistency is to centrally process every
single packet in strict sequence, fully applying all routing changes the
packet may require. There is no benefit to this added effort; after all, the
router would still have to work even if the packet with the routing data was
dropped.

We misroute packets between routers because routing table updates don't
happen fast enough. It's not a problem -- IP is designed to tolerate packet
losses and has never guaranteed sequencing.

The added occasional misroutes due to inconsistency will be proportional to
the ratio of the average network transport time for a routing protocol
packet to the average delay in propagating forwarding table changes to a
linecard. You do the math.

DS

Rafi_Sadowsky · April 11, 2001, 8:15am

[deleted]

> CEF should be called Customer Enragement Feature. It is a very very very
> bad idea to have linecards be anything other than forwarders. They should not
> make any intelligent routing decisions. There should not be tons of copies
> of the routing table on line cards. That is what creates problems.

Without getting into whether dCEF (Distributed CEF) is good or bad:
CEF gives a performance boost even on a single-CPU box.
Try any Cisco 7200 series router as an existence proof.

Alex

Rafi

Matt_Zimmerman · April 11, 2001, 8:47pm

> > Why do you think central forwarding is superior to distributed
> > forwarding?
>
> Because you will have consistency problems. You are nearly 100% guaranteed
> to have them.
>
> Alex

  Ahh, so that's what you're thinking.

  If you have forwarding table F(X) at time X and forwarding table F(X+1)
  at time X+1, a packet that arrives between times X and X+2 can
  reasonably be forwarded by any of the tables. There is no special
  sequencing present or required between the packets that involve routing
  protocols and the data packets.

I think Alex was referring to internal consistency within the router (between
linecards), not external consistency. For example, if linecard X believes that
a packet should be forwarded to linecard Y, but linecard Y's forwarding table
is older than X's, Y could misforward the packet, causing a forwarding loop or
a dropped packet. Thus, it can be the case that neither the old path nor the
new path is taken.
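A toy sketch of the internal-inconsistency case, with two hypothetical linecards X and Y that each consult only their own copy of the forwarding table. X has installed the new route (hand the packet to Y), while Y still holds a stale entry pointing back at X. The hand-off limit stands in for whatever eventually discards a looping packet; real hardware behavior will differ.

```python
from typing import Dict, List

Fib = Dict[str, str]  # destination prefix -> next linecard, or "out" to transmit


def forward(fibs: Dict[str, Fib], ingress: str, prefix: str,
            max_handoffs: int = 8) -> List[str]:
    """Trace a packet's hand-offs between linecards inside one router.

    Each linecard consults only its own FIB copy. max_handoffs bounds the walk
    so an internal forwarding loop terminates with a drop.
    """
    path = [ingress]
    card = ingress
    for _ in range(max_handoffs):
        nxt = fibs[card][prefix]
        if nxt == "out":
            return path + ["out"]
        path.append(nxt)
        card = nxt
    return path + ["dropped"]


PREFIX = "203.0.113.0/24"
consistent = {"X": {PREFIX: "Y"}, "Y": {PREFIX: "out"}}
# X has the new table; Y still has an old one that sends the prefix back to X.
inconsistent = {"X": {PREFIX: "Y"}, "Y": {PREFIX: "X"}}

print(forward(consistent, "X", PREFIX))    # ['X', 'Y', 'out']
print(forward(inconsistent, "X", PREFIX, max_handoffs=4))
```

In the inconsistent case neither the old path (out via X) nor the new path (out via Y) is taken; the packet bounces until it is dropped.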

Yes, there are ways to approach this problem, but it is a problem that
central-forwarding systems will not have.

  We misroute packets between routers because routing table updates don't
  happen fast enough. It's not a problem -- IP is designed to tolerate
  packet losses and has never guaranteed sequencing.

It is true that IP does not make guarantees about delivery, but packet loss has
a detrimental effect on performance nonetheless.

  The added occasional misroutes due to inconsistency will be
  proportional to the ratio of the average network transport time for a
  routing protocol packet to the average delay in propagating forwarding
  table changes to a linecard. You do the math.

I think a more useful model is this:

S(X) = (% of time that a router X spends in a consistent state) *
(packets/sec through router X)

where S(X) is the percentage of packets which will be successfully routed.
The total end-to-end loss is then 1 - S(X)^N for N identical routers. N >= 20
is not uncommon these days, and packets/sec gets higher all the time.
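A sketch of the end-to-end part of the model, assuming S is read as the fraction of packets each router forwards correctly (a number between 0 and 1, which is how the 1 - S(X)^N expression uses it) and that routers misforward independently. The sample values of S are made up for illustration, not measurements.

```python
def end_to_end_loss(s: float, hops: int) -> float:
    """Fraction of packets lost across `hops` identical routers that each
    forward correctly with independent probability `s`."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be a fraction between 0 and 1")
    return 1.0 - s ** hops


# Illustrative per-router success fractions, not measurements.
for s in (0.9999, 0.999, 0.99):
    print(f"S = {s}: loss over 20 hops = {end_to_end_loss(s, 20):.2%}")
```

Even a 1% per-router failure rate compounds to roughly 18% loss over 20 hops, which is the point of the exponent.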

Michael_C_Wu · April 16, 2001, 7:14pm

On Tue, Apr 10, 2001 at 07:35:35PM -0400, Craig Partridge scribbled:

> In message <Pine.BSF.4.21.0104101753540.98098-100000@overlord.e-gerbil.net>, "R
> >Don't be absurd, I can walk into Fry's and pick up a motherboard with
> >64-bit/66MHz PCI, some Netgear GA620's, and all the other components for a
> >1GHz computer for under $1000.
>
> OK, so your bus has 4.2 Gb/s of bandwidth. But, alas, you're in a PC
> so you have to copy each packet from the line card, into main memory,
> examine it, and push it back out to a line card. So each packet consumes
> twice its size in bus bandwidth. So two 1 Gb/s line cards will consume
> 4 Gb/s of backplane. Assuming you can run the PCI at full rate (which in
> my experience is a big big if), you can connect two Ethernets.

I don't think you have to use x86. Take a look at other platforms.
For example, Alpha, UltraSparc, or PowerPC.