Scalability issues in the Internet routing system

I guess it's time to have a look at the actual scalability issues we
face in the Internet routing system. Maybe the area of action becomes
a bit clearer with such an assessment.

In the current Internet routing system we face two distinct scalability
issues:

1. The number of prefixes*paths in the routing table and interdomain
    routing system (BGP)

This problem scales with the number of prefixes and available paths
to a particular router/network, in addition to constant churn in the
reachability state. The required capacity for a router's control
plane is:

  capacity = prefix * path * churnfactor / second

I think it is safe, even with projected AS and IP uptake, to assume
Moore's law can cope with this.
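
A back-of-envelope reading of that formula, as a Python sketch; every
number below is an illustrative assumption, not a measurement:

  prefixes = 200000         # rough DFZ size circa 2005, per this thread
  paths_per_prefix = 4      # assumed average number of alternate BGP paths
  churn_fraction = 0.001    # assumed share of state changing per second

  evaluations_per_second = prefixes * paths_per_prefix * churn_fraction
  print(evaluations_per_second)  # 800.0 -- easy work for a commodity CPU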

2. The number of longest match prefixes in the forwarding table

This problem scales with the number of prefixes and the number of
packets per second the router has to process under full or expected
load. The required capacity for a router's forwarding plane is:

  capacity = prefixes * packets / second

This one is much harder to cope with as the number of prefixes and
the link speeds are rising. Thus the problem is multiplicative to
quadratic.

Here I think Moore's law doesn't cope with the increase in projected
growth in longest prefix match prefixes and link speed. Doing longest
prefix matches in hardware is relatively complex. Even more so for
the additional bits in IPv6. Doing perfect matches in hardware is
much easier though...
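
To make the contrast concrete, here is a minimal Python sketch (toy code
under assumed simplifications, not a router FIB) of the two disciplines:
longest match walks a binary trie bit by bit, up to 32 dependent memory
reads per IPv4 lookup and up to 128 for IPv6, while a perfect match is a
single hash probe:

  import ipaddress

  class Node:
      __slots__ = ("children", "nexthop")
      def __init__(self):
          self.children = [None, None]   # one branch per address bit
          self.nexthop = None

  root = Node()

  def add_route(prefix, nexthop):
      net = ipaddress.ip_network(prefix)
      bits = int(net.network_address)
      node = root
      for i in range(net.prefixlen):     # descend one bit at a time
          b = (bits >> (31 - i)) & 1
          if node.children[b] is None:
              node.children[b] = Node()
          node = node.children[b]
      node.nexthop = nexthop

  def longest_match(addr):
      bits = int(ipaddress.ip_address(addr))
      node, best = root, None
      for i in range(32):                # up to 32 dependent memory reads
          if node.nexthop is not None:
              best = node.nexthop        # remember most specific hit so far
          node = node.children[(bits >> (31 - i)) & 1]
          if node is None:
              return best
      return node.nexthop or best

  exact = {}                             # perfect match: one probe
  add_route("10.0.0.0/8", "if0")
  add_route("10.1.0.0/16", "if1")
  exact[int(ipaddress.ip_address("10.1.2.3"))] = "if1"
  print(longest_match("10.1.2.3"))                     # if1
  print(exact[int(ipaddress.ip_address("10.1.2.3"))])  # if1

The trie walk is what must happen once per packet at line rate; the single
probe is why perfect matching is called "much easier" above.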

1. The number of prefixes*paths in the routing table and interdomain
   routing system (BGP)

This problem scales with the number of prefixes and available paths
to a particular router/network, in addition to constant churn in the
reachability state. The required capacity for a router's control
plane is:

capacity = prefix * path * churnfactor / second

I think it is safe, even with projected AS and IP uptake, to assume
Moore's law can cope with this.

Especially since this does not have to be done in real time. BGP updates can take many seconds to process without end users thinking anything is amiss.

2. The number of longest match prefixes in the forwarding table

This problem scales with the number of prefixes and the number of
packets per second the router has to process under full or expected
load. The required capacity for a router's forwarding plane is:

capacity = prefixes * packets / second

This one is much harder to cope with as the number of prefixes and
the link speeds are rising. Thus the problem is multiplicative to
quadratic.

Here I think Moore's law doesn't cope with the increase in projected
growth in longest prefix match prefixes and link speed. Doing longest
prefix matches in hardware is relatively complex. Even more so for
the additional bits in IPv6. Doing perfect matches in hardware is
much easier though...

You are mistaken in one of your assumptions. The FIB is generated asynchronously to packets being forwarded, and usually not even by the same processor (at least for routers "in the core"). Therefore things like pps / link speed are orthogonal to longest match. (Unless you are claiming the number of new prefixes is related to link speed. But I don't think anyone considers a link which has nothing but BGP updates on it a realistic or useful metric.)
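
A minimal sketch of that decoupling, assuming a single-writer control
process and a read-only forwarding path (illustrative Python, not any
vendor's architecture):

  current_fib = {}                  # snapshot read by the forwarding path

  def control_plane_apply(batch):
      """May take seconds to run; forwarding never waits on it."""
      global current_fib
      new_fib = dict(current_fib)   # build the next table off to the side
      new_fib.update(batch)
      current_fib = new_fib         # publish via one reference swap

  def forward(dest):
      return current_fib.get(dest)  # lookup cost is independent of churn

  control_plane_apply({"192.0.2.0/24": "peer-a"})
  print(forward("192.0.2.0/24"))    # peer-a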

Patrick W. Gilmore wrote:

I guess it's time to have a look at the actual scalability issues we
face in the Internet routing system. Maybe the area of action becomes
a bit clearer with such an assessment.

In the current Internet routing system we face two distinct scalability
issues:

1. The number of prefixes*paths in the routing table and interdomain
   routing system (BGP)

This problem scales with the number of prefixes and available paths
to a particular router/network, in addition to constant churn in the
reachability state. The required capacity for a router's control
plane is:

capacity = prefix * path * churnfactor / second

I think it is safe, even with projected AS and IP uptake, to assume
Moore's law can cope with this.

Moore will keep up reasonably with both the CPU needed to keep BGP perking, and with memory requirements for the RIB, as well as other non-data-path functions of routers.

2. The number of longest match prefixes in the forwarding table

This problem scales with the number of prefixes and the number of
packets per second the router has to process under full or expected
load. The required capacity for a router's forwarding plane is:

capacity = prefixes * packets / second

This one is much harder to cope with as the number of prefixes and
the link speeds are rising. Thus the problem is multiplicative to
quadratic.

Here I think Moore's law doesn't cope with the increase in projected
growth in longest prefix match prefixes and link speed. Doing longest
prefix matches in hardware is relatively complex. Even more so for
the additional bits in IPv6. Doing perfect matches in hardware is
much easier though...

Several items regarding FIB lookup:

1) The design of the FIB need not be the same as the RIB. There is plenty of room for creativity in router design in this space. Specifically, the FIB could be dramatically reduced in size via aggregation. The number of egress points (real or virtual) and/or policies within a router is likely FAR smaller than the total number of routes. It's unclear if any significant effort has been put into this. (A toy sketch of this aggregation idea follows after this list.)

2) Nothing says the design of the FIB lookup hardware has to be longest match. Other designs are quite possible. Again, some creativity in design could go a long way. The end result must match that which would be provided by longest-match lookup, but that doesn't mean the ASIC/FPGA or general purpose CPUs on the line card actually have to implement the mechanism in that fashion.

3) Don't discount novel uses of commodity components. There are fast CPU chips available today that may be appropriate to embed on line cards with a bit of firmware, and may be a lot more cost effective and sufficiently fast compared to custom ASICs of a few years ago. The definition of what's hardware and what's software on line cards need not be entirely defined by whether the design is executed entirely by a hardware engineer or a software engineer.
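
A toy version of the aggregation idea from item 1 (hypothetical Python;
it only merges sibling prefixes that share a next hop, where real schemes
such as ORTC go much further):

  import ipaddress

  def aggregate(fib):
      """Collapse sibling prefixes with identical next hops, to fixpoint."""
      changed = True
      while changed:
          changed = False
          for net in list(fib):
              if net not in fib or net.prefixlen == 0:
                  continue
              parent = net.supernet(1)
              a, b = parent.subnets(1)
              sibling = b if a == net else a
              # Safe merge: both halves present, same next hop, and no
              # existing parent entry whose next hop we would clobber.
              if parent not in fib and fib.get(sibling) == fib[net]:
                  nexthop = fib.pop(net)
                  fib.pop(sibling)
                  fib[parent] = nexthop          # two entries become one
                  changed = True

  fib = {ipaddress.ip_network("192.0.2.0/25"): "if1",
         ipaddress.ip_network("192.0.2.128/25"): "if1",
         ipaddress.ip_network("198.51.100.0/24"): "if2"}
  aggregate(fib)
  print(fib)   # the two /25s collapse into 192.0.2.0/24 -> if1

As noted later in the thread, the catch is that this computation has to
be redone as routes churn.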

Finally, don't discount the value and performance of software-based routers. MPLS was first "sold" as a way to deal with core routers not handling Gigabit links. The idea was to get the edge routers to take over. Present CPU technology, especially with good embedded systems software design, is quite capable of performing the functions needed for edge routers in many circumstances. It may well make sense to consider a mix of router types based on port count and speed at edges and/or chassis routers with line cards that are using general purpose CPUs for forwarding engines instead of ASICs for lower-volume sites. If we actually wind up with the core of most backbones running MPLS after all, well, we've got the technology so use it. Inter-AS routers for backbones will likely need to continue to be large, power-hungry boxes so that policy can be separately applied on the borders.

I should point out that none of this really is about scalability of the routing system of the Internet, it's all about hardware and software design to allow the present system to scale. Looking at completely different and more scalable routing would require finding a better way to do things than the present BGP approach.

Depends on the way the FIB is engineered.

Andre Oppermann wrote:

I guess it's time to have a look at the actual scalability issues we
face in the Internet routing system. Maybe the area of action becomes
a bit clearer with such an assessment.

In the current Internet routing system we face two distinct scalability
issues:

1. The number of prefixes*paths in the routing table and interdomain
   routing system (BGP)

This problem scales with the number of prefixes and available paths
to a particular router/network, in addition to constant churn in the
reachability state. The required capacity for a router's control
plane is:

capacity = prefix * path * churnfactor / second

I think it is safe, even with projected AS and IP uptake, to assume
Moore's law can cope with this.

Moore's law for CPUs is kaput. Really, Moore's Law is more of an observation than a law. We need to stop fixating on Moore's law for the love of god. It doesn't exist in a vacuum; components don't get on the curve for free. Each generation requires enormously more capital to engineer the improved Si process and innovation, which only gets paid for by increasing demand. If the demand slows down then the investment won't be recovered and the cycle will stop, possibly before the physics limits, depending on the amount of demand, the amount of investment required for the next turn, etc.

Also, no network I know of is on an upgrade path at a velocity where they are swapping out components in an 18-month window. Ideally, for an economically viable network, you want to be on an upgrade cycle that lags Moore's observation. Getting routers off your books is not an 18-month cycle; it is closer to 48 months or even in some cases 60 months.

Then we have the issue of memory bandwidth to keep the ever-changing prefixes updated and synced.

/vijay

vijay gill wrote:

Andre Oppermann wrote:

I guess it's time to have a look at the actual scalability issues we
face in the Internet routing system. Maybe the area of action becomes
a bit clearer with such an assessment.

In the current Internet routing system we face two distinct scalability
issues:

1. The number of prefixes*paths in the routing table and interdomain
   routing system (BGP)

This problem scales with the number of prefixes and available paths
to a particular router/network, in addition to constant churn in the
reachability state. The required capacity for a router's control
plane is:

capacity = prefix * path * churnfactor / second

I think it is safe, even with projected AS and IP uptake, to assume
Moore's law can cope with this.

Moore's law for CPUs is kaput. Really, Moore's Law is more of an observation than a law. We need to stop fixating on Moore's law for the love of god. It doesn't exist in a vacuum; components don't get on the curve for free. Each generation requires enormously more capital to engineer the improved Si process and innovation, which only gets paid for by increasing demand. If the demand slows down then the investment won't be recovered and the cycle will stop, possibly before the physics limits, depending on the amount of demand, the amount of investment required for the next turn, etc.

Predicting the future was a tricky business ten years ago and still is
today. What makes you think the wheel stops turning today? Customer
access speed will not increase? No more improvements in DSL, Cable and
Wireless technologies? Come on, you're kidding. Right?

Also, no network I know of is on an upgrade path at a velocity where they are swapping out components in an 18-month window. Ideally, for an economically viable network, you want to be on an upgrade cycle that lags Moore's observation. Getting routers off your books is not an 18-month cycle; it is closer to 48 months or even in some cases 60 months.

When you are buying a router today do you specify it to cope with 200k
routes or more? Planning ahead is essential.

Then we have the issue of memory bandwidth to keep the ever-changing prefixes updated and synced.

Compared to link speed this is nothing. And nowhere near memory bandwidth.

Daniel Senie wrote:

[many interesting hw design approaches that you can buy already deleted]

I should point out that none of this really is about scalability of the routing system of the Internet, it's all about hardware and software design to allow the present system to scale. Looking at completely different and more scalable routing would require finding a better way to do things than the present BGP approach.

I disagree. By your description there is no problem scaling the current
model to much bigger numbers of prefixes and paths. Then why not simply
do it??? Apparently there ain't a problem then?

For routing you have two ways: the BGP (DFZ) way or the aggregation (by
whatever arbitrary level/layer divider du jour) way. If you have a third,
different one (not just a modification or merge of partial aspects of the
former two) you may be eligible for a Nobel prize in mathematics in a
few decades. They run behind a bit, you know...

Andre Oppermann wrote:

vijay gill wrote:

Moore's law for CPUs is kaput. Really, Moore's Law is more of an observation than a law. We need to stop fixating on Moore's law for the love of god. It doesn't exist in a vacuum; components don't get on the curve for free. Each generation requires enormously more capital to engineer the improved Si process and innovation, which only gets paid for by increasing demand. If the demand slows down then the investment won't be recovered and the cycle will stop, possibly before the physics limits, depending on the amount of demand, the amount of investment required for the next turn, etc.

Predicting the future was a tricky business ten years ago and still is
today. What makes you think the wheel stops turning today? Customer
access speed will not increase? No more improvements in DSL, Cable and
Wireless technologies? Come on, you're kidding. Right?

Missing the point. We can deal with increased speeds by going wider; the network topology data/control plane isn't going wider. THAT is what Moore's observation was targeted at.

Also, no network I know of is on an upgrade path at a velocity where they are swapping out components in an 18-month window. Ideally, for an economically viable network, you want to be on an upgrade cycle that lags Moore's observation. Getting routers off your books is not an 18-month cycle; it is closer to 48 months or even in some cases 60 months.

When you are buying a router today do you specify it to cope with 200k
routes or more? Planning ahead is essential.

And we're paying for it. But again, that assumes the prefix/memory bandwidth churn can be accommodated by the next generation of CPUs. I am not going to throw out my router in 18 months. It's still on the books.

Then we have the issue of memory bandwidth to keep the ever-changing prefixes updated and synced.

Compared to link speed this is nothing. And nowhere near memory bandwidth.

Each update to and from memory takes cycles, and as the routing tables become bigger, the frequency of memory accesses needed to keep the system in sync imposes a larger burden. This is orthogonal to link speed.

/vijay
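
Rough numbers for the two competing demands on FIB memory that this
sub-thread is circling (a Python sketch; every figure is an assumption
chosen only to show orders of magnitude):

  # Update side: churn rewriting FIB entries.
  updates_per_second = 2000          # assumed peak BGP churn hitting the FIB
  bytes_per_update = 64              # assumed bytes rewritten per entry
  update_traffic = updates_per_second * bytes_per_update    # 128 KB/s

  # Lookup side: one longest-match walk per packet.
  packets_per_second = 20e6          # roughly 10 Gbit/s of small packets
  bytes_per_lookup = 32 * 8          # assumed 32 node reads of 8 bytes each
  lookup_traffic = packets_per_second * bytes_per_lookup    # ~5.1 GB/s

  print(update_traffic, lookup_traffic)

On these assumptions the raw update volume is tiny next to lookup traffic,
which supports Andre's point; vijay's concern survives in a different form,
since each update still steals cycles and memory ports from the lookup path.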

Andre,

capacity = prefix * path * churnfactor / second

capacity = prefixes * packets / second

I think it is safe, even with projected AS and IP uptake, to assume
Moore's law can cope with this.

This one is much harder to cope with as the number of prefixes and
the link speeds are rising. Thus the problem is multiplicative to
quadratic.

You'll note that the number of prefixes is key to both of your equations. If the number of prefixes grows faster than Moore's law, then it will be very difficult to get either equation's left hand side to remain under Moore's law.

That's the whole point of the discussion.

Tony

Moore's law for CPUs is kaput. Really, Moore's Law is more of an
observation than a law. We need to stop fixating on Moore's law for the
love of god. It doesn't exist in a vacuum; components don't get on the
curve for free. Each generation requires enormously more capital to
engineer the improved Si process and innovation, which only gets
paid for by increasing demand. If the demand slows down then the
investment won't be recovered and the cycle will stop, possibly before
the physics limits, depending on the amount of demand, the amount of
investment required for the next turn, etc.

Moore's "observation" would also seem to apply only to the highest end of
components that are actually "available", not to what a particular vendor
ships in a particular product. Of course we have the technology available
to handle 1 million BGP routes or more if we really needed to. Processing
BGP is fairly easy and linear; modern CPUs are cheap and almost absurdly
powerful in comparison to the task at hand (they're made to run Windows
code remember :P). But if there is no reason to make a product, it
doesn't get made.

Comparing consumer-grade PCs which are produced in the millions or
billions with one small part of a high-end router which is produced in the
thousands, and which is essentially an embedded product, is a non-starter.
The product that is sold does what it needs to do and no more. There is no
reason to design a scalable route processor which can be easily upgraded.
There is no need to ship the latest state of the art Dual Xeon 3.6GHz with
support for 64GB of DRAM that will last you for your BGP needs for the
next 10 years, or even to throw in 128MB of SSRAM for a few thousand bucks
more.

Everyone needs to sell their latest and greatest new product on a regular
basis to stay in business, it is no different for router vendors. They
sell you a $20,000 route processor that you could pick up from their
vendor directly at Qty 1 for $1000, and the markup goes into what you
are really paying for (R&D, software dev, testing, support, sales, and the
bottom line). A few years later when you need something a little faster,
you can buy a new router. Besides, you probably needed to come back for
the next-gen platform and interfaces anyways. Most customers are barely
qualified to add RAM to their million dollar router, and I doubt the
vendors see any need to change this.

Also, no network I know of is on an upgrade path at a velocity where they
are swapping out components in an 18-month window. Ideally, for an
economically viable network, you want to be on an upgrade cycle that
lags Moore's observation. Getting routers off your books is not an
18-month cycle; it is closer to 48 months or even in some cases 60 months.

Want to venture a guess as to how many networks are still operating
routers with parts at the end of that 60 month cycle (purchased in 2000),
which were in turn based off of 1997 technology back when they were
manufactured in 1998-1999?

Hello All.
  I have seen people writing in now and again with things like this and I never
thought I would be one of them. But here goes.

After doing a Netgear firewall upgrade which supplied some extra information,
I started noticing these odd pings to all over the place from a computer I have.
I don't notice any attempted return traffic from the same IPs. So I am
at a loss as to what this could be or why. Any thoughts would be appreciated.

- Destination:204.152.43.107,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:130.244.175.141,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.139.21.27,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.232.175.9,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:204.152.43.107,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:130.244.175.141,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.139.21.27,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.232.175.9,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:204.152.43.107,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:130.244.175.141,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.139.21.27,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.232.175.9,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:204.152.43.107,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:130.244.175.141,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.139.21.27,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:27 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.232.175.9,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:28 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:151.164.243.2,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:29 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:151.164.243.2,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 01:10:30 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:151.164.243.2,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:204.152.167.33,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:130.244.203.3,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.139.215.213,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.232.11.117,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:204.152.167.33,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:130.244.203.3,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.139.215.213,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.232.11.117,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:204.152.167.33,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:130.244.203.3,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.139.215.213,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.232.11.117,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:204.152.167.33,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:130.244.203.3,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.139.215.213,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:02 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:202.232.11.117,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:04 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:151.164.240.134,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:05 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:151.164.240.134,[Type:8],WAN [Forward] - [Outbound Default rule
match]
Mon, 2005-10-17 16:11:06 - ICMP packet - Source:192.168.1.3,[Echo Request],LAN
- Destination:151.164.240.134,[Type:8],WAN [Forward] - [Outbound Default rule
match]

It's a Dell computer supplied to me by my employer that I use for 2
purposes. I installed Firefox for keeping an eye on our MRTG graphs and to use
Skype as required by them. Except for a Java developer's toolkit that I had
to install to help with development testing once (which I uninstalled) and
the myriad of crap software that came free with the computer, that is all that
has ever been installed that I am aware of.

I did recently, because of this, install Startup Cop to see what might
be starting up on bootups (at least non-services-wise). I uninstalled a few
other items that I knew I would never use and disabled those that looked odd.
But still... the pings go out about once or twice a day.

I have not had a chance to turn off Skype to see if that is what is
doing it. But it is in no way connected to talking via Skype. But why on earth
would Skype send out these random pings all over the globe?

Thanks!

  Nicole

Doesn't look very odd to me; it's a very specific, rotating sequence...

204.152.43.107
130.244.175.141
202.139.21.27
202.232.175.9
Then back to .107, .141, .27, .9 ad nauseam...

Something on your laptop is trying to a. call home, b. find the best way
home, c. who knows..

Kill processes until it goes away and then go.. aha!

Peter Kranz
pkranz@unwiredltd.com

the four IPs...

  in the mcsp.com domain
  in the swip.net domain
  the machine dsmmr.shoalhaven.net.au.
  in the iij.ad.jp domain

--- some process is trying to call home... part of a botnet perhaps.

--bill

I've found this tool to be very handy in finding out just what process
is doing what.

http://www.sysinternals.com/Utilities/TcpView.html

btw, I don't think nanog is the most appropriate list for these types
of questions, fyi.

I've found this tool to be very handy in finding out just what process
is doing what.

http://www.sysinternals.com/Utilities/TcpView.html

But TCPView doesn't show anything for ICMP - which is what was happening
in this case. However, if the "guilty" process is also using TCP, TCPView
will likely identify it.

On the other hand, a firewall that limits outbound traffic to only
"permitted" programs would probably nail the program involved (ZoneAlarm
is one example of such a firewall).

btw, I don't think nanog is the most appropriate list for these types
of questions, fyi.

Probably so. The newsgroup news:comp.security.misc might be a better
place.

Tony Rall

Tony Li wrote:

Andre,

capacity = prefix * path * churnfactor / second

capacity = prefixes * packets / second

I think it is safe, even with projected AS and IP uptake, to assume
Moore's law can cope with this.

This one is much harder to cope with as the number of prefixes and
the link speeds are rising. Thus the problem is multiplicative to
quadratic.

You'll note that the number of prefixes is key to both of your equations. If the number of prefixes grows faster than Moore's law, then it will be very difficult to get either equation's left hand side to remain under Moore's law.

That's the whole point of the discussion.

Let me rephrase my statement so we aren't talking past each other.

The control plane (BGP) scales pretty much linearly (as Richard has observed
too) with the number of prefixes. It is unlikely that the growth in prefixes
and prefix churn manages to exceed the increase in readily available control
plane CPU power. For example a little VIA C3-800MHz can easily handle 10
current full feeds running OpenBGPD (for which I have done the internal data
structures design). Guess what a $500 AMD Opteron or Intel P4 can handle.
In addition BGP lends itself relatively well to scaling on SMP. So upcoming
dual- or multicore CPUs help to at least keep pace with prefix growth.
Conclusion: There is not much risk on the control plane running BGP even with
high prefix growth.

On the other hand the forwarding plane doesn't have the same scaling properties.
It faces not one but two rising factors: the number of prefixes (after cooking)
and the number of lookups per second (equal to pps) as defined by link speed.
Here a 10-fold increase in prefixes and a 10-fold increase in lookups/second
may well exceed the advances in chip design and manufacturing capabilities.
A 10-fold increase in prefixes means you have to search 10 times as many prefixes
(however optimized that is) and a 10-fold increase in link speed means you have
only 1/10 the time for the search you had before. There are many optimizations
thinkable to solve each of these. Some scale better in terms of price/performance,
others don't.
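
The squeeze is easy to put into numbers (illustrative Python; the figures
are assumptions):

  line_rate_bps = 10e9                    # assumed 10 Gbit/s link
  min_frame_bits = (64 + 20) * 8          # min Ethernet frame + wire overhead
  pps = line_rate_bps / min_frame_bits    # ~14.9 Mpps worst case
  budget_ns = 1e9 / pps                   # ~67 ns per lookup
  print(round(pps / 1e6, 1), "Mpps,", round(budget_ns, 1), "ns per lookup")

A 10-fold link speed increase cuts that budget to under 7 ns, while a
10-fold table growth makes each of those lookups walk a bigger structure.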

My last remark in the original mail meant that the scaling properties of
longest-match in hardware are worse than those of perfect matching. My
personal conclusion is that we may have to move the DFZ routing to some
sort of fixed-size (32-bit, for example) identifier on which the forwarding
plane can do perfect matching. This is not unlike the rationale behind
MPLS. However here we need something that administratively and politically
works inter-AS like prefix+BGP today. Maybe the new 32-bit AS number may
serve as such a perfect-match routing identifier. That'd make up to 4 billion
possible entries in the DFZ routing system. Or about 16k at today's size of
the DFZ. One AS == one routing policy.
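
What such an identifier would buy the forwarding plane, sketched with a
hypothetical 32-bit routing ID carried in the packet (illustrative Python,
not a concrete protocol proposal):

  routing_table = {}                    # 32-bit routing id -> next hop

  def install(routing_id, nexthop):
      routing_table[routing_id & 0xFFFFFFFF] = nexthop

  def forward(routing_id):
      # One exact-match probe, independent of table size and of whether
      # the payload addresses are IPv4 or IPv6; no longest-prefix walk.
      return routing_table.get(routing_id)

  install(65551, "peer-a")              # hypothetical 32-bit AS number
  print(forward(65551))                 # peer-a

In hardware this maps onto a plain hash table or even directly indexed
SRAM, the same property MPLS exploits with its fixed-size labels.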

the rationale behind MPLS. However here we need something that administratively and politically works inter-AS like prefix+BGP today. Maybe the new 32-bit AS number may serve as such a perfect-match routing identifier.

Interesting idea.

That'd make up to 4 billion possible entries in the DFZ routing system. Or about 16k at today's size of the DFZ. One AS == one routing policy.

That means though that we still need a way for people without an ASN to multi-home. Because clearly the number of ASNs is quite restricted compared to the number of IPv6 prefixes. So:

- we need to change that 4-byte AS draft to (4+X)-byte ASNs
   sharpish, X should be 4 probably (good luck with that). And change
   all IPv6 stacks in routers (and hosts, but that's easier).

OR

- we also need $AREA allocated IPs (which obviously operators would
   love to work on implementing)

OR

- we still will have some end-host "probe with every source address"
   and "change every stack" solution, one which adds a sort of
   supra-net to the internet which is only visible to end-hosts with
   this stack.

Seems to me at least, pre-coffee.

regards,

Daniel,

I think it is safe, even with projected AS and IP uptake, to assume
Moore's law can cope with this.

Moore will keep up reasonably with both the CPU needed to keep BGP perking, and with memory requirements for the RIB, as well as other non-data-path functions of routers.

That's only true if the rate of prefix addition can be constrained to be below the rate dictated by Moore's law, which is the entire point of the discussion. There is nothing today that acts as a pressure against bloat except portions of the net periodically blowing up as their hardware capacity is exceeded.

Several items regarding FIB lookup:

1) The design of the FIB need not be the same as the RIB. There is plenty of room for creativity in router design in this space. Specifically, the FIB could be dramatically reduced in size via aggregation. The number of egress points (real or virtual) and/or policies within a router is likely FAR smaller than the total number of routes. It's unclear if any significant effort has been put into this.

In fact, there has been. In a previous life, we actually had some FIB pre-processing that did a great deal of aggregation prior to pushing the FIB down into hardware. We found that it was workable, but consumed extensive CPU resources to keep up with the churn. Thus, this turns out to be a tool to push the problem from the FIB back up to the CPU. Previous comments still apply, and this just increases the CPU burn rate.

2) Nothing says the design of the FIB lookup hardware has to be longest match. Other designs are quite possible.

Longest match is fundamental in the workings of all of the classless protocols today. Changing this means changing almost every protocol.

3) Don't discount novel uses of commodity components. There are fast CPU chips available today that may be appropriate to embed on line cards with a bit of firmware, and may be a lot more cost effective and sufficiently fast compared to custom ASICs of a few years ago. The definition of what's hardware and what's software on line cards need not be entirely defined by whether the design is executed entirely by a hardware engineer or a software engineer.

This has been tried as well.

Tony

One question - which percentage of the routing table of any particular router is
REALLY used, say, during 1 week?

I have a strong impression that the answer will not be more than 20% even in the
biggest backbones, and will be (more likely) below 1% in the rest of the world.
Which makes a huge space for optimization.
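
One way to test that impression, sketched as Python (hypothetical names;
the cache size is an assumption): put a small LRU cache of recently used
destinations in front of the full table and measure the hit rate over a
real packet trace:

  from collections import OrderedDict

  class RouteCache:
      """LRU cache of resolved routes sitting in front of the full table."""
      def __init__(self, capacity):
          self.capacity = capacity
          self.entries = OrderedDict()
          self.hits = self.misses = 0

      def lookup(self, dest, full_lookup):
          if dest in self.entries:
              self.hits += 1
              self.entries.move_to_end(dest)     # mark as recently used
              return self.entries[dest]
          self.misses += 1
          nexthop = full_lookup(dest)            # slow path: full table
          self.entries[dest] = nexthop
          if len(self.entries) > self.capacity:
              self.entries.popitem(last=False)   # evict the coldest entry
          return nexthop

  cache = RouteCache(capacity=40000)             # assumed ~20% of 200k routes
  # Feed cache.lookup(dest, full_table_lookup) a week of traffic, then
  # inspect cache.hits / (cache.hits + cache.misses).

Worth remembering that per-destination route caches were tried in earlier
routers and fell over once traffic became diverse enough to thrash them,
so a high hit rate on one trace doesn't settle the question.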