I have a feeling this one may start another very large NANOG thread:
http://www.nwfusion.com/news/2001/0402routing.html
-Hank
Not to oversimplify, but assuming we can continue to separate forwarding
from the routing process itself, is this really a situation that calls for
a complete redesign of BGP? If you look at the routing processors on
Cisco and Juniper hardware, Cisco's GSR is using a 200MHz MIPS RISC
processor and Juniper is using a 333MHz Mobile Pentium II.
With RISC reaching 1GHz and Intel pushing 2GHz, it appears that the actual
processors in use by the two big vendors are a couple of years behind. What
happens to the box's ability to process a 500,000-route table if you
quadruple its memory and give it five times more processing power?
Also, it would likely require a re-write of software, but what's keeping
us from using SMP in routers?
Cheers.
-travis
Performance of a routing protocol is not a function of just
the CPU available.
Performance of a routing protocol is a function of the CPU
available and the network characteristics.
*shakes head* people keep forgetting this. Do you guys also
think you can solve the Internet's problems by adding more bandwidth?
Adrian
adrian,
to take your point one step further.
the architecture of the router and the mechanism by which it forwards
packets differ between vendors. to simply say "CPU" is nonsense
when it comes to routers. You need to be more specific and look into the
scaling effects on scheduler performance, switching fabric performance and
architecture, buffering, forwarding design (centralised or distributed),
and ASIC development. Remember, Moore's law applies to ASICs and their
"widget" density.
my 2c worth
>
> Not to oversimplify, but assuming we can continue to separate forwarding
> from the routing process itself, is this really a situation that calls for
> a complete redesign of BGP? If you look at the routing processors on
> Cisco and Juniper hardware, Cisco's GSR is using a 200Mhz MIPS RISC
> processor and Juniper is using a 333Mhz Mobile Pentium II.
>
> With RISC reaching 1Ghz and Intel pushing 2Ghz, it appears that the actual
> processors in use by the 2 big vendors are a couple of years behind. What
> happens to the box's ability to process a 500,000 route table if you
> quadruple its memory and give it 5 times more processing power?
>
> Also, it would likely require a re-write of software, but what's keeping
> us from using SMP in routers?
I think the current large routers can handle flapping (50,000 routes every 30 seconds):
http://www.lightreading.com/document.asp?site=testing&doc_id=4009&page_number=12
and they can handle large BGP tables (Cisco: 400K, Juniper: 2.4M):
http://www.lightreading.com/document.asp?site=testing&doc_id=4009&page_number=10
The problem is all the legacy Cisco 7500s in the core that are defaultless
and currently carry 99,000 routes. I think Geoff is wrong in his statement
that the problem is not routing table size, but rather flapping. To quote
Geoff: "It's not the size of the table, but the number of updates per
second that kills a router stone dead." But the rate of flapping is
proportional to the size of the routing table, IMO. If you have 1000
routes in your table, and on average 5% of the nets will flap every 60
seconds, that comes to 50. If your table is 100,000 and the same 5% will
flap, that comes to 5000 every minute. Reduce the table size and you
*will* reduce the flapping as well.
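A quick sketch of the proportionality argument above, using the hypothetical 5%-per-minute flap fraction from the text (the fraction is an illustrative assumption, not a measured figure):

```python
# If a roughly constant fraction of prefixes flaps per interval, flap
# volume scales linearly with table size -- the point being made above.
# The 5% figure is the hypothetical one from the post, not real data.

def flaps_per_minute(table_size, flap_fraction=0.05):
    """Expected number of flapping routes per 60-second interval."""
    return table_size * flap_fraction

print(flaps_per_minute(1_000))    # 50.0
print(flaps_per_minute(100_000))  # 5000.0
```

Shrinking the table by a factor of 100 shrinks the flap volume by the same factor under this (simplistic) linear model.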
-Hank
> I think the current large routers can handle flapping (50,000 routes every
> 30 seconds):
> http://www.lightreading.com/document.asp?site=testing&doc_id=4009&page_number=12
> and they can handle large BGP tables (Cisco: 400K, Juniper: 2.4M):
> http://www.lightreading.com/document.asp?site=testing&doc_id=4009&page_number=10
How many routers did they test?
Did they test 2 routers? Or did they test 1000 routers?
Did they plot just the BGP table withdrawal speed and the
subsequent BGP table repopulation speed? What about
doing some quick modelling on what effect this flapping
"latency" could have on a large mesh of routers?
There has been some work done on this. It's been covered
at NANOG. The reason is that most of its effects on
reachability are masked by super-routes
(which, for most of you, will be the default route).
I'd love to see the day when every network running a full BGP
table pulled out its default route(s) and ran defaultless.
> The problem is all the legacy Cisco 7500s in the core that are defaultless
> and currently carry 99,000 routes. I think Geoff is wrong in his statement
> that the problem is not routing table size, but rather flapping. To quote
> Geoff: "It's not the size of the table, but the number of updates per
> second that kills a router stone dead." But the rate of flapping is
> proportional to the size of the routing table, IMO. If you have 1000
> routes in your table, and on average 5% of the nets will flap every 60
> seconds, that comes to 50. If your table is 100,000 and the same 5% will
> flap, that comes to 5000 every minute. Reduce the table size and you
> *will* reduce the flapping as well.
Even if every router in the Internet core were upgraded to the
latest and greatest 4-way SMP 2GHz Intel CPUs running the routing
protocols with 4 gigabytes of RAM each, the sheer complexity of
the routing system would produce some rather interesting dynamics.
Hell, even if you threw this at 100,000 routes in today's network
topology, I'm pretty sure the nature of BGP would be a little different.
(Read: just because it's faster doesn't mean it's better. Sometimes
something being slow acts as a regulator. People might want to try
grabbing some basic CS programs to do network modelling and start
playing.)
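In the spirit of the "grab some basic CS programs and start playing" suggestion, here is a deliberately toy model of update fan-out in a full mesh. The `hunt_rounds` parameter and the full-mesh assumption are mine, purely for illustration; this is not a faithful BGP simulator, just a sketch of why message count grows much faster than router count:

```python
# Toy model: one route withdrawal in a full mesh of n routers touches
# every peering session, and path hunting can touch each session
# several times before the mesh converges. This only illustrates
# super-linear message growth; it is not a real BGP model.

def worst_case_messages(n_routers, hunt_rounds=3):
    """Messages generated by one withdrawal in a full mesh, assuming
    each of `hunt_rounds` path-hunting rounds crosses every session
    once (a hypothetical, pessimistic assumption)."""
    sessions = n_routers * (n_routers - 1) // 2  # n choose 2
    return sessions * hunt_rounds

print(worst_case_messages(10))    # 135
print(worst_case_messages(1000))  # 1498500
```

Doubling the mesh roughly quadruples the message load, which is why "just add CPU" does not make the dynamics go away.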
I'll stop ranting now, since I've already ranted on this topic
before.
Adrian
(NOTE: People are probably thinking that I'm just speaking out of my ass.
Being a hardware geek, software programmer and routing person has
got its advantages. One of them is that I have an insatiable desire
to digest any reading I can to figure out how things work, and
I currently do this for networking since my current job hat
has "programmer" on it.)
>
>
> Not to oversimplify, but assuming we can continue to separate forwarding
> from the routing process itself, is this really a situation that calls for
> a complete redesign of BGP? If you look at the routing processors on
> Cisco and Juniper hardware, Cisco's GSR is using a 200Mhz MIPS RISC
> processor and Juniper is using a 333Mhz Mobile Pentium II.
>
> With RISC reaching 1Ghz and Intel pushing 2Ghz, it appears that the actual
> processors in use by the 2 big vendors are a couple of years behind. What
> happens to the box's ability to process a 500,000 route table if you
> quadruple its memory and give it 5 times more processing power?
>
> Also, it would likely require a re-write of software, but what's keeping
> us from using SMP in routers?Performance of a routing protocol is not a function of just
the CPU avaliable.Performance of a routing protocol is a function of the CPU
avaliable and the network characteristics.
Granted, but are you saying that the 15 minutes it takes one of my BGP
sessions to reload has no relevance to the crusty, old processor doing
route calculations on 104,000 routes? Multiply the CPU available by 5,
and then look for bottlenecks. Seems sane to me. Also seems a hell of a
lot easier than trying to redesign the network characteristics in 1 year
and implement them ... IPv6 anyone?
-travis
It might not, it might be more of a function of the CPU on the other end.
It might be more limited by the bandwidth.
Even if it's totally CPU-limited on your end: multiply your CPU by 5
and you have gained somewhat less than a factor of 5 improvement. Replace
the Internet with a highly aggregated IPv6 network which uses
transport-level multihoming and you gain a factor of 1000 improvement at
core routers (and 100,000x further from the core, where you no longer need
to be default-free), and still have the opportunity for a further 5x by
going to a state-of-the-art CPU (providing that your CPU speed reasoning
is valid).
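Taking the claimed factors in that paragraph at face value (they are the poster's estimates, not measurements), the comparison is simple arithmetic: a faster CPU buys a constant factor, while aggregation shrinks the problem itself.

```python
# The scaling comparison from the paragraph above, with the claimed
# factors treated as given assumptions. Values are relative processing
# load per convergence event, normalised to today's default-free core.

cpu_speedup = 5            # constant factor from a faster route processor
aggregation_factor = 1000  # claimed core-table shrink under aggregated IPv6

today = 1.0
faster_cpu_only = today / cpu_speedup             # 0.2
aggregated = today / aggregation_factor           # 0.001
aggregated_and_faster = aggregated / cpu_speedup  # 0.0002

print(faster_cpu_only, aggregated, aggregated_and_faster)
```

The point of the arithmetic: the constant-factor CPU gain and the structural aggregation gain multiply, but only one of them changes the growth curve.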
Incremental improvements in router performance are a good
thing, but they're nowhere near the level needed to ensure sustainability.
If going to a 5x faster CPU would really help real-world performance that
much, the high-end router vendors would have already done it; at the prices
they charge, they can afford to be bleeding-edge CPU-wise.
Easier yes, in the short term, but after you've implemented your
state-of-the-art CPU, to scale any further you need to invent working
quantum computers and install separate OC3s to carry routing updates.
When you consider that, IPv6 doesn't sound bad after all.
If you look at the graphs, though, the routing table growth stopped around
the end of the year. I've seen 101k prefixes, give or take a few hundred,
in my view since 31st December... Also, the number of /24s being announced
has stopped growing. It's been around 58.5k for the last 3 months...
Routing table growth is following the state of the Internet economies? Looks like it to me.
philip
Absolutely - if you look at the strong US dollar and make the observation
that Internet connectivity prices are largely driven in USD, then the
relatively stronger USD makes connectivity more expensive in other
economies. This in turn damps down growth, as new non-US markets which are
dependent on exposure to lower unit prices remain unexposed until the comms
price declines once more.
hmm - maybe we can use the first derivative of this BGP table metric as a global economy indicator
It may also be the concerted effort of certain individuals who have been
after the top 10-30 non-CIDRers. I know I have sent out dozens of emails
to those ASes since Jan 1 and have gotten, in general, a positive
response; many have fixed their systems.
-Hank
I'd imagine that efforts to clean up the table would result in
step-function changes, rather than the abrupt stop to the increase we have
been seeing. It's a little different from 1994 when we all had to CIDRise
"or die"... (I'll look at it a bit more closely - the daily delta should
point to whether it is a cleanup, or an economic slowdown...)
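The daily-delta check described above is easy to sketch: step drops in the day-over-day prefix count would suggest a cleanup, while a delta hovering near zero would suggest growth has simply stalled. The sample series here is invented for illustration; real data would come from the daily BGP table measurements:

```python
# Hypothetical daily prefix-count totals -- invented numbers, only to
# show the shape of the check. A cleanup shows up as a large negative
# step; an economic slowdown shows up as deltas hovering near zero.

counts = [100_900, 101_050, 101_020, 99_000, 99_010]

deltas = [b - a for a, b in zip(counts, counts[1:])]
print(deltas)  # [150, -30, -2020, 10] -- the -2020 looks like a cleanup step
```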
philip