Software router state of the art

Hi all!

There's been some discussion on the list regarding software routers lately, and this piqued my interest. Does anybody have any recent performance and capability statistics (e.g. forwarding rates with full BGP tables and N ethernet interfaces) or any pointer to what the current state of the art in software routers is?

- Zed

The performance of "software routers" has always had a hardware component.

Basically, for the vast majority of them, take your PCI bus bandwidth,
count how many times a packet has to cross it, and do the math. You can't
forward more than that much traffic no matter *what* software you run on
that box. If that number falls short, stop right there and look for
some box of different design that has the required backplane bandwidth.
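
(For illustration, the arithmetic above looks roughly like the toy calculation below; the bus figure, crossing count, and packet size are assumed example numbers, not measurements of any particular box.)

    /* Back-of-the-envelope sketch of the "count the bus crossings" math above.
     * All numbers here are illustrative assumptions, not measurements. */
    #include <stdio.h>

    int main(void) {
        double bus_gbps  = 8.5;  /* assume nominal 64-bit/133MHz PCI-X, ~8.5 Gbit/s */
        int    crossings = 2;    /* packet crosses the bus in (NIC->RAM) and out (RAM->NIC) */
        double ceiling_gbps = bus_gbps / crossings;

        /* Rough packets-per-second ceiling at an assumed average packet size. */
        double avg_packet_bits = 500.0 * 8;   /* assume ~500-byte average packets */
        double pps_ceiling = ceiling_gbps * 1e9 / avg_packet_bits;

        printf("forwarding ceiling: ~%.2f Gbit/s, ~%.1f Mpps at 500-byte packets\n",
               ceiling_gbps, pps_ceiling / 1e6);
        return 0;
    }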

You will, of course, take additional performance hits due to locking issues
and similar in your software stack (that, and most "software" routers will
suffer from not having special hardware assist for routing table lookups).

Let us know if you find a suitable chassis/motherboard that has enough
bandwidth to make it worth thinking about for anything other than the
smaller edge routers that most providers have zillions of... :-)

This might be of interest:

http://nrg.cs.ucl.ac.uk/mjh/tmp/vrouter-perf.pdf

The Endace DAG cards claim they can move 7 Gbps over a PCI-X bus from
the NIC to main DRAM. They claim a full 10 Gbps on a PCIe bus.

Regards,
Bill Herrin

Various FreeBSD-related guys are working on parallelising the forwarding
layer enough to use the multiple TX/RX queues in some chipsets, such as the
Intel gig/10GE stuff.

1 mil pps has been broken that way, but it uses lots of cores to get there.
(8, I think?)
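
To illustrate the mechanism (a toy stand-in; this is not FreeBSD's code, and the hash below is not the Toeplitz hash real NICs use): the NIC hashes each packet's flow tuple to pick an RX queue, and each queue gets serviced by its own core, so one flow stays in order on one core while unrelated flows spread across all of them.

    /* Toy sketch of multi-queue RX steering: hash the 5-tuple, pick a queue.
     * The mixing function is a stand-in, NOT the Toeplitz hash real NICs use. */
    #include <stdint.h>
    #include <stdio.h>

    #define NQUEUES 8   /* e.g. one RX queue per core */

    struct flow {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  proto;
    };

    static uint32_t mix(uint32_t h, uint32_t v) {
        h ^= v;
        return h * 16777619u;   /* FNV-style multiply, purely illustrative */
    }

    static uint32_t flow_hash(const struct flow *f) {
        uint32_t h = 2166136261u;
        h = mix(h, f->src_ip);
        h = mix(h, f->dst_ip);
        h = mix(h, ((uint32_t)f->src_port << 16) | f->dst_port);
        return mix(h, f->proto);
    }

    int main(void) {
        struct flow flows[] = {
            { 0x0A000001, 0x0A000002, 12345, 80, 6 },   /* two TCP flows ...   */
            { 0x0A000001, 0x0A000002, 12346, 80, 6 },   /* ... differing ports */
            { 0xC0A80101, 0x08080808, 53124, 53, 17 },  /* a DNS/UDP flow      */
        };
        for (unsigned i = 0; i < sizeof(flows) / sizeof(flows[0]); i++)
            printf("flow %u -> RX queue %u\n", i, flow_hash(&flows[i]) % NQUEUES);
        return 0;
    }

This is also why the single-flow vs. many-flows question later in the thread matters: one flow hashes to one queue and one core, so the aggregate pps numbers only show up when there are many flows to spread around.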

Linux apparently is heading, or has already headed, down this path.

If someone were to spend some time dissecting the rest of the code to
also optimise the single-core throughput, then you may see some interesting
software routers using commodity hardware (for values of "commodity"
roughly equal to "PC servers", rather than "magic lotsacore MIPS with
some extra glue for jacking packets around").

Sure, it's not a CRS-1, but reliably doing a mil pps with a smattering of
low-touch features would be rather useful, no?

(Then add, say, l2tp/ppp into that mix, just as a crazy on-topic example...)

Adrian

Adrian Chadd wrote:

1 mil pps has been broken that way, but it uses lots of cores to get there.
(8, I think?)

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/net/2008-06/msg00364.html has all the details. It's a rather long thread, but IIRC 1 Mpps was achieved on a single CPU (the server had multiple CPUs, but only one was being used for forwarding). Firewall rules slowed it down quite a bit, but there's also some work being done out there to minimize that.

Regards,

  Chris

Yah, all of that is happening. Some people keep asking why FreeBSD-4
forwarding was always much faster than same-hardware forwarding under
current FreeBSD, but at least that's finally being worked on.

Of course, with my FreeBSD advocacy hat on, if you -want- to see
something like FreeBSD handle 1mil+ pps forwarding then you should
really drop the FreeBSD Foundation a line and introduce yourself.
There are developers working on this (note: not me! :-)) who would
benefit from equipment and funding.

Anyway. Some PC class hardware is pretty damned fast. Some vendors
even build highish-throughput firewalls and proxies out of PC class
hardware. :-) The "wah wah, PC class hardware has anemic bus I/O, memory I/O,
CPU speed and ethernet modules, and is thus too crap for serious routing" argument
is pretty much over for at least 1 mil pps, perhaps more.

2c,

Adrian

There's been some discussion on the list regarding software routers

The performance of "software routers" has always had a hardware component.

Basically, for the vast majority of them, take your PCI bus bandwidth,
count how many times a packet has to cross it, and do the math. You can't
forward more than that much traffic no matter *what* software you run on
that box. If that number falls short, stop right there and look for
some box of different design that has the required backplane bandwidth.

You will, of course, take additional performance hits due to locking issues
and similar in your software stack (that, and most "software" routers will
suffer from not having special hardware assist for routing table lookups).

The current state of the art is around 2 million pps for a fast Intel-architecture system.

That is a very interesting paper. Seriously, 7 Mpps with an
off-the-shelf Dell 2950? Even if it were -half- that throughput, for a
pure ethernet forwarding solution that is incredible. Shoot, buy a
handful of them as hot spares and still save a bundle.

Highly recommended reading, even if (like me) you're anti-commodity routing.

Cheers,
Randal

Adrian Chadd wrote:

The Endace DAG cards claim they can move 7 Gbps over a PCI-X bus from
the NIC to main DRAM. They claim a full 10 Gbps on a PCIe bus.

I wonder, has anyone heard of this being used for IDS? I've been looking at
building a commodity Snort solution, and I'm wondering whether a powerful network
card will help, or whether the bottleneck would be in processing the packets and
the overhead from the OS?

- naveen

Once upon a time, Adam Armstrong <lists@memetic.org> said:

Sounds like a Juniper J-series. Have a look at the forwarding figures
for the J6350. It does something around 2 Mpps and it's just an Intel CPU
with some PCI/PCI-X interfaces. The device just below it, the J4350, uses
a 2.53GHz Celeron. I'm not sure what the J6350 uses.

IIRC the new slots (the EPIMs) are PCI-E. The J6350 CPU is a P4 3.4GHz.
It is my understanding that the J-series use a real-time layer under the
FreeBSD kernel and have a real-time thread for forwarding (as opposed to
the M-series with a hardware forwarding engine).

The first bottleneck is the interrupts from the NIC. With a generic
Intel NIC under Linux, you start to lose a non-trivial number of
packets around 700 Mbps of "normal" traffic because it can't service
the interrupts quickly enough.

The DAG card can be dropped in to replace the interface used for a
libpcap-based application. When I tested the 1 Gbps PCIe version, I
lost no packets up to 1 Gbps, and my capture application's CPU usage
dropped to about 1/5th of what it was with the generic NIC. YMMV.
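
For context, the skeleton of such a libpcap-based capture application is roughly the following (the interface name and snap length are placeholder assumptions); the point of the DAG card in this setup is that it replaces the interface underneath without the application changing.

    /* Minimal libpcap capture loop of the kind described above.
     * "eth0" and the snap length are placeholders. Build with: cc cap.c -lpcap */
    #include <pcap.h>
    #include <stdio.h>

    static void handle(u_char *user, const struct pcap_pkthdr *hdr,
                       const u_char *bytes) {
        (void)user; (void)bytes;
        /* Real work (e.g. feeding an IDS engine) would happen here. */
        printf("packet: %u bytes captured\n", hdr->caplen);
    }

    int main(void) {
        char errbuf[PCAP_ERRBUF_SIZE];
        /* 65535-byte snaplen, promiscuous mode, 1000 ms read timeout. */
        pcap_t *p = pcap_open_live("eth0", 65535, 1, 1000, errbuf);
        if (!p) {
            fprintf(stderr, "pcap_open_live: %s\n", errbuf);
            return 1;
        }
        if (pcap_loop(p, -1, handle, NULL) == -1)
            fprintf(stderr, "pcap_loop: %s\n", pcap_geterr(p));
        pcap_close(p);
        return 0;
    }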

Regards,
Bill Herrin

http://www.endace.com/our-products/ninja-appliances/NinjaProbe-NIDS

snort at 1g & 10g

-chris

We use them here and there (the 1 Gig versions). The biggest thing to think about is the types of rule-sets you'll be using, compounded by the number of flows being created / expired. Once tuned, they work quite well, but the balancing act is how fast you can pull and analyze data out of RAM. Compiling the rules down to the card's level speeds things up a bit, but at the cost of losing the more dynamic rulesets.

If you can get the raw data onto some sort of larger medium (say, rotating pcaps on a disk), you lengthen the buffer window. FWIW, however, probably the best way to scale this is to get an Xport fiber regen tap, populate it with a few of these, and tune them to monitor different segments based on address space or port ranges. You'll have yourself a relatively cheap but extremely effective solution.

I've yet to test out the NinjaProbes... It's on my todo list...

I've seen some notes from Dave Miller about adding multiple TX queues
to the 2.6.27 kernel. See Dave's blog for the gory details:

http://vger.kernel.org/~davem/cgi-bin/blog.cgi
http://git.kernel.org/?p=linux/kernel/git/davem/net-tx-2.6.git;a=summary

AFAIK he hasn't made any claims about performance improvements. I
don't know the state of RX queues in Linux.

Jeff

Is anyone using Vyatta for routing? I sure would like to know about any experience with it in production.

http://www.vyatta.com/

Yes. We put in some Vyatta routers to extend our corporate network into another building as a temporary solution (the building had a very short lease, so our boss didn't want to spend any money on Juniper, which is our usual net gear vendor). Consequently, we are still there... go figure.

When we started with them, they were still using the XORP routing engine (and we haven't upgraded to the new platform yet). My experience wasn't terribly good. The first issue was a bad memory leak in the router manager process when VRRP hello times were set to 1 second. The first indication of something wrong was that our master router crashed, followed by its backup. We had to physically reboot the boxes to get them back online, which involved driving there, as no one onsite had access to the cage at the office. All voice and data ran through these routers, basically rendering every employee useless until we got them back online. It wasn't a happy day. After that we had to monitor memory and do controlled reboots every month or so. We eventually convinced Vyatta of this memory leak and they were able to fix it, but that was a very frustrating and time-consuming process for us, which is why, for the next problems I describe, we just found our own workarounds.

The next problem was a combination of a problem with the Vyatta and a problem with our IP phones. The Vyatta was sending gratuitous ARPs for the data VRRP address out the voice VLAN (the same two routers are the default gateway on both the data and voice VLANs). All of the workstations run through the phones (which sit tagged on the voice VLAN and pass traffic from the workstation untagged to the data VLAN). The phone, seeing the ARP for the data VRRP address on its voice VLAN, would send traffic to that address out the voice VLAN, effectively taking that workstation off the net for anything other than local traffic. That was a bugger to figure out, and we basically solved it with an arptables rule on the Vyattas. The one advantage of using a Linux (Debian) based router platform was that we could load other 'unsupported' packages to solve problems like this.

The last problem is that OSPF never really converges correctly. You can view the OSPF database and see which default route the routers should converge to, but they do not. They will sit converged on one path for a while, and if you reboot the other router that generates a default, they will reconverge to it for a while. This hasn't been a big enough problem for me to worry about it.

Finally, I haven't tried upgrading since Vyatta abandoned the XORP platform and moved to the Quagga platform, but I'm guessing (based on experience with Quagga) that they now have a lot fewer of the quirks I've described.

IMHO, YMMV, etc

--Justin

* William Herrin:

The Endace DAG cards claim they can move 7 Gbps over a PCI-X bus from
the NIC to main DRAM. They claim a full 10 Gbps on a PCIe bus.

But they are receive-only, right?

The main problem for "software routing" seems to be that it's basically
Ethernet-only because other interfaces are very difficult to find.

* Adrian Chadd:

1 mil pps has been broken that way, but it uses lots of cores to get there.
(8, I think?)

Was this with one packet flow, or with millions of them?

Traditionally, software routing performance on host systems has been
optimized for a few rather long flows.

Anyway, with multi-core, you don't need funky algorithms for incremental
FIB updates anymore (if you don't need sub-second convergence and stuff
like that). As a result, you can use really dumb multi-way trees for
which a lookup takes something like 100 CPU cycles (significantly less
for non-DoS traffic with higher locality).
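
To make the "dumb multi-way tree" idea concrete, here is a minimal sketch assuming a fixed 8-bit stride over IPv4 addresses, so a lookup touches at most four nodes; this is only an illustration of the idea, not anyone's production FIB:

    /* Minimal sketch of a fixed-stride (8-bit) multibit trie for IPv4
     * longest-prefix match -- the "dumb multi-way tree" idea: at most
     * four node dereferences per lookup. Illustrative only. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct node {
        int nh[256];             /* best next hop for this slot, -1 = none   */
        uint8_t plen[256];       /* stride bits that installed nh (for LPM)  */
        struct node *child[256]; /* deeper level, or NULL                    */
    };

    static struct node *new_node(void) {
        struct node *n = calloc(1, sizeof(*n));
        for (int i = 0; i < 256; i++) n->nh[i] = -1;
        return n;
    }

    /* Insert prefix/len (len 1..32, bits past len zero) via prefix expansion. */
    static void insert(struct node *root, uint32_t prefix, int len, int nexthop) {
        struct node *n = root;
        while (len > 8) {                     /* consume full 8-bit strides */
            int idx = (prefix >> 24) & 0xff;
            if (!n->child[idx]) n->child[idx] = new_node();
            n = n->child[idx];
            prefix <<= 8;
            len -= 8;
        }
        int span = 1 << (8 - len);            /* expand the final partial stride */
        int base = ((prefix >> 24) & 0xff) & ~(span - 1);
        for (int i = base; i < base + span; i++) {
            if (n->nh[i] == -1 || n->plen[i] <= len) {   /* keep longest match */
                n->nh[i] = nexthop;
                n->plen[i] = (uint8_t)len;
            }
        }
    }

    /* Lookup: one byte per level, remembering the best match seen so far. */
    static int lookup(const struct node *root, uint32_t addr) {
        const struct node *n = root;
        int best = -1;
        while (n) {
            int idx = (addr >> 24) & 0xff;
            if (n->nh[idx] != -1) best = n->nh[idx];
            n = n->child[idx];
            addr <<= 8;
        }
        return best;
    }

    int main(void) {
        struct node *root = new_node();
        insert(root, 0x0A000000, 8, 1);    /* 10.0.0.0/8      -> next hop 1 */
        insert(root, 0x0A010000, 16, 2);   /* 10.1.0.0/16     -> next hop 2 */
        insert(root, 0xC0A80100, 24, 3);   /* 192.168.1.0/24  -> next hop 3 */
        printf("%d %d %d\n",
               lookup(root, 0x0A010203),   /* 10.1.2.3    -> 2 */
               lookup(root, 0x0A020304),   /* 10.2.3.4    -> 1 */
               lookup(root, 0xC0A80105));  /* 192.168.1.5 -> 3 */
        return 0;
    }

Controlled prefix expansion at insert time is what keeps the lookup loop simple; the price is memory (256-way nodes) and more expensive updates, which is fine if, as above, you don't need sub-second convergence.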