Mikrotik Cloud Core Router and BGP real-life experiences?

Hi,

looking at the specs of the Mikrotik Cloud Core Routers, they seem too good to be true [1], offering so much bang for the buck. If the numbers held up, virtually all smaller ISPs would drop their Cisco gear for Mikrotik RouterBoards.

We are using a handful of Mikrotik boxes, but at a much lower network level (splitting networks; low-end router behind an ADSL modem; ...). We're happy with them.

So I am asking for real-life experience, not lab values, with Mikrotik Cloud Core Routers and BGP. How well do they handle full tables and a bunch of peering sessions? How well does a box react when you add filters (during attacks)? When it reloads the table? Etc.

I am looking for _real_ _life_ values compared to a Cisco NPE-G2. Please tell me/us about your first-hand experience.

Thanks!

greetings, Martin

[1] If something sounds too good to be true, it probably is.

I am going to be deploying 4 as edge routers in the next few weeks; each will have 1 or 2 full tables plus partial IX tables, so I should have some empirical info soon.

They will be doing eBGP to upstreams and iBGP/OSPF internally. I went with the 16 GB RAM models.
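In case it helps anyone reproduce the setup, the skeleton is roughly the following in RouterOS v6 (AS numbers, addresses, and the loopback0 bridge name are placeholders, not our real config):

  # eBGP to an upstream
  /routing bgp instance set default as=64500 router-id=192.0.2.1
  /routing bgp peer add name=upstream1 remote-address=198.51.100.1 remote-as=64501

  # iBGP to the other edge boxes, sourced from a loopback-style bridge
  /routing bgp peer add name=ibgp-edge2 remote-address=192.0.2.2 remote-as=64500 update-source=loopback0

  # OSPF internally so the loopbacks/next-hops are reachable
  /routing ospf instance set default router-id=192.0.2.1
  /routing ospf network add network=192.0.2.0/24 area=backbone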

However, these boxes are basically Linux running on top of Tilera CPUs. In terms of throughput, as long as everything stays on the fast path they have no issues doing wire speed on all ports; the moment you add a firewall rule or the like, though, they drop to 1.5 Gbps.
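You can watch that happen on the box itself; if I remember the v6 field names right, something like:

  # is the fast path allowed, and is it actually being used?
  /ip settings print
  # look at allow-fast-path and the ipv4-fast-path-active /
  # ipv4-fast-path-packets counters

  # adding any filter rule, even a bare accept, pushes forwarded
  # traffic off the fast path and onto the slower generic path
  /ip firewall filter add chain=forward action=accept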

Thanks,

estimated traffic levels are at about half a gig, but at least 50 megs of UDP (VoIP) in both directions.

one thing is that I haven't found a solution for a redundant power supply.

#m

Buy 2 :-)

At 3am I only want to read the notification and know what to do first thing in the morning, not jump out of bed and bring the spare into production.

#m

You set them both up and configure the spare for fail-over.
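For example, a minimal VRRP setup so the spare takes over the gateway address on its own (interface, VRID, and address are placeholders):

  # on the primary (higher priority wins the master election)
  /interface vrrp add name=vrrp1 interface=ether1 vrid=10 priority=200
  /ip address add address=203.0.113.1/32 interface=vrrp1

  # on the spare (takes over the same virtual address when the master dies)
  /interface vrrp add name=vrrp1 interface=ether1 vrid=10 priority=100
  /ip address add address=203.0.113.1/32 interface=vrrp1

Then the 3am page really is just a notification.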

My real world experience with these is that they suck. Plain and simple.
Don't waste your time.

Would you mind elaborating what you were trying to accomplish and what
failed?

Thank you.

Ray

People who tested them say they don't forward more than 500 Mbps per port.

I too am curious...

We've used them for a few months as edge devices, and most (if not all, *knock on wood*) of the issues we've had were fixed by RouterOS updates or configuration changes (lots of chefs in the kitchen), or were circuit/carrier related.

While I would never compare them apples-to-apples to Cisco, Juniper, etc. devices... they have, in our experience, proven to be good, inexpensive routers with a few quirks here and there.

Guess I should chime in here. As far as the CCR goes, I know several customers running in excess of 1 gig of traffic through them; one has 16 BGP sessions, several of those full tables and the rest on a peering exchange. There are other units, like the ones we supply, that do more than 20 gig in real-world usage. They are very capable devices, but the more features you enable, the more their overall abilities are affected. This is real world, and yes, I work with thousands of ISPs across North America, many running between 100 meg and 10 gig of traffic (cable companies, DSL providers, and WISPs), and many of these ONLY use MikroTik.

As another person said, grab two and configure them so that you split your load up; we have done that in areas where redundancy is important. Seeing that the dual-10GigE model with 8 GigE ports costs $1,249 or so, it's hard to beat them on price, so add two or more to get your redundancy.

Dennis Burgess, Mikrotik Certified Trainer Author of "Learn RouterOS- Second Edition"
Link Technologies, Inc -- Mikrotik & WISP Support Services
Office: 314-735-0270 Website: http://www.linktechs.net - Skype: linktechs
-- Create Wireless Coverages with www.towercoverage.com - 900MHz - LTE - 3G - 3.65 - TV Whitespace

They cannot handle a full routing table. The load balancing doesn't work.
They cannot properly reassemble fragmented packets, and therefore drop all
but the first "piece". They cannot reliably handle traffic loads over
maybe 200 Mbps; we needed 4-6 Gbps capacity. They cannot hold a GRE tunnel
connection.

We have many with full routing tables. Load balancing works fine; I have one site with 8 DSL lines doing balancing across them. We typically don't use GRE tunnels, but OpenVPN and IPsec work great.
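The simplest form of that balancing is an ECMP default route; RouterOS spreads flows across the listed gateways, and check-gateway pulls dead ones out of the rotation (addresses are placeholders; list a gateway twice to weight it heavier):

  /ip route add dst-address=0.0.0.0/0 gateway=10.0.1.1,10.0.2.1,10.0.3.1 check-gateway=ping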

Dennis Burgess, Mikrotik Certified Trainer Author of "Learn RouterOS- Second Edition"
Link Technologies, Inc -- Mikrotik & WISP Support Services
Office: 314-735-0270 Website: http://www.linktechs.net - Skype: linktechs
-- Create Wireless Coverages with www.towercoverage.com - 900MHz - LTE - 3G - 3.65 - TV Whitespace

Out of all the network hardware I have worked on in operations, these were
by far some of the worst. I read lots of good things, but like most things
in life these just don't stack up against a Cisco or Juniper for stability
and reliability. Most of the ISPs I have worked with were HSD, but I also
followed the progression path in the industry, so I have time with dial-up,
ADSL/x/..., WISPs, data centers, FTTH, etc.

I generally only see these in WISPs and some DSL installs, never anything
with huge traffic load and full tables. The choice is generally driven by the
cost factor alone, without regard to much else, imho. But that's just my
experience. Maybe there are people who have managed to keep these up and
handle all you have requested.

just my 2c

FYI... Mikrotik Cloud Core routers are nice; however, one has to keep something in mind when deploying them...

Only one core (of the CPU) is dedicated to each port / process.
This is good in that it keeps what happens on a single port from taxing the whole CPU,
but not so good when you need more CPU power than a single core can provide for that port.

Also, the BGP process will only use one core.

While these units make for great 'customer facing' edge routers, with plenty of power and the ability to keep issues contained, the x86-based (Core2Duo/i5/i7) Mikrotik units are more suitable, processing-power-wise, for running multiple full BGP table peerings.
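You can see this for yourself under load, since per-core usage is exposed in the CLI:

  # one core pegged at 100% while the rest sit idle is the
  # single-threaded BGP process at work
  /system resource cpu print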

Regards & Good Luck.

Faisal Imtiaz
Snappy Internet & Telecom

Can't say anything about MicroTik specifically, but I've used Linux as a routing platform for many years, off and on, and took a reasonably close look at performance about a year ago, in the previous job, using relatively high-end, but pre-Sandy Bridge, generic hardware. We were looking to support ca. 8 x 10 GbE ports with several full tables, and the usual suspects wanted the usual 6-figure amounts for boxes that could do that (the issue being the full routes -- 8 x 10 GbE with minimal routing is a triviality these days).

Routing table size was completely not an issue in our environment; we were looking at a number of concurrent flows in the high-5 to low-6-digit range, and since Linux uses a route cache, it was that number, rather than the number of full tables we carried, that was important. Doing store-and-forward packet processing, as opposed to cut-through switching, took about 5 microseconds per packet, and consumed about that much CPU time. The added latency was not an issue for us; but at 5 us, that's 200Kpps per CPU. With 1500-byte packets, that's about 2.4 Gb/s total throughput; but with 40-byte packets, it's only 64 Mb/s (!).

But that's per CPU. Our box had 24 CPUs (if you count a hyperthreaded pair as 2), and this work is eminently parallelizable. So a theoretical upper bound on throughput with this box would have been 4.8 Mpps -- 57.6 Gb/s with 1500-byte packets, 1.5 Gb/s with 40-byte packets.
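Consolidating that arithmetic in one place:

\[ \frac{1}{5\,\mu\mathrm{s}} = 200\ \mathrm{kpps\ per\ CPU}, \qquad 24 \times 200\ \mathrm{kpps} = 4.8\ \mathrm{Mpps} \]
\[ 4.8\ \mathrm{Mpps} \times 1500\,\mathrm{B} \times 8 = 57.6\ \mathrm{Gb/s}, \qquad 4.8\ \mathrm{Mpps} \times 40\,\mathrm{B} \times 8 \approx 1.5\ \mathrm{Gb/s} \]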

The Linux network stack (plus RSS on the NICs) seemed to do quite a good job of input-side parallelism - but we saw a lot of lock contention on the output side. At that point, we abandoned the project, as it was incidental to the organization's mission. I think that with a little more work, we could have gotten within, say, a factor of 2 of the limits above, which would have been good enough for us (though surely not for everybody). Incrementally faster hardware would have incrementally better performance.

OpenFlow, which marries cheap, fast, and dumb ASICs with cheap, slower, and infinitely flexible generic CPU and RAM, seemed, and still seems, like the clearly right approach. At the time, it didn't seem ready for prime time, either in the selection of OpenFlow-capable routers or in the software stack. I imagine there's been some progress made since. Whether the market will allow it to flourish is another question.

Below a certain maximum throughput, routing with generic boxes is actually pretty easy. Today, I'd say that maximum is roughly in the low-single-gigabit range. Higher is possible, but gets progressively harder to get right (and it's not a firm bound, anyway, as it depends on traffic mix and other requirements). Whether it's worth doing really depends on your goals and skill. Most people will probably prefer a canned solution from a vendor. People who grow and eat their own food surely eat better, and more cheaply, than those who buy at the supermarket; but it's not for everybody.

Jim Shankland

The issues I see are because of RouterOS versions. The Cloud Core routers
are a fairly new platform. As such, the software isn't as stable as it
should be. The OS is up to version 6.7. There were some betas before 6.0
was released. However, almost every version that has been released
addresses issues with the Cloud Core. The Cloud Cores only run version 6.

We did see BGP issues early on accepting more than one full routing table.
We saw other issues, but they were fixed with subsequent OS software
releases.

  Justin

Unless my knowledge is out of date, the one thing RouterOS has that others in the same scope lack is a full MPLS stack that's not experimental.
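And it is genuinely little work to turn on; a minimal LDP sketch from memory (lsr-id and addresses are placeholders):

  /mpls ldp set enabled=yes lsr-id=192.0.2.1 transport-address=192.0.2.1
  /mpls ldp interface add interface=ether2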

~Seth

<snip>

Routing table size was completely not an issue in our environment; we
were looking at a number of concurrent flows in the high-5 to
low-6-digit range, and since Linux uses a route cache, it was that
number, rather than the number of full tables we carried, that was
important.

<snip>

FYI, Linux no longer has a routing cache, so any performance numbers with the cache in place are void on modern kernels. It was deemed too fragile, handled mixed traffic badly, and was way too easy to DoS. It wasn't simply ripped out, of course; full lookups were made way faster and a bunch of scalability issues were plugged in the process.

All in all, in pps terms, Linux should now handle mixed traffic much better, though less diverse traffic patterns might be a little slower than before. Overall it is much more consistent and predictable.

Not everything is peachy, though; there are still some cases that sucked last I checked, running tons of tunnels being one. Multicast RX was severely gimped for a while after the route cache removal, but that got fixed.

How about SMP Affinity in CCR?
System > Resources > IRQ.
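Last I looked you could at least inspect, and on some platforms re-pin, the IRQ-to-core mapping from that menu (syntax from memory; the CCR spreads interrupts across cores automatically for the most part):

  # list IRQs and which core services each
  /system resource irq print
  # pin a given IRQ to a core, where the platform allows it
  /system resource irq set [find irq=30] cpu=2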