BGP Route Reflector - Route Server, Router, etc

Justin_Krejci · January 12, 2017, 8:32pm

Nanog,

I am working on some network designs and am adding some additional routers to a BGP network. I'd like to build a plan of changing all of the existing routers over from full iBGP mesh to something more scalable (ie route reflection). Fortunately, I am also going to be able to decommission some older routers from the network and so shrinking the existing iBGP full mesh is something I am all too happy to spend time and energy on.

For the purpose of this thread though, I am not really interested in the route reflector vs confederation discussion.

In doing some research[1][2][3][4][5] I see a lot of discussions, config examples, etc on using route reflectors but most suggest picking a router, or more appropriately a set of routers, to become route reflectors within an ASN. I have not found many resources discussing using a non-router box as a route reflector (ie a device not necessarily in the forwarding path of the through traffic). I am thinking things like OpenBGPd and BIRD could make a good route reflector though they are most often discussed in the context of IXPs (ie eBGP sessions).

I am wondering if people can point me in the direction to some good resource material on how to select a good BGP route reflector design. Should I just dust off some 7206VXR routers to act as route reflectors? Use a few existing live routers and just add the responsibility of being route reflectors, is there a performance hit? Install and run BIRD on new server hardware? Buy some newer purpose built routers (Cisco, Juniper, Brocade, etc) to act as route reflectors and add them to the iBGP topology? GNS3 running IOS on server hardware? Something else? How many reflectors should be implemented? Two? Four?

What are the pros and cons of one design over another? On list or private off list replies would be great; I'd welcome real world experiences (especially any big gotchas or caveats people learned the hard way) as well as just links to previous discussions, PDFs, slideshows, etc. Heck even a good book suggestion that covers this topic would be appreciated.

[1] - iBGP-to-RR migration slideshow: http://meetings.ripe.net/ripe-42/presentations/ripe42-eof-bgp/sld015.html
[2] - General RR design issues: http://www.netcraftsmen.com/bgp-route-reflector-design-issues/
[3] - Video intro to RR from Cisco: http://www.cisco.com/c/dam/en_us/training-events/le31/le46/cln/qlm/CCIP/bgp/introducing-route-reflectors-2/player.html
[4] - Quagga and BIRD as RR example: https://bsdrp.net/documentation/examples/bgp_route_reflector_and_confederation_using_quagga_and_bird
[5] - Countless hours on youtube: https://www.youtube.com/results?search_query=bgp+route+reflector

Lots more data is out there of course as that is part of my problem.

Thanks!

Justin

James_Breeden · January 12, 2017, 9:26pm

Justin,

Fusion has run Route Reflection for some time as we didn't want to play full mesh.

Our current design is that we have 2 routers that are designated as the route reflectors, and every other router maintains RR sessions with those. There is no real additional overhead, as the BGP transactions and messaging are handled in the control plane of most routers, not the forwarding plane. Ours run directly on our Brocade MLXe.

As we are scaling past the initial build, we are running into certain minor routing hiccups, with remote routers sometimes preferring routes from the reflectors over their closer routes. We found this to be a function of ingesting transit and default routes directly into the route reflector routers: those routes tended to be preferred in the table over routes from other routers in the network. Our answer to this, and what we are deploying this year, is to pick one site per time zone to be a route reflector, which will give us 4 RRs in the States, but we will not use sites that are transit ingest sites. This way we balance the BGP table more evenly across the entire network. Also, I believe we will move to default-free status this year with this same move.

Happy to discuss in more depth off-list if you'd like.

--

James Breeden
The Fusion Network

Lukasz_Bromirski · January 12, 2017, 10:41pm

Nanog,
[…]

You did some homework. In essence, there’s no immediate problem with running Quagga or OpenBGPd as
an RR, apart from a lack of some knobs and not-so-stellar performance/scalability. BIRD was built from the
ground up to act as a high-performance BGP daemon, and it’s actually used as an RR in live deployments, not only at IXes.

I am wondering if people can point me in the direction to some good resource material on how to select a good BGP route reflector design. Should I just dust off some 7206VXR routers to act as route reflectors? Use a few existing live routers and just add the responsibility of being route reflectors, is there a performance hit? Install and run BIRD on new server hardware? Buy some newer purpose built routers (Cisco, Juniper, Brocade, etc) to act as route reflectors and add them to the iBGP topology? GNS3 running IOS on server hardware? Something else? How many reflectors should be implemented? Two? Four?

Disclaimer: I work at Cisco.

If you have some 7200VXRs that have 1 or 2 GB of RAM, that may be the best option (IF you have them).
Loaded with 12.2S/15S software they may actually be the most cost-effective solution and at the same
time support things like AddPath, BGP error handling, etc. when the time comes to use such features.
If that’s an NPE-400 based chassis or something even older, leave it for lab/etc., as you need a rather
performant CPU.

So, if that’s not an option, try working with BIRD, CSR 1000v (IOS-XE on a VM) or ASR 1001X/HX
(currently the most scalable and fastest BGP route reflector out there, but one that will cost $$$).

Two RRs provide ample redundancy to run even very large deployments (1000+ clients), so unless you’re
trying to hit higher numbers or plan to play fancy games with one pair of RRs for IPv4/IPv6 unicast
and another pair for different AFs, four may be overkill to maintain, synchronize and monitor.

Don’t go with GNS3; running emulation that is compiled at runtime is the wrong idea for any production deployment,
not to mention the rights/licenses needed to do it.

bengelly · January 12, 2017, 10:55pm

Dear Justin,

You could take a look at this presentation from Mark Tinka at the last NANOG:

https://m.youtube.com/watch?v=wLEjOj2fyp8

HTH.

Y.

James_Bensley1 · January 12, 2017, 10:59pm

The CSR1000v (IOS-XE), IOS-XRv and vMX are production ready. People are
deploying these in production and it's increasing in popularity.

Mark Tinka gave a good preso at a recent NANOG:
https://www.nanog.org/sites/default/files/2_Tinka_21st_Century_iBGP_Route_Reflection.pdf
https://www.youtube.com/watch?v=wLEjOj2fyp8&list=PLO8DR5ZGla8hcpeEDSBNPE5OrZf70iXZg&index=21

Cheers,
James.

Mike_Hammett · January 12, 2017, 10:59pm

Your knowledge of OpenBGPd's scalability issues may be a bit dated.

1) I'm not sure many would have run into it anyway.
2) A patch was submitted and I believe is in a stable release now.

Emille_Blanc · January 13, 2017, 12:25am

I am thinking things like OpenBGPd and BIRD could make a good route reflector though they are most often discussed in the context of IXPs (ie eBGP sessions).

We use openbgpd - well, the native OpenBSD equivalent - for route-reflection in a couple of places, as well as a full bgp feed for at least one site, using (old) Poweredge 1950 Gen2's. They were on-hand, so the price was right.
It's not caused us any grief to date. That said, neither have our 7204VXR's which do the same thing in some areas.
Needless to say, we don't use the reflectors to actually move the bits, but on at least one occasion we measured ~88,000 pps out of one of the 1950s that takes a full feed, before interrupts started to look worrisome on old non-SMP-safe code.
But switches with bgp or ospf support are cheap provided you're not feeding them with a full table.
Convergence times haven't been a problem for us, but we're only hovering around 1500 routes at the moment.

Having something you can tcpdump on is nice for the few situations that call for it, pf is always extremely handy, and redistributing to/from ospfd (also in OpenBSD base) is trivial.

As long as you can find hardware with memory enough to scale to your number of routes, it's been a perfectly valid and sound option for us.

My 5 cents.

Hugo_Slabbert1 · January 13, 2017, 4:02am

I have not found many resources discussing using a non-router box as a route reflector (ie a device not necessarily in the forwarding path of the through traffic). I am thinking things like OpenBGPd and BIRD could make a good route reflector though they are most often discussed in the context of IXPs (ie eBGP sessions).

The CSR1000v (IOS-XE), IOS-XRv and vMX are production ready. People are
deploying these in production and it's increasing in popularity.

Any thoughts on vRR vs. vMX for this use case? I see Mark called out vRR as having morphed into vMX, but AFAIK vRR is just vMX minus the forwarding plane, is targeted as an out-of-path reflector, and coexists with vMX as a different deployment option rather than having been replaced by it. I would assume that vRR should come in a few bucks lower than the vMX as a result, but I've only previously gotten quotes on vRR not vMX.

Chris_Russell · January 13, 2017, 8:29am

The CSR1000v (IOS-XE), IOS-XRv and vMX are production ready. People are
deploying these in production and it's increasing in popularity.

Mark Tinka gave a good preso at a recent Nanog:

https://www.nanog.org/sites/default/files/2_Tinka_21st_Century_iBGP_Route_Reflection.pdf

https://www.youtube.com/watch?v=wLEjOj2fyp8&list=PLO8DR5ZGla8hcpeEDSBNPE5OrZf70iXZg&index=21

+1, not used in production, but fantastic in a couple of our lab environments.

Chris

James_Bensley1 · January 13, 2017, 11:04am

Sorry, I don't know about the pricing, but the newer vMX product is now
split into two VMs, the virtual control plane and the virtual forwarding
plane. I think the vRR product is still like the "older" style vMX,
which was one combined control and forwarding plane image. At a guess,
perhaps it's heavily throughput limited?

We have used the "older" style vMX images in the lab (14.something),
which is the combined all-in-one VM. It works fine for us for actual
network traffic testing as well as various BGP tests like route
reflectors, so I see no reason why it wouldn't work as a vRR. I think
the actual "vRR" product from Juniper is just a more lightweight VM;
perhaps someone can clarify the tech behind it?

We don't have any virtual RRs in production yet but we are running
CSR1000v in lab tests right now which is working fine for us so we'll
probably push that out to prod at some point in both scenarios (as an
in path virtual router and out of path virtual route-reflector) but
that is 12+ months away as we still have lots more testing to do.

Cheers,
James.

Phil · January 13, 2017, 1:02pm

The vRR image and the vMX have always been separate. The vRR image is what Juniper sells as a solution for control-plane only applications like vRR. It’s also the image they run as part of their Northstar controller to speak BGP-LS to the network. It’s very lightweight, you can run a bunch of them in very little memory space, for instance if you want to do a vRR per AFI/SAFI, or service. I’ve tested it against the vMX in applications like vRR and the performance is pretty much identical with much less memory/cpu use.

Cisco makes a distinction between IOS-XRv which is their simulation/test version of XR like you would find in VIRL/CML and the XRv-9000 which is optimized for higher throughput. They sell a vRR-specific version of the XRv-9000 that is very reasonably priced. XRv-9000 is a bit more cpu/memory intensive in my experience.

Nokia also has a vRR version of their SR-OS virtual router. It has a lightweight cpu/memory footprint and is very fast. But really all of the virtual vRR solutions are fast and scale very high, little performance difference between them. I would recommend one based on the vendors you are most comfortable with and support for the AFI/SAFIs you are interested in.

Phil

Robert_Blayzor1 · January 13, 2017, 1:09pm

+1 here on the CSR1000v, works very well.

However, I’d have to give another +1 to XRv because RPL is more flexible and easier to manage than route-maps in IOS.

Leo_Bicknell1 · January 13, 2017, 1:23pm

In a message written on Thu, Jan 12, 2017 at 08:32:44PM +0000, Justin Krejci wrote:

I am working on some network designs and am adding some additional routers to a BGP network. I'd like to build a plan of changing all of the existing routers over from full iBGP mesh to something more scalable (ie route reflection).

You might want to better define "scalable". I don't know your
background or network so I can't guess. I can say I've seen
the inner workings of some large ISP networks with a lot of hosts
in iBGP that work fine, and then people with 5 routers try and
tell me they have a scaling problem.

What is your actual problem? Memory usage? Convergence time?
Configuring the sessions? Staff understanding of how it works?

I am wondering if people can point me in the direction to some good resource material on how to select a good BGP route reflector design. Should I just dust off some 7206VXR routers to act as route reflectors?

This is a red flag to me, relative to the questions above.

The 7206VXR, even with an NPE-G2, is a 1.5GHz PowerPC with a paltry
2GB of DRAM. It was not speedy when new, being roughly equivalent
to the PowerPC G4 processors in Apple laptops at the time. It is
approximately 8 times slower than a current iPhone. Seriously.

If convergence time is anything you care about, a 7206VXR is a very
bad choice. It may also run out of memory if you have a lot of
edges with full tables.

So what's the actual "scaling" problem?

Justin_Krejci · January 13, 2017, 10:12pm

Thanks for all of the replies (on and off list). It is appreciated.

Scaling in this context is simply adding more and more routers and needing/wanting to avoid configuring full mesh iBGP due to the administrative burden of maintaining the growing size of full mesh topology. In one particular network in question, I have 11 routers fully meshed and need to add several more over the coming 6-12 months, possibly adding as many as 10 more routers in that time span. I'd prefer not to continue doing full mesh.
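
To put rough numbers on that administrative burden, here is a minimal sketch of the session arithmetic. It assumes two of the existing routers take on the reflector role, that the reflectors peer with each other, and that every other router peers only with the reflectors; the function names are made up for illustration.

```python
def full_mesh_sessions(routers: int) -> int:
    """Number of iBGP sessions when every router peers with every other."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(routers: int, reflectors: int = 2) -> int:
    """Number of iBGP sessions when `reflectors` of the routers act as RRs.

    Assumption: reflectors are fully meshed among themselves and each
    remaining router (client) peers with every reflector.
    """
    clients = routers - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


# 11 routers today, possibly 21 after the planned additions.
for n in (11, 21):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n))
```

Under those assumptions, 11 routers need 55 sessions in a full mesh versus 19 with two reflectors, and 21 routers need 210 versus 39. More to the point, adding a router to the mesh means touching every existing router, while adding a client means touching only the reflectors.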

As for 7206VXR with NPE-G1 or G2 cards, we have many sitting in a decommissioned state on shelves as well as a few still alive serving a handful of T-1 lines and various other legacy connections of that sort. These little 7200's sit and run, forever near as I can tell. As many routers in this network do carry full-table eBGP sessions, I will strongly consider your suggestion of avoiding the 7200's due to potential memory constraints and CPU/convergence time limitations. I don't think I have done any full table feeds on a 7200 in many years (the days of 200k-300k table sizes).

This fits in with the kind of feedback I was hoping for, Thanks!

Brandon_Ewing · January 13, 2017, 10:39pm

One important thing to remember when migrating from full mesh to a RR design
is that you are reducing information available to the routers in the ASN.
When you had a full mesh, each router could select the best path from all
available paths, according to its position in the IGP. In a RR environment,
by default, routers only have available to them the best routes from the
RR's position in the IGP, which can lead to suboptimal exits being selected.
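
A toy model of that effect, as a minimal sketch: the router names and IGP costs below are hypothetical, and it assumes the same prefix is learned at two exits with otherwise equal BGP attributes, so that IGP cost to the next hop is the deciding tie-breaker.

```python
# Hypothetical IGP cost from each router to each exit (egress) router.
IGP_COST = {
    "rr1":     {"exit_east": 10, "exit_west": 50},
    "pe_east": {"exit_east": 5,  "exit_west": 60},
    "pe_west": {"exit_east": 60, "exit_west": 5},
}
EXITS = ["exit_east", "exit_west"]


def closest_exit(router, exits):
    """Pick the exit with the lowest IGP cost as seen from `router`."""
    return min(exits, key=lambda e: IGP_COST[router][e])


def full_mesh_choice(router, exits):
    """Full mesh: the router hears every path and picks its own closest exit."""
    return closest_exit(router, exits)


def reflected_choice(router, exits, reflector="rr1"):
    """Plain RR: the client only hears the reflector's own best path."""
    return closest_exit(reflector, exits)


for pe in ("pe_east", "pe_west"):
    print(pe, full_mesh_choice(pe, EXITS), reflected_choice(pe, EXITS))
```

Here pe_west would pick exit_west in a full mesh, but behind the reflector it is handed exit_east, because that is the closest exit from rr1's point of view.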

Work is being done to allow RRs to compute metrics from the client's
position in the IGP: See
https://tools.ietf.org/html/draft-ietf-idr-bgp-optimal-route-reflection-13
for more information
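
The idea can be sketched as the reflector running its shortest-path computation rooted at the client instead of at itself. This is a simplified illustration under assumed names and metrics, not the procedure specified in the draft: the topology is hypothetical, and the only selection step modelled is the IGP-cost tie-breaker.

```python
import heapq

# Hypothetical IGP topology: undirected links with metrics.
LINKS = [
    ("rr1", "exit_east", 10),
    ("rr1", "pe_west", 40),
    ("pe_west", "exit_west", 5),
    ("exit_east", "exit_west", 50),
]
EXITS = ["exit_east", "exit_west"]


def build_graph(links):
    """Turn a list of (a, b, metric) links into an adjacency map."""
    graph = {}
    for a, b, metric in links:
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph


def spf(graph, root):
    """Dijkstra: shortest-path cost from `root` to every reachable node."""
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, metric in graph[node].items():
            new_cost = cost + metric
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return dist


def best_exit(graph, viewpoint, exits):
    """Choose the exit closest to `viewpoint` in the IGP."""
    dist = spf(graph, viewpoint)
    return min(exits, key=lambda e: dist[e])


GRAPH = build_graph(LINKS)
print(best_exit(GRAPH, "rr1", EXITS))      # reflector's own viewpoint
print(best_exit(GRAPH, "pe_west", EXITS))  # computed on behalf of the client
```

From rr1's own position the best exit is exit_east; computed from pe_west's position it is exit_west, which is what pe_west would have chosen in a full mesh.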

Bandy_Rush1 · January 14, 2017, 3:46am

Scaling in this context is simply adding more and more routers and
needing/wanting to avoid configuring full mesh iBGP due to the
administrative burden of maintaining the growing size of full mesh
topology. In one particular network in question, I have 11 routers
fully meshed and need to add several more over the coming 6-12 months,
possibly adding as many as 10 more routers in that time span. I'd
prefer not to continue doing full mesh.

if those numbers were x 10 or more, 'scaling' becomes a concern.

the way to add to an ibgp mesh or any other topology, including those
with rrs, is automation.
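
a minimal sketch of what that automation might look like: generate the neighbor statements from one inventory instead of typing them per router. the asn, addresses and Loopback0 source interface are hypothetical, and the output is ios-style for illustration only; exact placement of commands (for example under an address-family) varies by platform and release.

```python
ASN = 65000                                           # hypothetical private ASN
REFLECTORS = ["192.0.2.1", "192.0.2.2"]               # hypothetical RR loopbacks
CLIENTS = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]  # hypothetical client loopbacks


def neighbor_lines(peer, asn, is_client=False):
    """IOS-style neighbor statements for one iBGP peer."""
    lines = [
        f" neighbor {peer} remote-as {asn}",
        f" neighbor {peer} update-source Loopback0",
    ]
    if is_client:
        lines.append(f" neighbor {peer} route-reflector-client")
    return lines


def reflector_config(own_ip, asn, reflectors, clients):
    """Config for one reflector: plain iBGP to other RRs, RR-client to the rest."""
    lines = [f"router bgp {asn}"]
    for peer in reflectors:
        if peer != own_ip:
            lines += neighbor_lines(peer, asn)
    for peer in clients:
        lines += neighbor_lines(peer, asn, is_client=True)
    return "\n".join(lines)


def client_config(asn, reflectors):
    """Config for one client: it only peers with the reflectors."""
    lines = [f"router bgp {asn}"]
    for peer in reflectors:
        lines += neighbor_lines(peer, asn)
    return "\n".join(lines)


print(reflector_config(REFLECTORS[0], ASN, REFLECTORS, CLIENTS))
print(client_config(ASN, REFLECTORS))
```

adding a router then means adding one address to the inventory and regenerating, whatever the topology.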

As for 7206VXR with NPE-G1 or G2 cards, we have many sitting in a
decommissioned state on shelves

i suspect there is a reason.

randy

Phil · January 17, 2017, 4:51pm

Cisco and Juniper both have working ORR implementations, although config on the Juniper one is a bit clunky right now. One interesting thing is they also allow feeding topology data via BGP-LS, so BGP is the only protocol you need to run to/from it.

Phil

Mark_Tinka1 · March 20, 2017, 10:35am

BGP-ORR is currently supported in Junos and IOS XR (ASR9000, I
believe... I haven't confirmed for other IOS XR platforms).

I'm getting Cisco to add support for it in IOS and IOS XE (CSR1000v).
I'm now dealing with the usual "How large is the customer's spend for
this feature" nonsense. BGP-ORR, I feel, is one of those features that
doesn't need a business case - much like ketchup at a fast-food joint.

That the IOS XR PI team have it in there and the IOS/IOS XE PI teams
don't highlights the depth of the fundamental problem over at Cisco-land.

Mark.

bengelly · March 20, 2017, 10:46am

Same old same.

Y.

ghankins_237a87 · March 20, 2017, 11:03am

Mark is spot on, this is an important point. We just added ORR to SR OS
15.0.R1 on the 7x50/VSR.

Greg