MPLS VPN design - RR in forwarding path?

Hi everyone,

I'm reading Randy Zhang's BGP Design and Implementation and I found the following guidelines about designing an RR-based MPLS VPN architecture:
- Partition RRs
- Move RRs out of the forwarding path
- Use a high-end processor with maximum memory
- Use peer groups
- Tune RR routers for improved performance.

Since the book is a bit outdated (2004), I'm curious whether these rules still apply to modern SP networks.
What would be the reasoning behind keeping RRs out of the forwarding path? Is it only a matter of performance and stability?

Thanks,
Marcin

Correct, these ideas are MOSTLY rooted in old school router limitations.

Ymmv. Look for facts in the replies you get, not unsubstantiated opinions.

There is no technical reason to keep a BGP RR out of the forwarding path
on a hardware-based forwarding router that has sufficient control plane
capacity to run BGP.

CB

Arguably more so now than ever, but you can always run RRs inline in the
forwarding path if you want to. Taking RRs out of the forwarding plane
means that you can keep your overall routing architecture simpler and more
consistent, and adding/removing different forwarding hardware means that
you don't really need to do much with the RR configuration.

The larger router vendors all have virtualised RR implementations these
days (XRv/CSR1k, vRR, AlcaLu, etc), which means that you can get to run
your RRs on standard x86 hardware platforms using normal hypervisors. This
wasn't the case in 2004. The pricing and licensing for virtual RR images
from the normal vendors hasn't settled down into workable models yet but
that's only a matter of time, particularly given that open source routing
stacks are going to start seriously impinging on this market segment in the
next couple of years.

Nick

Hi everyone,

I'm reading Randy Zhang's BGP Design and Implementation and I found the
following guidelines about designing an RR-based MPLS VPN architecture:
- Partition RRs
- Move RRs out of the forwarding path

I'd find it odd if the RR were the next hop for any significant traffic.
In recent deployments I've done, there's no FIB to speak of on the RR
itself, apart from the installed IGP routes.

- Use a high-end processor with maximum memory

BGP add-path increased the memory requirements of the RR considerably
when we deployed it.

When they say "move RRs out of the forwarding path", they could mean
"don't force all traffic through the RRs". These are two different
things. Naive configurations could end up causing all VPN traffic to
go through the RRs (e.g. setting next-hop-self on all reflected
routes) whereas more correct configurations don't do that. But there
may be some traffic that naturally flows through the same routers that
are the RRs, via an MPLS LSP for example. The latter is fine in many
cases, the former is not. E.g. I would argue that a P-router can be
an RR if desired.
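
To make the distinction concrete, here is a toy Python sketch (not vendor configuration). The router names PE1, PE2 and RR1, the prefix, and the reflect() helper are all made up for illustration. It only models which BGP next hop a client ends up with, since that is the router the ingress PE builds its LSP toward.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VpnRoute:
    prefix: str
    next_hop: str  # BGP next hop; the ingress PE resolves its LSP toward this


def reflect(route, rr_name, next_hop_self=False):
    """Return the route as a (hypothetical) RR would re-advertise it to clients."""
    if next_hop_self:
        # Naive setup: the RR rewrites the next hop to itself,
        # so every client sends the VPN traffic to the RR.
        return replace(route, next_hop=rr_name)
    # Usual reflection: the next hop is left unchanged.
    return route


learned = VpnRoute("10.1.0.0/24", next_hop="PE2")  # route received from egress PE2
plain = reflect(learned, "RR1")
forced = reflect(learned, "RR1", next_hop_self=True)

print(plain.next_hop)   # PE2: traffic goes PE-to-PE, the RR is only in the control plane
print(forced.next_hop)  # RR1: all traffic for this prefix is pulled through the RR
```

In the first case the RR may still carry transit traffic if it happens to sit on the LSP between the two PEs, which is the "P-router as RR" situation described above.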

Hey,


There is no compelling advantage. No budget is too thin for 3 gray NPE-G1s; if
yours is, maybe Network Engineers Without Borders can help you.

There are some compelling disadvantages. My current and previous employers have
both experienced a VPN AFI BGP UPDATE crashing the whole box (in fact, a whole
cluster of 3 VPN reflectors at once).

Trying to achieve 0 outages is silly and impossible. Reducing outage impact is
often simple and cheap, but it sometimes isn't done when the only failure modes
considered are physical (HW, fibre, electricity...) rather than the more common
ones (pilot error and software).

Hi,

Right, one case is when a router functions as an RR in addition to forwarding packets; the other is when the RR sets the next hop to itself and hence forces all the traffic to pass through the router in the fast path.
Keep in mind that some architectures, such as Seamless MPLS, require an RR to be in the fast path.
There are some other cases where it could be a requirement.
I'd advise looking into the vRR space; price/performance looks quite good.

Wrt open source implementations: if you are looking at a relatively basic feature set (v4/v6 unicast/VPN), reliability is not the main concern, and of course there are hands and brains to support it, it could be a viable approach.
Should you be looking at a more complex feature set (EVPN, BGP-LS, FS,
enhanced route refresh, etc.) or highly optimized code wrt update rate/number of peers supported, you'd most probably end up with a commercial implementation.

Hope this helps

Regards,
Jeff

- Move RRs out of the forwarding path

this remains contentious. there are those who think having the control
plane not congruent to the data plane is a recipe for really fun
debugging and has other issues.

randy

Hello all,

Thank you for insightful answers.

I was thinking mostly about the second scenario Chuck mentioned, where some traffic naturally flows through the routers that are the RRs because of an MPLS LSP. Setting next-hop-self on all reflected routes would be a misconfiguration IMHO.

I am also aware of products like vMX or CSR1000v/XRv and the example given by Saku makes me more interested in licensing/pricing options.

Regards,
Marcin

On 2014-12-31 at 18:05, Chuck Anderson wrote:

Overall, depends on your design and scale. But, I will comment on a few of your items...

We have RRs in the forwarding path but have a project to move them out in 2015. We feel it gives us more options as well as more flexibility when we move to the next phase of RR design (hierarchical).

Most vendors today have performance numbers (sometimes they aren't published publicly) for routers acting as RRs. Ask your vendor and pick one that suits you. We generally buy the middle or maximum memory option and pick a reasonable processor. And then we monitor :)

As for peer groups, you should have a design that allows you to herd most of the config snips together. Use the features that make your life easier and allow you to simplify your routing policies.

tv

Is there a good reason to use actual router hardware for the route
reflector role? Even a cheap server has more CPU and memory. If it is not
in the forwarding path, this is a computing task - not a move packets at
line speed task.

Is anyone using BIRD, Quagga, etc. for this?

Regards,

Baldur

there are patches for both code-bases and some preliminary support for
vpnv4 in quagga, but other than that neither currently supports either ldp
or the vpnv4/vpnv6 address families in the main-line code.

Nick

You don't need LDP on the RR as long as the clients support a "not on LSP" flag (different implementations have different names for it).
There are more and more reasons to run an RR on non-router HW, and there are still many reasons to run a commercial code base, mostly feature set and resilience.

Regards,
Jeff

Running various functions on a couple small VM clusters makes a lot of sense.


I agree, it makes some sense, especially if you are control plane bound.
But nearly all my routers run between 1% and 10% CPU.

YMMV. I have a feeling that running a BGP RR on a cheap / standard / commodity
VM is pretty exotic from a support perspective.

So running a BGP RR on a VM may make sense in theory, but my network
control planes are not too busy and BGP on a VM is a unique/exotic support
model.

Your network is probably different.

Given that you assign a unique RD per PE, an RR out of the forwarding path
gives you a neat trick for fast convergence (and for debugging) when a CE
has redundant paths to different PEs. Routes to those CEs will be seen as
different routes on the RR.
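
A small Python sketch of why this works: a VPNv4 route is identified by RD plus prefix, so best-path selection on the RR happens per (RD, prefix). The RD values, PE names and the rr_advertised() helper below are made up for illustration, and the tie-break is just a stand-in for the real BGP decision process.

```python
from collections import defaultdict


def rr_advertised(paths):
    """paths: list of (rd, prefix, next_hop) tuples received by the RR.

    Without add-path, the RR reflects one best path per (rd, prefix).
    """
    by_nlri = defaultdict(list)
    for rd, prefix, next_hop in paths:
        by_nlri[(rd, prefix)].append(next_hop)
    # Stand-in for best-path selection: pick the lowest next-hop name.
    return {nlri: min(next_hops) for nlri, next_hops in by_nlri.items()}


# Same CE prefix reachable via PE1 and PE2.
shared_rd = [
    ("65000:100", "10.1.0.0/24", "PE1"),
    ("65000:100", "10.1.0.0/24", "PE2"),
]
unique_rd = [
    ("65000:1", "10.1.0.0/24", "PE1"),
    ("65000:2", "10.1.0.0/24", "PE2"),
]

print(rr_advertised(shared_rd))  # one entry: clients only learn the path via PE1
print(rr_advertised(unique_rd))  # two entries: clients learn both PE1 and PE2
```

With unique RDs the remote PEs already hold the backup path, so they can switch to it without waiting for the RR to reselect and re-advertise.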

And test coverage. As Saku alluded to earlier in the thread, rr<->rr-client outages are painful. I’ve certainly seen a number of them caused by inter-op issues between implementations. Running at least one RR which matches the code-base of the client means that at least you’re likely to have fallen within the test-cases of that vendor’s implementation.

r.

+100

Regards,
Jeff

Our network spans Africa, South Asia and Europe.

We have 2x RR's in each PoP running Cisco's CSR1000v on
x86_64 hardware under VMware ESXi.

Pricing is not too bad; we use the Premium license, which
enables all features (BFD, etc.) that you don't get with
the Standard license.

Been running this configuration since July 2014 - very
happy.

Mark.

Most vendors today have performance numbers
(sometimes they aren't published publicly) for routers
acting as RRs. Ask your vendor and pick one that suits
you. We generally buy the middle or maximum memory option
and pick a reasonable processor. And then we monitor :)

With the major vendors now offering VM-based RR's, I'd
discourage using routers as RR's just for pure long-term
scale.

As for peer groups, you should have a design that allows
you to herd most of the config snips together. Use the
features that make your life easier and allow you to
simplify your routing policies.

Suffice it to say that the Peer Group functionality in IOS
and IOS XE has largely been replaced by Update Groups. We
use peer and session templates, but really, as with Peer
Groups in 2015, it's just to keep things neat and tidy.
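
As a rough illustration of the update-group idea (a simplified Python model, not how any particular implementation is coded): peers whose outbound policy is identical are grouped automatically, so an update only has to be built once per group rather than once per peer. The peer names and policy tuples are hypothetical.

```python
from collections import defaultdict


def build_update_groups(peers):
    """Group peers by their outbound policy key.

    peers: dict mapping peer name -> hashable tuple describing outbound policy.
    Returns a dict mapping policy -> list of peer names sharing it.
    """
    groups = defaultdict(list)
    for name, policy in peers.items():
        groups[policy].append(name)
    return dict(groups)


# Hypothetical RR clients: (address family, role, export policy name).
peers = {
    "PE1": ("vpnv4", "rr-client", "EXPORT-ALL"),
    "PE2": ("vpnv4", "rr-client", "EXPORT-ALL"),
    "PE3": ("vpnv4", "rr-client", "EXPORT-CUSTOMER-A"),
}

groups = build_update_groups(peers)
for policy, members in groups.items():
    print(policy, members)
```

The grouping falls out of the policy itself, which is why templates or peer groups end up being mostly a way to keep the configuration tidy.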

Junos, of course, has always had its own way of doing this, which
still works nicely.

Mark.