VRF/MPLS on Linux

Brian_Christopher_R2 · August 23, 2011, 1:45pm

While I have found some information on a project called linux-mpls, I am having a hard time finding any solid VRF framework for Linux. I have a monitoring system that needs to check devices that sit in overlapping private IP space, and I was wondering if there is any way I could use some kind of VRF-type solution that would allow me to label the "site" the traffic is intended for. The upstream router supports VRF/MPLS, but I need to know how I can get the server to label the traffic. I would appreciate any input.

Jared_Mauch · August 23, 2011, 1:50pm

In linux, you can manage the different routing tables.

You can do this with the iptables + iproute2 series of commands. The tables 254/255 are the main and local tables.

You shouldn't have too much trouble finding information via google on how to manage your needs.

- Jared

Denys_Fedoryshchenko · August 23, 2011, 2:04pm

I guess VRF is closer to Linux containers.

Brian_Christopher_R2 · August 23, 2011, 2:43pm

Jared,
Thank you for your reply. The one issue I have is how to label traffic to match a given table (e.g. ping VRF or snmp VRF). I don't see any way this can be done with normal BSD sockets, and finding a way to get my application to 'color' the traffic has been a little elusive. The developers I am working with are using Mule for their data collection. I would really prefer to add an MPLS tag to mark the traffic, but I will investigate what I can do using the Linux routing features and 802.1q tags.

Nathan_Eisenberg · August 23, 2011, 3:12pm

I don't know about Mule, but Zabbix has the concept of premise-based proxy servers which work around this issue, and it works quite well.

Perhaps this issue can be solved at the application layer with some similar proxying methodology, rather than making this a very complicated routing issue?

Sergey_V_Lobanov · August 23, 2011, 3:31pm

Hello,

I implemented it via dot1q VLANs + iproute2 + iptables. A description can be found at http://forum.nag.ru/forum/index.php?showtopic=57082&st=0&p=501082&#entry501082 . Please use Google Translate to translate it from Russian to English.

Dan_White · August 23, 2011, 3:35pm

Although I can't vouch for it, quagga seems to have the command set to
function as an MPLS PE router (possibly in conjunction with linux-mpls) to
pass vpnv4 routes and tags. That doesn't address how you're going to mux
socket connections to the overloaded IP addresses in different VRFs, which
would seem to require MPLS knowledge within your monitoring application to
support (unless you're running multiple instances).

You might consider a more straightforward approach, such as running a
separate instance of your monitoring application within a VM, bridged to a
separate VLAN towards your MPLS PE, or just running two hosts.

Mike_Jones1 · August 23, 2011, 4:18pm

I would probably go for the suggestion of (ab)using QoS tags for the
routing table selection, but just to throw this alternate idea out
there:

1.0.0.0/8 1:1 NATed to 10.0.0.0/8 marked to use routing table 1, which
routes to network 1
2.0.0.0/8 1:1 NATed to 10.0.0.0/8 marked to use routing table 2, which
routes to network 2
etc

That way your application layer won't need any additional logic and
can just deal with them as separate non-overlapping IP spaces. This
won't work if you have too many overlapping networks (but then Linux
only supports 252 additional routing tables anyway afaik) or if you
need external connectivity that can't be proxied.
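
A minimal sketch of the address arithmetic behind this idea, assuming each site gets its own alias /8 that maps 1:1 onto 10.0.0.0/8. The `SITE_ALIASES` table and the `resolve_alias` helper are hypothetical names used only for illustration; in a real setup the translation would be done by the NAT rules on the box, not by the application.

```python
import ipaddress
from typing import Tuple

# The overlapping private range that every site uses.
REAL_NET = ipaddress.ip_network("10.0.0.0/8")

# Hypothetical mapping: alias prefix seen by the application -> routing table.
SITE_ALIASES = {
    ipaddress.ip_network("1.0.0.0/8"): 1,
    ipaddress.ip_network("2.0.0.0/8"): 2,
}


def resolve_alias(alias: str) -> Tuple[int, str]:
    """Return (routing table, real 10.x.x.x address) for an alias address."""
    addr = ipaddress.ip_address(alias)
    for alias_net, table in SITE_ALIASES.items():
        if addr in alias_net:
            # Keep the host part, swap the alias /8 for 10.0.0.0/8.
            offset = int(addr) - int(alias_net.network_address)
            real = ipaddress.ip_address(int(REAL_NET.network_address) + offset)
            return table, str(real)
    raise ValueError(f"{alias} is not inside any configured alias prefix")
```

With this mapping, the monitoring application would poll 1.2.3.4 to reach 10.2.3.4 at site 1 and 2.2.3.4 to reach 10.2.3.4 at site 2.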

In a similar manner, if your tools support IPv6 you could have a /96
that is NAT64'ed onto each different network. I'm not sure about this
for a production setup, although it would have the added benefit that
you can expose these routes to your management network to provide
easier access from your other machines if you wanted to.

- Mike

Glen_Turner · August 24, 2011, 3:16am

The Linux kernel as shipped by Linus supports multiple routing tables
and allows you to forward traffic from interfaces to differing tables --
that is, can implement VRF. The abstraction is better than on most
routers, with policy routing allowing the selection of the routing table
(to implement a VRF the policy is a simple "if received on interface X
then use realm N"). Searching "realms" or running "man ip" will get you
started.
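
As a rough illustration of the "if received on interface X then use table N" policy, the sketch below only builds the iproute2 command strings (iproute2 calls the per-VRF lookup target a `table`). The interface name, gateway address and table number in the usage note are hypothetical, and the `vrf_commands` helper is an assumption made for this example; nothing here is executed.

```python
from typing import List


def vrf_commands(iface: str, table: int, gateway: str) -> List[str]:
    """Build iproute2 commands that tie one interface to one routing table."""
    return [
        # Populate the per-site table with its own default route.
        f"ip route add default via {gateway} dev {iface} table {table}",
        # Policy rule: traffic received on this interface uses that table.
        f"ip rule add iif {iface} table {table}",
    ]
```

For example, `vrf_commands("eth0.101", 101, "192.0.2.1")` yields the two commands for a VLAN subinterface that carries one site; they would need to be run as root.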

The Linus kernel does not have support for MPLS. You could patch the
kernel, and then use Quagga as the router to populate the MPLS
forwarding table. But personally, if you have an MPLS-speaking router
upstream I'd simply bridge each MPLS tunnel into a VLAN to the Linux
computer. Then you can use a stock vendor kernel, with its lack of
maintenance hassles.

Brian_Christopher_R2 · August 24, 2011, 10:06am

The only issue with this is that the Linux box is not acting as a router, but as the egress device. I'm trying to figure out how to properly get my application to 'color' the traffic. Standard BSD sockets appear to have no concept of 'labels'. I am still seeing what I can do to match the traffic. I am probably going to see if I can work out a hack with the development team to use DSCP values to tag the traffic and then act accordingly on the ingress router. I appreciate all the ideas presented so far.
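
A minimal sketch of the DSCP approach using ordinary sockets, assuming the application can set socket options at all. The DSCP value occupies the upper six bits of the IP TOS byte, so it is shifted left by two before being passed to the standard `IP_TOS` option. The helper names and the choice of one DSCP value per site are assumptions for illustration.

```python
import socket


def dscp_to_tos(dscp: int) -> int:
    """Convert a 6-bit DSCP value to the 8-bit TOS byte (ECN bits left at 0)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_dscp_socket(dscp: int) -> socket.socket:
    """Open a UDP socket whose outgoing packets carry the given DSCP value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

The ingress router would then need a matching policy that maps each DSCP value to the right VRF, which is outside what this sketch covers.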

Hannes_Frederic_Sowa · August 24, 2011, 12:40pm

You could also have a look at linux namespaces if you want to manage
routing tables per process. Especially the new setns syscall could be
handy: <https://lwn.net/Articles/407495/>

Greetings,

Hannes

Simon_Perreault · August 24, 2011, 1:18pm

Just FYI: on OpenBSD you can set the VRF (aka "routing table" or
"routing domain") per socket with code like this:

  int s, table;
  s = socket(...);
  table = 123;
  setsockopt(s, IPPROTO_IP, SO_RTABLE, &table, sizeof(table));

Simon

Jared_Mauch · August 24, 2011, 2:28pm

You can classify this in the OUTPUT or POSTROUTING chain with iptables. Take a look at the man page for it. There's lots of information online about how to do this. I recall a sysadmin I worked with 15 years ago who thought of routers as the black boxes that got their packets around, but a little bit of understanding of these lower levels of the kernel/networks will go a long way.

Some help:

INPUT (for packets destined to local sockets)
FORWARD (for packets being routed through the box)
OUTPUT (for locally-generated packets; for altering locally-generated packets before routing)
PREROUTING (for altering packets as soon as they come in)
POSTROUTING (for altering packets as they are about to go out)

http://linux-ip.net/html/adv-multi-internet.html should also prove useful in your research. You will likely end up using the fwmark/mark on locally generated packets. Some tools show this number in hex, others in decimal, so keep this in mind while debugging.
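
As a hedged sketch of what marking could look like from inside an application on Linux: the `SO_MARK` socket option sets the fwmark on everything a socket sends, and a rule such as `ip rule add fwmark 0x10 table 16` would then select the table. This normally requires root or CAP_NET_ADMIN. The helper names are hypothetical, and the fallback value 36 for `SO_MARK` is an assumption that holds on common Linux architectures only.

```python
import socket

# Linux-specific option; fall back to 36 (assumed value on common
# architectures) if this Python build does not export the constant.
SO_MARK = getattr(socket, "SO_MARK", 36)


def open_marked_socket(mark: int) -> socket.socket:
    """Open a UDP socket whose packets carry the given fwmark (needs privileges)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_MARK, mark)
    except OSError:
        sock.close()
        raise
    return sock


def parse_mark(text: str) -> int:
    """Accept a mark written either in decimal ('16') or in hex ('0x10')."""
    return int(text, 0)


def describe_mark(mark: int) -> str:
    """Show a mark in both notations to avoid hex/decimal confusion."""
    return f"{mark} (0x{mark:x})"
```

Printing marks through something like `describe_mark` makes it easier to compare output from tools that disagree on notation.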

- Jared

Eduardo_Schoedler1 · August 24, 2011, 3:56pm

More VRF info:

http://lartc.org/lartc.html#LARTC.RPDB.SIMPLE

Jussi_Peltola · August 24, 2011, 5:37pm

Or exec your commands wrapped in route -T$TABLE exec $*

Caveat: IPv6 VRFs did not work the last time I tried, and I think they
still don't.

OpenBSD should also do MPLS VPNs with the VRFs, but it's also pretty
much experimental. It worked fine in a quick lab test at my last try; I
should dig out my lab notes and document it...

Some things, like /etc/resolv.conf, still need some attention with VRFs.

Simon_Perreault · August 24, 2011, 5:40pm

The fix for that was committed to HEAD recently. I think it's going to
be part of 5.0 or 5.1. Effectively it means s/IPPROTO_IP/SOL_SOCKET/ in
the example code above.

Simon

Hannes_Frederic_Sowa · August 24, 2011, 5:58pm

FYI, on linux you can use 'ip netns exec'. The subcommand is rather
new and you will only find it in the git repository.
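
A small sketch of wrapping a monitoring check in `ip netns exec`, assuming one named namespace per site has already been created (for example with `ip netns add site1`). The namespace name and the helper functions are hypothetical, and running the command requires root.

```python
import subprocess
from typing import List


def netns_argv(namespace: str, command: List[str]) -> List[str]:
    """Build the argument vector that runs `command` inside a named namespace."""
    return ["ip", "netns", "exec", namespace, *command]


def run_in_netns(namespace: str, command: List[str]) -> int:
    """Run the command inside the namespace and return its exit status."""
    return subprocess.run(netns_argv(namespace, command)).returncode
```

For instance, `run_in_netns("site1", ["ping", "-c", "1", "10.0.0.1"])` would ping 10.0.0.1 using only the interfaces and routing table of the `site1` namespace.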

Greetings,

Hannes

Brian_Christopher_R2 · August 26, 2011, 11:02am

I want to thank everyone for their input; I have gleaned many useful ideas from this discussion.
Hopefully some standard like BSD sockets will be written for routing realms/VRFs, then let the fun begin.
It appears that the Java-based framework our developers used cannot be extended to allow direct packet/socket manipulation, so we will be looking at using different VMs to get around our issue.
Again, I really enjoyed this discussion with everyone and am excited about the progress that is being made in bringing this concept directly to the host.