BGP in containers

There's no reason why it shouldn't work well. It's just a minor paradigm
shift that requires some solid testing and know-how on the ops team.

And... XR or Junos are doing this under the covers for you anyway, so
get used to the new paradigm!

FRR, the modern fork of Quagga, has a pre-built Docker container:
https://hub.docker.com/r/cumulusnetworks/frrouting/
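
A rough sketch of how you might run it (image name is from the link above;
the flags are assumptions -- host networking plus elevated privileges are
typically needed so the daemons can program kernel routes):

    # run FRR in the host's network namespace (sketch, not a recipe)
    docker run -d --name frr \
        --net=host --privileged \
        -v /etc/frr:/etc/frr \
        cumulusnetworks/frrouting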

So I have to ask, why is it advantageous to put this in a container rather than just run it directly
on the container's host?

Most any host nowadays has quite a bit of horsepower to run services. All those services could be run natively in one namespace on the same host, or ...

I tend to gravitate towards running services individually in LXC containers. This creates a bit more overhead than running chroot-style environments, but less than running full-fledged KVM-style virtualization for each service.

I typically automate the provisioning and spool-up of the container and its service. This makes it easy to upkeep/rebuild/update/upgrade/load-balance services individually and en masse across hosts.

By running BGP within each container, as someone else mentioned, BGP can be used to advertise the loopback address of the service. I go one step further: for certain services I will anycast some addresses into BGP. This provides an easy way to load balance and provide resiliency across like service instances on different hosts.
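
As an illustration only (addresses and ASNs are made up), each container
instance holds the same anycast address on its loopback and announces it to
its host over eBGP:

    # inside each container that offers the service
    ip addr add 192.0.2.53/32 dev lo

    ! frr.conf sketch inside the container
    router bgp 65010
     neighbor 10.1.1.1 remote-as 65001     ! the container host
     address-family ipv4 unicast
      network 192.0.2.53/32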

Therefore, by running BGP within the container, and on the host, routes can be distributed across a network with all the policies available within BGP. I use Free Range Routing, which is a fork of Quagga, to do this. I use eBGP for the hosts and containers, which allows for the elimination of OSPF or a similar interior gateway protocol.
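
On the host side, a minimal FRR sketch (private ASNs, addresses, and the
topology are all assumptions) that peers eBGP down to the containers and up
to the ToR, with no IGP involved:

    router bgp 65001
     neighbor 10.1.1.10 remote-as 65010       ! a container
     neighbor 198.51.100.1 remote-as 64900    ! the ToR / upstream
     address-family ipv4 unicast
      redistribute connected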

Stepping away a bit, this means that BGP is used in a tiered scenario. There is the regular eBGP with the public ASN for handling DFZ-style public traffic. For internal traffic, private eBGP ASNs are used for routing traffic between and within hosts and containers.

With recent improvements to Free Range Routing and the Linux kernel, various combinations of MPLS, VXLAN, EVPN, and VRF configurations can be used to further segment and compartmentalize traffic within a host and between containers. It is now very easy to run VLAN-less between hosts through various easy-to-configure encapsulation mechanisms. To be explicit, this relies on a resilient layer 3 network between hosts, and eliminates the bothersome layer 2 redundancy headaches.
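
The VLAN-less piece usually comes down to a VXLAN interface enslaved to a
bridge on each host, with FRR's BGP EVPN address family distributing the
reachability. A hedged sketch, with VNI, addresses, and ASN all invented:

    # per-host kernel plumbing (VNI 100, local VTEP address assumed)
    ip link add vxlan100 type vxlan id 100 dstport 4789 local 10.0.0.1 nolearning
    ip link add br100 type bridge
    ip link set vxlan100 master br100
    ip link set dev vxlan100 up
    ip link set dev br100 up

    ! frr.conf: advertise the VNI over BGP EVPN
    router bgp 65001
     neighbor 10.0.0.2 remote-as external
     address-family l2vpn evpn
      neighbor 10.0.0.2 activate
      advertise-all-vni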

That was a very long-winded way to say: keep a very basic host configuration running a minimal set of functional services, and refactor the functionality and split it across multiple containers to provide easy access to and maintenance of individual services like DNS, SMTP, database, dashboards, public routing, private routing, firewalling, monitoring, management, ...

There is a higher up-front configuration cost, but over the longer term, if configured via automation tools like Salt or similar, maintenance and security are improved.

It does require a different level of sophistication from operational staff.

Some bits are similar to Raymond's comments, but in our case this was specifically for a Kubernetes deployment. Our k8s deployment is mostly "self-hosted", i.e. the k8s control plane runs within k8s, with the workers being disposable. Dropping the routing into a container that runs in the host's/worker's network namespace means it is just another container (DaemonSet) that Kubernetes will schedule to the worker as part of initial bootstrapping.

So, we don't run BGP within the application containers themselves but rather on the container hosts. Advertising service IPs is handled by IPVS pods that anycast the service IPs and do DSR + tunnel mode to the k8s pods backing a given L4 service, with an HTTP reverse proxy layer (Kubernetes ingress controllers) in the middle for HTTP(S) services.
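
For the IPVS part, expressed with plain ipvsadm just to show the shape of it
(the anycast service IP and pod IPs are made up; -i puts the real servers in
tunnel/IPIP mode so return traffic bypasses the director):

    # each IPVS node announces 192.0.2.80/32 via BGP, then:
    ipvsadm -A -t 192.0.2.80:443 -s rr
    ipvsadm -a -t 192.0.2.80:443 -r 10.244.1.17:443 -i
    ipvsadm -a -t 192.0.2.80:443 -r 10.244.2.9:443 -i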

Have a look at Project Calico, https://www.projectcalico.org/. They
have the route-everything container networking pretty much figured out.

- Andrew

Using BGP (Quagga) in containers is a great way to build a simulation of
your actual network. You can then test configuration changes in the
simulation before you make them in production.

You can even build this up into an automated test pipeline where new
configurations are tested in simulation before being put into production.

There was a talk about an experimental system like this at the February
2017 meetup:
https://developers.google.com/events/sre/nyc
Title: "DevOps to NetworkOps"
Speaker: Xavier Nicollet, Stack Overflow

Tom

So I have to ask, why is it advantageous to put this in a container rather
than just run it directly on the container's host?

There is no real reason not to run it in a container, and there are all the
advantages of running ALL applications in standardized containers
(whether the choice be the likes of vSphere, Xen, KVM, Virtuozzo, LXC, or Docker).

Assuming the host runs containers: running one application (BGP) outside a
container would put the other applications at risk, since there
could be a security vulnerability in the BGP implementation allowing
runaway resource usage or remote code execution, or, in theory,
the install process for that app could "spoil" the host or introduce
incompatibilities or divergence from the expected host configuration.

One of the major purposes of containers is to mitigate such problems.
For example, the BGP application could be exploited, but the container
boundary prevents access to the sensitive data of other apps sharing the
hardware; an application installer running in a container cannot
introduce conflicts or impact operating settings of the host platform.

Also, the common model of virtualizing the compute resource calls for
treating hosts as a shared compute farm --- no one host is special:
any container can run equally on other hosts in the same pod, and
you hardly ever even check which host a particular container has been
scheduled to run on.

Users of the resource are presented with an interface for running their
application: containers. No other option is offered... there is no such
thing as a "run my program (or run this container) directly on host X"
option. No host directly runs any programs or services whose configuration
differs from any other host's, and every host config is nearly identical
other than hostname and IP address. Simply put: being able to run a
program outside a container would violate the service model for
datacenter compute that is most commonly used these days.

Running the BGP application in a container on a shared storage system
managed by a host cluster would also make it easier to start the service
up on a different host when the first host fails or requires maintenance.

On the other hand, running directly on a host suggests that individual
hosts need to be backed up again, and some sort of manual restore of
local files from the lost host will be required to copy the
non-containerized application to a new host.

Even if the BGP speaker is running right on the host, the shared storage or backups thing doesn't click for me. What about your BGP speaker needs persistent storage? At least in our environment, everything unique about the BGP speaker is config injected at startup or derived at startup. This might be down to differences in how we're using them (a BGP daemon per container host in our case, rather than "I need X number of BGP speakers; schedule them somewhere"), I guess.

Years back I ran ExaBGP inside a Docker container (when it wasn't
"production ready") to anycast a contained service both within a datacenter
and across datacenters. To make routing work correctly I had to also run
another BGP daemon on the Docker host machine; I can't remember if I used
BIRD for this, but it seems like what I'd use since I didn't need
programmatic control of prefixes.

Would I do it that way today? Not a chance. How would I do it? That would
really depend on two things: what I'm trying to accomplish with BGP and
what the service is. If you just want portability of a service (not
redundancy/balancing via anycast), is BGP really the best option? I'd make
a strong case for OSPF since it needs far less config; see the sketch
below. The same need for a routing instance on the Docker host would
apply, but you wouldn't need to manage configuration for neighbors as
containers come up and go down (since the IP will likely change). Sure,
you could just add neighbor config for every IP Docker might use,
however -- ouch.
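
To make the comparison concrete, a hedged FRR/Quagga sketch (subnet, area,
and service address are assumptions): the host and every container just
enable OSPF on the shared segment and discover each other via hellos, so
nothing changes as containers come and go:

    ! ospfd on the Docker host
    router ospf
     network 172.17.0.0/16 area 0

    ! ospfd in each container
    router ospf
     network 172.17.0.0/16 area 0
     network 192.0.2.10/32 area 0   ! the service address to carry around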

Jeff Walter

These days I think the idea is to use unnumbered or dynamic neighbors so
most of the configuration complexity goes away:

https://docs.cumulusnetworks.com/display/DOCS/Border+Gateway+Protocol+-+BGP#BorderGatewayProtocol-BGP-ConfiguringBGPUnnumberedInterfaces

In this case, your container would peer directly with the switch.
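
In FRR terms the two options look roughly like this (interface name and ASN
are assumed); unnumbered peering rides on the interface's IPv6 link-local
address, and dynamic neighbors accept any peer in a range without
per-neighbor lines:

    ! BGP unnumbered: peer with whatever is attached to swp1
    router bgp 65001
     neighbor swp1 interface remote-as external

    ! or dynamic neighbors: accept any container from the Docker range
    router bgp 65001
     neighbor CONTAINERS peer-group
     neighbor CONTAINERS remote-as external
     bgp listen range 172.17.0.0/16 peer-group CONTAINERS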

--Doug