Re: TFTP over anycast

Hi,
I’m working on a DR design, and we want this site to serve not only as a DR site but also to run active/active for some of the services we host. Has anyone had experience using anycast for TFTP or DHCP services?
What pains or challenges did you experience, and what should we look out for?

Any input is greatly appreciated.

Kind regards,
Javier Gutierrez

I have extensive experience using IP Anycast for TFTP and DHCP, in the area of cloud computing. My primary job role is the development of system and network provisioning in cloud infrastructure, and I’ve spent much of the last twelve years working in this area. This is one area where protocols like BGP and techniques like Anycast carry a different set of assumptions and reputations for reliability depending on whether they are used within a provider’s network or on the Internet at large. Usually IP Anycast for DHCP and TFTP is done in a controlled environment (within a network operated by a single entity) and not at global scale over the public Internet.

I have designed or contributed to the design of several IP Anycast DHCP/TFTP implementations for cloud computing infrastructure (OpenStack, OpenShift/Kubernetes), using Quagga, BIRD, or FRR for OSPF/BGP and Pound/HAProxy/NGINX/MetalLB for load balancing, along with either custom Ruby/Python code or dnsmasq for DHCP, and typically standard Linux TFTP servers (either on bare metal or inside VMs or containers).

It becomes complicated when you want to perform DHCP and/or TFTP across sites or WAN links, and downright tricky when you want to do it across the Internet. The DHCP servers may be configured with pre-allocated host IP reservations if the clients are known ahead of time, or all servers may use a shared database (often distributed using MariaDB, Infoblox, or similar) so that every DHCP server agrees about which IPs are assigned and can sync IP reservations and releases. It is usually also necessary to ensure that all TFTP servers offer identical images.
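One way to check that every TFTP server offers identical images is to compare per-file checksums of each server's TFTP root. The following is only a minimal sketch of that idea, not something taken from the deployments described here: the function names `build_manifest` and `diff_manifests` are hypothetical, and it assumes each TFTP root is reachable as a local directory (for example via a mount or a synced copy).

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(reference, candidate):
    """Return the sorted relative paths that are missing on one side
    or whose contents differ between the two manifests."""
    names = set(reference) | set(candidate)
    return sorted(n for n in names if reference.get(n) != candidate.get(n))
```

An empty result from `diff_manifests` means both roots serve the same files with the same contents. Any non-empty result lists the paths to resync before that site should answer on the Anycast address.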

Some platforms that I have used IP Anycast DHCP and TFTP servers with:

OpenStack Nova (KVM/QEMU virtual machines): https://docs.openstack.org/nova/latest/

OpenStack Ironic (bare metal): https://docs.openstack.org/ironic/latest/

Metal3 (an offspring of Ironic that works in Kubernetes clusters): https://metal3.io/

Because the initial DHCP request is usually sent as a broadcast, the network hardware close to the client is often configured as a DHCP relay with either multiple unicast IPs as relay targets or one or more Anycast IPs (this may depend on what a particular vendor supports on a given make/model of network switch or router). In other cases a DHCP unicast IP is hard-coded into firmware or a custom micro-image, or cached in the case of a running client making a renewal or release.

Most projects use a micro-image booted over TFTP to prepare a second-stage loader that uses a more reliable protocol such as HTTPS.

When using DHCPv6 there are a lot of unique challenges, especially when multiple IPv6 addresses are assigned or the client is a piece of embedded hardware (such as a bare metal IPMI controller or a NIC running PXE/iPXE firmware).

Usually this is all done in “private” IP address space (local-scope or RFC 1918). As for running these same protocols over the Internet at large, I can’t speak from any personal experience. It is theoretically possible, but I don’t know of any large-scale examples or experiments.

The underpinnings for IP Anycast that I have used were initially based on Quagga: https://github.com/Quagga

More recently I’ve been working on projects that use a fork of Quagga called FRR (Free Range Routing): https://docs.frrouting.

Generally the TFTP and DHCP servers are not directly using IP Anycast; rather, there is a load balancer in front of the servers at each site. Initially I used Pound for this, but then NGINX and then HAProxy became preferable. More recently MetalLB on Kubernetes has been the go-to load balancer, and MetalLB integrates with BGP in a number of ways.
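A common companion to this kind of setup is a health check, so that a site stops attracting Anycast traffic when its TFTP service is down. Below is a rough sketch of a TFTP probe, assuming the packet layout from RFC 1350 (a read request is opcode 1 followed by a NUL-terminated filename and mode; the first data reply is opcode 3 with block number 1). The function names and any filename passed in are hypothetical, and what a caller does with the result (for example withdrawing the Anycast prefix) depends on the routing daemon or load balancer in use.

```python
import socket
import struct

OP_RRQ = 1    # read request
OP_DATA = 3   # data block


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def reply_is_healthy(packet):
    """Treat only a DATA packet carrying block 1 as a healthy reply.
    Error packets, short packets, and anything else count as unhealthy."""
    if len(packet) < 4:
        return False
    opcode, block = struct.unpack("!HH", packet[:4])
    return opcode == OP_DATA and block == 1


def probe(host, filename, timeout=2.0, port=69):
    """Request a file from one server and report whether it answered.
    Point this at the server's unicast address, not the Anycast one,
    so the check tests a specific backend."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_rrq(filename), (host, port))
            packet, _ = sock.recvfrom(4 + 512)
        except OSError:  # includes timeouts and unreachable hosts
            return False
    return reply_is_healthy(packet)
```

The probe never acknowledges the first block, so the server simply times out the transfer. For a small test file that is usually acceptable, but it is a design shortcut rather than a complete TFTP client.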

I helped design and bootstrap a project to use BGP for IP Anycast in order to provide load-balanced DHCP and TFTP in OpenStack and OpenShift using OVN, which is related to OpenFlow. Here is the BGP plugin for OVN: https://docs.openstack.org/ovn-bgp-agent/latest

I would be very curious to see any projects which are attempting to do this on the public Internet. Would you mind sharing a bit about your intended use case?

Warm regards,
-Dan Sneddon