TFTP over anycast

Hi,
I’m working on a DR design. We want this site not only to serve as DR but also to run active/active for some of the services we host, and I was wondering if anyone has experience using anycast for TFTP or DHCP services?
What are some of the pains/challenges you experienced, and what should we look out for?

Any input is greatly appreciated.

Hi Javier,

Anycast for TFTP is more or less the same as anycast for TCP-based
protocols: it has corner cases which fail and fail hard, but otherwise
it works. Outside the corner cases, the failure mode for tftp clients
should be the same as the failure mode when the tftp server goes down
in the middle of a transfer and comes back up some time later.

The corner cases are variations on the theme that your routing causes
packets from a particular source to oscillate between the tftp
servers. In the corner cases, the tftp client can't communicate with
the -same- tftp server long enough to complete a transfer.

Experiments with anycast TCP suggest that the corner cases happen for
less than 1% of client sources, but when they do happen they tend to
be persistent, affecting all communication between that client and the
anycast IP address for an extended duration, sometimes weeks or
months.

Regards,
Bill Herrin

I do NTP, DHCP, TFTP, DNS, HTTP anycast.

NTP, DNS and HTTP with ECMP; TFTP and DHCP as active/active on a
per-datacenter basis.

These are small datacenters with fewer than 50k servers each.

In every datacenter an anycast node is active and the router just
chooses the shortest path.

It becomes tricky for DHCP if a location has the same cost to more than
one anycast node. For that case we have set up DHCP nodes in two
datacenters using different local-preferences to simulate an
active/passive failover setup.
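
As a toy illustration of why this gives active/passive behavior (the
node names and preference values here are invented, not our real
config): BGP best-path selection simply prefers the highest
local-preference.

    # Toy model of BGP best-path selection by local-preference.
    # The pref-200 node carries all DHCP traffic until its
    # announcement disappears; then the pref-100 node takes over.
    routes = [
        {"node": "dc1-dhcp", "local_pref": 200},  # active
        {"node": "dc2-dhcp", "local_pref": 100},  # standby
    ]
    best = max(routes, key=lambda r: r["local_pref"])
    print("traffic goes to", best["node"])  # -> dc1-dhcp
    # If dc1 withdraws its route, max() over what remains picks dc2-dhcp.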

Cheers
Thomas

Others have addressed some of the issues, but one easy win for DHCP (which is otherwise a PITA to make redundant in any way) is to (a) not block ICMP anywhere, including on the client devices, and (b) have the DHCP server ping an address before assigning it (ping-check in ISC dhcpd, for example). That’s not always on by default, and it’ll eliminate ~90% of the conflicts you would otherwise encounter if the anycast node isn’t extremely stable. If you become aware of a distributed DHCP server that actually works well in this environment, that’s worth a post to the list all by itself.

-Adam

The relay server dhcplb could, maybe, help in that scenario (dhcplb runs on the anycast IP, the “real” DHCP servers on unicast IPs behind dhcplb).

Ask

Although they used the word "anycast", they're just load balancing.
Devices behind a load balancer are not "anycast," since the load
balancer explicitly decides which machine gets which transaction. Even
with clever load balancers like Linux Virtual Server in "routing" mode
where the back-end servers all share the virtual IP address, that's
load balancing, not anycast routing.

An IP is not "anycast" unless it moves via anycast routing. Anycast
routing means it's announced into the _routing protocol_ from multiple
sources on behalf of multiple distinct machines.

In their readme, they comment that their load balancer replaced
attempts to use anycast routing with equal cost multipath. That makes
good sense. Relying on ECMP for anycasted DHCP would be a disaster
during any sort of failure. Add or remove a single route from an ECMP
set and the hashed path selection changes for most of the connections.
All the DHCP renewals would very suddenly be going to the wrong DHCP
server. Where anycast works, it works because ECMP only rarely comes
into play.
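
To make that concrete, here is a toy model (my own sketch; real
routers hash packet header fields rather than a flow name):

    # Model ECMP path selection as hash(flow) mod N. Withdrawing one
    # of four next-hops remaps roughly three quarters of the flows.
    import hashlib

    def path(flow_id, nexthops):
        h = int(hashlib.md5(flow_id.encode()).hexdigest(), 16)
        return nexthops[h % len(nexthops)]

    flows = ["client-%d" % i for i in range(10000)]
    before = {f: path(f, ["A", "B", "C", "D"]) for f in flows}
    after = {f: path(f, ["A", "B", "C"]) for f in flows}  # D withdrawn
    moved = sum(before[f] != after[f] for f in flows)
    print("%.0f%% of flows changed next-hop" % (100.0 * moved / len(flows)))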

Regards,
Bill Herrin

> The relay server dhcplb could, maybe, help in that scenario
> (dhcplb runs on the anycast IP, the “real” DHCP servers on
> unicast IPs behind dhcplb).

> Although they used the word “anycast”, they’re just load balancing.

The idea is to run the relays on an anycasted IP (so the load balancer / relay IP is anycasted).

> [….] Relying on ECMP for anycasted DHCP would be a disaster
> during any sort of failure. Add or remove a single route from an ECMP
> set and the hashed path selection changes for most of the connections.

Consistent hashing (which I thought was widely supported now in ECMP implementations) and a bit of automation in how announcements are added can greatly mitigate this.
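
A toy comparison of the two (my own sketch, with no virtual nodes for
brevity): on a hash ring, withdrawing one of four next-hops only
remaps the flows that pointed at the withdrawn one.

    # Minimal consistent-hash ring: each flow maps to the first
    # next-hop point at or after its own hash, wrapping around.
    import bisect, hashlib

    def h(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def ring_lookup(flow_id, nexthops):
        ring = sorted((h(n), n) for n in nexthops)
        points = [p for p, _ in ring]
        i = bisect.bisect(points, h(flow_id)) % len(ring)
        return ring[i][1]

    flows = ["client-%d" % i for i in range(10000)]
    before = {f: ring_lookup(f, ["A", "B", "C", "D"]) for f in flows}
    after = {f: ring_lookup(f, ["A", "B", "C"]) for f in flows}
    moved = sum(before[f] != after[f] for f in flows)
    # Only the flows that previously hashed to "D" have moved.
    print("%.0f%% of flows changed next-hop" % (100.0 * moved / len(flows)))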

Ask

The system Ask is describing is the traditional method of using anycast to geographically load-balance long-lived flows. The first time I did that was with FTP servers in Berkeley and Santa Cruz, in 1989.

I did a bigger system, also load balancing FTP servers for Oracle, their public-facing documentation stores, with servers in San Jose and Washington DC, a couple of years later. A couple of years further on and the World Wide Web was a thing, and everybody was doing it.

-Bill

Thanks to you all for your answers, it has helped me a lot already.

My design is very simple: I have two sets of firewalls, one at each location, each advertising the same /32 to the network, with a TFTP server behind each firewall.

I have no intention of making this part of the internet, as it will be used to serve internal customer devices that require TFTP.
For those of you running anycast in a datacenter, are you running it inside one datacenter only or across multiple datacenters? Other than having to replicate IPs and file services between datacenters, have you seen any other issues?

Hi Javier,

That sounds straightforward to me with no major failure modes. I would
make the firewall part of my OSPF network and then add the tftp
servers to OSPF using FRR. Then I'd write a script to monitor the
local tftp server and stop frr if it detects any problems with the
tftp server. The local tftp server will always be closer than the
remote one via OSPF link costs, unless it goes offline. I assume you
also have an encrypted channel between the firewalls to handle traffic
that stays "inside" your security boundary, as tftp generally should.
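
The watchdog could be as small as this sketch (the probe file name,
interval and systemd unit are placeholders, and restarting frr on
recovery is left out for brevity):

    # Probe the local tftp server; stop FRR on failure so the
    # anycast /32 is withdrawn from OSPF.
    import socket, subprocess, time

    def tftp_alive(host="127.0.0.1", timeout=2.0):
        # Minimal RFC 1350 read request; any DATA or ERROR reply
        # (even "file not found") proves the server is answering.
        rrq = b"\x00\x01" + b"healthcheck\x00" + b"octet\x00"
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.settimeout(timeout)
        try:
            s.sendto(rrq, (host, 69))
            s.recvfrom(1024)
            return True
        except OSError:
            return False
        finally:
            s.close()

    while True:
        if not tftp_alive():
            subprocess.run(["systemctl", "stop", "frr"])
        time.sleep(10)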

Where you could get into trouble is if you add a third site or more.
If there's ever an equal routing cost from any one site to two
others, there's a non-zero risk of the failover process failing... and
you won't know it until you need it.

Regards,
Bill Herrin

Javier,

I have seen a few potential hang-ups, most of which affect the setup equally whether it is within the same datacenter or across datacenters. The difference there usually comes down to a greater chance of disconnects and “split-brain” scenarios when there are servers in multiple datacenters. In that case sharding (AKA cells, zones, etc.) is your friend, to ensure that you can operate one site autonomously while disconnected from the others.

Using DHCP servers in this way often reveals bugs in the implementation, depending on which server you are using. Fortunately, I have seen several bugs get squashed in a couple of the open source implementations when members of my team reported them to the maintainers, so you should be confident using one of the most common implementations (ISC, dnsmasq, a few others).

You also need to make sure that your network routing infrastructure tends toward stability and stickiness, so the same client talks to the same server throughout a flow. Of course a failure in the middle of the flow will eventually lead to a failover, but anything in progress is unlikely to recover given the limited error correction and sanity checking in the mentioned protocols. Best to take this into account and plan for a number of retries on any failure. Also make sure to test that all your servers eventually reach consensus after you test failure scenarios, and come up with a plan to force synchronization if needed.

Also, with IPv6, if you are assigning multiple addresses to clients, you want to make sure that all servers will offer the same set of IPv6 addresses. That can be a real headache to debug. Depending on whether your setup supports address autoconfiguration, you may not need a DHCPv6 server to issue addresses at all (you’ll need properly configured router advertisements on your routers).

Best of luck; I suspect it will work “like magic.” It does work, but it flies in the face of past convention about how IP protocols are supposed to be used, and it requires control over areas that usually cross boundaries of responsibility (system admins vs. network admins vs. security admins).

-Dan Sneddon

There are other ways to achieve this without actually stopping the routing daemon.

We have DNS servers where the anycast service address is added to a loopback interface (lo1) and only advertised when present.

The monitoring script drops and adds the service address to the loopback, without otherwise touching the routing daemon. We use Bird rather than FRR, though.
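
Roughly this pattern (the address, interface and health check below
are placeholders, not our production script):

    # Toggle the anycast service address on the loopback; the routing
    # daemon only advertises the /32 while the address is present.
    import subprocess

    SERVICE_ADDR = "192.0.2.10/32"  # placeholder anycast address
    IFACE = "lo1"

    def dns_is_healthy():
        # placeholder health check; substitute a real DNS query here
        return True

    def set_service_addr(up):
        action = "add" if up else "del"
        # check=False: "already exists" / "not found" errors are harmless
        subprocess.run(["ip", "addr", action, SERVICE_ADDR, "dev", IFACE],
                       check=False)

    set_service_addr(dns_is_healthy())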

Ray

That’s been the normal way of doing it for some 35 years now: iBGP advertise, or don’t advertise, the service address, which is attached to the loopback, depending on whether you’re ready to service traffic.

                                -Bill

If we are talking about eBGP, then pulling routes makes sense. If we
are talking about iBGP in a controlled environment, you should never
pull anycast routes, because eventually you will hit a failure mode
where the check mechanism itself is broken, and you'll pull all the
routes.
If instead of pulling the routes you make them inferior, you are
covered for the failure mode of the check itself being broken.
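
As a sketch (assuming FRR with iBGP; the route-map name, preference
values and service check are invented), the check script would demote
instead of withdraw:

    # Demote-on-failure: keep announcing the anycast route, but with a
    # low local-preference, so a broken health check can only shift
    # traffic to another node, never black-hole it everywhere at once.
    import subprocess

    def service_check():
        # placeholder; substitute the real service probe
        return True

    def set_local_pref(pref):
        cmds = ["configure terminal",
                "route-map ANYCAST-OUT permit 10",
                "set local-preference %d" % pref,
                "end"]
        args = ["vtysh"]
        for c in cmds:
            args += ["-c", c]
        subprocess.run(args, check=True)
        # (a soft clear of the BGP session may be needed to re-advertise)

    set_local_pref(200 if service_check() else 50)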