I've got a bit of a network reconfiguration question that I'm
hoping someone on NANOG might be able to provide a bit of advice on.

I'm working on a project to provide failover of entire cluster-based
(and so multi-host) applications to a geographically distinct backup
site. The general idea is that as one datacentre burns down, a live
service may be moved over to an alternate site without any
interruption to clients. All of the host-state migration is done
using virtual machines and associated magic; I'm trying to get a
clearer understanding of what is involved in moving the IPs, and
how quickly it can be done.

I'm fairly sure that what I would like to do is to arrange what is
effectively dual-homing, but with two geographically distinct homes:
Assuming that I have an in-service primary site A, and an emergency
backup site B, each with a distinct link into a common provider AS, I
would configure B's link as redundant into the stub AS for A -- as if
the link to B were the redundant link in a (traditional single-site)
dual-homing setup. B would additionally host its own IP range, used
for control traffic between the two sites in normal operation.

When I desire to migrate hosts to the failover site, B would send a
BGP update advertising that the redundant link should become
preferred, and (hopefully) the IGP in the provider AS would seamlessly
redirect traffic. Assuming that everything works okay with the
virtual machine migration, connections would continue as they were and
clients would be unaware of the reconfiguration.
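
To make the preference flip concrete, here is a minimal Python sketch of
the route selection I have in mind. It is only an illustration under
assumptions: it models just two steps of the BGP decision process
(higher local preference, then shorter AS path), it assumes the backup
link is de-preferred by AS-path prepending, and the names Route,
best_path and the private ASN 64512 are hypothetical, not anything a
particular provider uses.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Route:
    site: str                 # which site's link announced the prefix
    local_pref: int           # preference inside the provider AS; higher wins
    as_path: Tuple[int, ...]  # AS numbers; shorter wins on a local_pref tie


def best_path(routes: Iterable[Route]) -> Route:
    # Simplified selection: highest local_pref, then shortest AS path.
    # Real BGP has further tie-breakers that are ignored here.
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


STUB_AS = 64512  # private ASN, purely illustrative

# Normal operation: B announces the same prefix with a prepended path,
# so the provider prefers the link to A.
normal = [
    Route("A", 100, (STUB_AS,)),
    Route("B", 100, (STUB_AS, STUB_AS, STUB_AS)),
]
before = best_path(normal).site

# Failover: B re-announces without prepending, while A's announcement
# is now the prepended (or withdrawn) one.
failover = [
    Route("A", 100, (STUB_AS, STUB_AS, STUB_AS)),
    Route("B", 100, (STUB_AS,)),
]
after = best_path(failover).site

print(before, after)
```

The sketch says nothing about convergence time, which is the part I am
really asking about; it only shows which announcement would win once the
update has propagated.
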
Does the routing reconfiguration story here sound plausible? Does
anyone have any insight as to how long such a reconfiguration would
reasonably take and/or if it is something that I might be able to
negotiate an SLA for with a provider if I wanted to actually deploy
this sort of redundancy as a service? Is anyone aware of similar
high-speed failover schemes in use on the network today?

Thoughts appreciated; I hope this is reasonably on-topic for the list.