IP failover/migration question.

I've got a network reconfiguration question that I'm hoping someone
on NANOG might be able to provide a bit of advice on:

I'm working on a project to provide failover of entire cluster-based
(and so multi-host) applications to a geographically distinct backup
site. The general idea is that as one datacentre burns down, a live
service may be moved over to an alternate site without any
interruption to clients. All of the host-state migration is done
using virtual machines and associated magic; I'm trying to get a more
clear understanding as to what is involved in terms of moving the IPs,
and how fast it can potentially be done.

I'm fairly sure that what I would like to do is to arrange what is
effectively dual-homing, but with two geographically distinct homes:
Assuming that I have an in-service primary site A, and an emergency
backup site B, each with a distinct link into a common provider AS, I
would configure B's link as redundant into the stub AS for A -- as if
the link to B were the redundant link in a (traditional single-site)
dual-homing setup. B would additionally host its own IP range, used
for control traffic between the two sites in normal operation.

When I desire to migrate hosts to the failover site, B would send a
BGP update advertising that the redundant link should become
preferred, and (hopefully) the IGP in the provider AS would seamlessly
redirect traffic. Assuming that everything works okay with the
virtual machine migration, connections would continue as they were and
clients would be unaware of the reconfiguration.
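
To make the intent concrete, here is a toy Python sketch of the
best-path flip I am hoping to trigger. It is purely illustrative: the
prefix, link names and preference values are invented, and the idea
that B can raise its preference via something the provider maps to
LOCAL_PREF is my assumption, not an agreed arrangement:

# Toy model of the provider-side decision (not router code).
from dataclasses import dataclass

@dataclass
class Path:
    peer: str          # which customer link the announcement arrived on
    local_pref: int    # provider-assigned preference
    as_path_len: int   # shorter is better

def best_path(paths):
    # Simplified BGP decision: highest LOCAL_PREF wins, then shortest AS path.
    return max(paths, key=lambda p: (p.local_pref, -p.as_path_len))

prefix = "192.0.2.0/24"   # invented service prefix, announced by both sites
site_a = Path("link-to-A", local_pref=200, as_path_len=1)
site_b = Path("link-to-B", local_pref=100, as_path_len=1)

print(prefix, "->", best_path([site_a, site_b]).peer)   # normal: link-to-A

# At failover, B re-announces with a higher preference (assumed to be
# possible via a community the provider maps to LOCAL_PREF), and wins:
site_b.local_pref = 300
print(prefix, "->", best_path([site_a, site_b]).peer)   # failover: link-to-B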

Does the routing reconfiguration story here sound plausible? Does
anyone have any insight as to how long such a reconfiguration would
reasonably take and/or if it is something that I might be able to
negotiate a SLA for with a provider if I wanted to actually deploy
this sort of redundancy as a service? Is anyone aware of similar
high-speed failover schemes in use on the network today?

Thoughts appreciated, I hope this is reasonably on-topic for the list.

best,
a.

You don't say who the “clients” are - I presume this is a web-based application, so essentially you are trying to migrate the service in flight to another set of servers, within the TCP/HTTP session timeout, without the client missing a beat?

If it is another kind of client, does it also have auto reconnect/retry logic built in for service restoration if the connection times out?

Is the session/host state worth preserving only for communication between the servers in the cluster, or also between the clients and the service?

I know of people who have been able to do this on LANs, using SANs to store shared host state and having a new VM pick up the connections, but on an internet-wide scale you are likely looking only at a probabilistic guarantee, assuming that your routing always converges in time and packets start flowing to the Disaster Recovery (DR) site.

This is much easier if you can stick within a single AS, of course.

Others will be able to answer whether these routing changes will attract dampening penalties if you have to pick providers in different ASes.

Assuming all of that doesn't matter, then a somewhat cleaner way to do this would be to advertise a less specific route from the DR location covering the more specific route of the primary location. If the primary route is withdrawn, voila … traffic starts moving to the less specific route automatically, without you having to scramble at the time of the outage to inject a new route.
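
Roughly, the longest-prefix-match behaviour you would be relying on looks like this toy Python model (standard library only; the prefixes and next hops are invented for illustration):

import ipaddress

def lookup(dest, table):
    # Toy longest-prefix match: pick the most specific prefix containing dest.
    dest = ipaddress.ip_address(dest)
    matches = [(prefix, nexthop) for prefix, nexthop in table if dest in prefix]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

primary  = (ipaddress.ip_network("192.0.2.0/24"), "primary site A")   # more specific
covering = (ipaddress.ip_network("192.0.2.0/23"), "DR site B")        # covering route

print(lookup("192.0.2.10", [primary, covering]))  # /24 present   -> primary site A
print(lookup("192.0.2.10", [covering]))           # /24 withdrawn -> DR site B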

I'm trying to get a more clear understanding as to what is involved in
terms of moving the IPs, and how fast it can potentially be done.

can we presume that separate ip spaces and changing dns, i.e. maybe
ten minutes at worst, is insufficiently fast?

I'm fairly sure that what I would like to do is to arrange what is
effectively dual-homing, but with two geographically distinct homes:

uh, that kinda inverts what we normally mean by 'multi-homing'.
that's usually two upstream providers for a single site.

Assuming that I have an in-service primary site A, and an emergency
backup site B, each with a distinct link into a common provider AS, I
would configure B's link as redundant into the stub AS for A -- as if
the link to B were the redundant link in a (traditional single-site)
dual-homing setup.

not clear what you mean by redundant. as the common transit
provider will not do well with hearing the same ip space from two
sources, this type of hack might best be accomplished by B not
announcing the space until A goes off the air and stops announcing
it. [ clever folk might try to automate this, but it would make me
nervous. ]

alternatively, you might arrange for the common transit provider to
statically route the ip space to A and swap to B on a phone call.
this would be very fast, but would require a very solid (and tested
monthly if you're paranoid, which i would be) pre-arrangement with
the provider.

i am sure others can come up with more clever hacks. beware if
they're too clever.

Assuming that everything works okay with the virtual machine
migration, connections would continue as they were and clients
would be unaware of the reconfiguration.

persistent tcp connections from clients would not fare well unless
you actually did the hacks to migrate the sessions, i.e. tcp sequence
numbers and all the rest of the tcp state. hard to do.

I hope this is reasonably on-topic for the list.

well, you left off mention of u.s. legislative follies and telco and
cable greed. but maybe you can get away with a purely technical
question once if you promise not to do it again. :-)

randy

a somewhat cleaner way to do this would be to advertise a less specific
route from the DR location covering the more specific route of the primary
location. If the primary route is withdrawn, voila .. traffic starts
moving to the less specific route automatically without you having to
scramble at the time of the outage to inject a new route.

aha! much cleaner indeed! and works single or multi provider.
<duh>

randy

Date: Sun, 11 Jun 2006 17:02:14 -1000
From: Randy Bush

persistent tcp connections from clients would not fare well unless
you actually did the hacks to migrate the sessions, i.e. tcp sequence
numbers and all the rest of the tcp state. hard to do.

Actually, the TCP goo isn't too terribly difficult [when one has kernel
source]. What's tricky is (1) handling splits, and (2) ensuring that
the app is consistent and deterministic.

One transit provider handling multiple locations shouldn't present a
problem. Of course many things that should be, aren't.

== below responses are general, not re Randy's post ==

Note also that redundancy/propagation is at odds with RTT latency. The
proof of this is left as an exercise for the reader.

Finally, an internal network between locations is a good thing. (Hint:
compare internal convergence times with global ones.)

Eddy

> I'm fairly sure that what I would like to do is to arrange what is
> effectively dual-homing, but with two geographically distinct homes:

uh, that kinda inverts what we normally mean by 'multi-homing'.
that's usually two upstream providers for a single site.

This almost sounds like an anycasted version of his site... only unicast
from one location then popping up at another location if the primary dies?

> Assuming that I have an in-service primary site A, and an emergency
> backup site B, each with a distinct link into a common provider AS, I
> would configure B's link as redundant into the stub AS for A -- as if
> the link to B were the redundant link in a (traditional single-site)
> dual-homing setup.

not clear what you mean by redundant. as the common transit
provider will not do well with hearing the same ip space from two
sources, this type of hack might best be accomplished by B not
announcing the space until A goes off the air and stops announcing
it. [ clever folk might try to automate this, but it would make me
nervous. ]

I think there is some cisco magic you could do with 'dial backup'... you
may even be able to rig this up with an ibgp session (even if that goes
out over the external provider) to swing the routes.

NOTE: this could make your site oscillate if there are connectivity issues
between the sites, it could get messy FAST, and it could be hard to
troubleshoot. Basically look before you leap :-)

This link may be of assistance:
http://tinyurl.com/l8zpm

i am sure others can come up with more clever hacks. beware if
they're too clever.

yes... this probably is...

> I hope this is reasonably on-topic for the list.

well, you left off mention of u.s. legislative follies and telco and
cable greed. but maybe you can get away with a purely technical
question once if you promise not to do it again. :-)

to get greed into it... are you sure you want to be 'stuck' with a single
carrier? :-) What if the carrier dies? Wouldn't you want redundant carrier
links as well?

Date: Sun, 11 Jun 2006 19:34:12 -0700 (PDT)
From: ennova2005-nanog@...

[A] somewhat cleaner way to do this would be to advertise a less
specific route from the DR location covering the more specific route
of the primary location. If the primary route is withdrawn, voila ..
traffic starts moving to the less specific route automatically without
you having to scramble at the time of the outage to inject a new
route.

This certainly is easier if it's flexible enough. (If one desires high
splay across several locations, this approach is lacking.) The tough
part then becomes internal application consistency.

Eddy

> I'm trying to get a more clear understanding as to what is involved in
> terms of moving the IPs, and how fast it can potentially be done.

can we presume that separate ip spaces and changing dns, i.e. maybe
ten minutes at worst, is insufficiently fast?

Absolutely. We are trying to explore the (arguably insane) idea of
failing things over sufficiently fast (and statefully) that open
connections remain completely functional.

> I'm fairly sure that what I would like to do is to arrange what is
> effectively dual-homing, but with two geographically distinct homes:

uh, that kinda inverts what we normally mean by 'multi-homing'.
that's usually two upstream providers for a single site.

Yep, which is what I want -- it's just that the single site is going to move. ;-)

Consider a traditional (single site) dual-homed situation, where I'm
not doing any kind of balancing across the links. In (my
understanding of) that case, I would use a private stub AS with the
two upstream links going to the common provider AS, and advertise a
change to the link weight on the backup link when I wanted a switch to
happen. (Or if the primary failed this would presumably happen
automatically through its link disappearing.)

In this new scheme, I want to make _everything_ redundant. The backup
link is to a geographically distinct site, and all of the hosts in the
primary site are actively mirrored to the backup site: OS,
applications, TCP connection state and all. So it's _kind of_ dual
homing -- two upstream links for a single (virtual) site.

...
i am sure others can come up with more clever hacks. beware if
they're too clever.

I completely agree with your comments regarding clever hacks, which is
why I'm trying to draw an analogy to dual-homing, a technique that's
known, trusted, and clearly not fraught with corner-cases and devilish
complexity. ;-) Seriously though, I'm trying to convince myself that
there is a reasonable approach here that is within the means of
datacenter operators and their ISPs, and would allow a switch with
reconfiguration time on the order of seconds.

persistent tcp connections from clients would not fare well unless
you actually did the hacks to migrate the sessions, i.e. tcp sequence
numbers and all the rest of the tcp state. hard to do.

Since we move the entire OS, the TCP state goes with it. We've done
this in the past on the local link by migrating the host and sending
an unsolicited ARP reply to notify the switch that the IP has moved to
a new MAC (http://www.cl.cam.ac.uk/~akw27/papers/nsdi-migration.pdf).
I think that order-of-seconds reconfiguration should allow the same
sort of migration to work at a larger scope.
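
For reference, the LAN-scope notification is just an unsolicited
("gratuitous") ARP reply, roughly along these lines (a sketch that
assumes the scapy package and sufficient privileges; the IP, MAC and
interface names are placeholders):

# Rough sketch of the unsolicited ARP reply sent after a local-link
# migration.  Assumes scapy is installed and must run with privileges;
# the IP, MAC and interface below are placeholders.
from scapy.all import Ether, ARP, sendp

ip      = "192.0.2.10"          # service IP that just moved
new_mac = "02:00:00:00:00:02"   # MAC of the destination host/VM
iface   = "eth0"

garp = (Ether(src=new_mac, dst="ff:ff:ff:ff:ff:ff") /
        ARP(op=2,                     # op=2: ARP "is-at" (reply)
            hwsrc=new_mac, psrc=ip,   # "ip is at new_mac"
            hwdst="ff:ff:ff:ff:ff:ff", pdst=ip))

sendp(garp, iface=iface, verbose=False)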

well, you left off mention of u.s. legislative follies and telco and
cable greed. but maybe you can get away with a purely technical
question once if you promise not to do it again. :-)

Thanks! And thanks everyone for the feedback -- incredibly helpful.
I'll try for follies and greed next time. ;-)

a.

I think there is some cisco magic you could do with 'dial backup'... you
may even be able to rig this up with an ibgp session (even if that goes
out over the external provider) to swing the routes.

NOTE: this could make your site oscillate if there are connectivity issues
between the sites, it could get messy FAST, and it could be hard to
troubleshoot. Basically look before you leap :-)

This link may be of assistance:
http://tinyurl.com/l8zpm

This link asks me for a login...

to get greed into it... are you sure you want to be 'stuck' with a single
carrier? :-) What if the carrier dies? Wouldn't you want redundant carrier
links as well?

I'd love a multi-ISP solution. I just assumed that anything involving
more than a single upstream AS across the two links would leave me
having to consider BGP convergence instead of just IGP reconfig. I
didn't presume that that would likely be something that happened in
seconds. If there's a fast approach to be had here, I'd love to hear
it.

thanks,
a.

Date: Sun, 11 Jun 2006 20:55:42 -0700
From: Andrew Warfield

I'd love a multi-ISP solution. I just assumed that anything involving
more than a single upstream AS across the two links would leave me
having to consider BGP convergence instead of just IGP reconfig. I
didn't presume that that would likely be something that happened in
seconds. If there's a fast approach to be had here, I'd love to hear
it.

(1) Internal link between locations.
(2) Same ISPs at all locations.

The closer to the source, the faster the convergence.

Be sure to test. I've had multiple links to one provider _within the
same datacenter_ where their iBGP-fu (or whatever they had) was lacking.
Bouncing one (and only one) eBGP session to them triggered
globally-visible flapping. :-(

Eddy

> I think there is some cisco magic you could do with 'dial backup'... you
> may even be able to rig this up with an ibgp session (even if that goes
> out over the external provider) to swing the routes.
>
> NOTE: this could make your site oscillate if there are connectivity issues
> between the sites, it could get messy FAST, and it could be hard to
> troubleshoot. Basically look before you leap :-)
>
> This link may be of assistance:
> http://tinyurl.com/l8zpm

This link asks me for a login...

aw crap, sorry... try:

http://tinyurl.com/zh7wk

(12.0 code reference info)

> to get greed into it... are you sure you want to be 'stuck' with a single
> carrier? :-) What if the carrier dies? Wouldn't you want redundant carrier
> links as well?

I'd love a multi-ISP solution. I just assumed that anything involving
more than a single upstream AS across the two links would leave me
having to consider BGP convergence instead of just IGP reconfig. I

both are bgp convergence actually, unless the routes are redistributed from
BGP into the IGP inside the single provider, which is a little scary.

Consider that locations A and B exist. A is primary, B secondary. B's
routes don't exist in the ISP's network. A explodes, the network above A
has to withdraw the routes, and the network above B (it's not the same
POP nor POP router, right?) has to get new routes from B and then send
them out.

You'll possibly gain SOME time, but that probably depends on the BGP/iBGP
architecture inside the ISP in question :-(

didn't presume that that would likely be something that happened in
seconds. If there's a fast approach to be had here, I'd love to hear
it.

get with the greed, man! :-)

clear understanding as to what is involved in terms of moving the IPs,
and how fast it can potentially be done.

I don't believe there is any way to get the IPs
moved in any kind of reasonable time frame for
an application that needs this level of failover
support.

If I were you I would focus my attention on
maintaining two live connections, one to each
data centre. If you can change the client software,
then they could simply open two sockets, one for
traffic and one for keepalives. If the traffic
destination datacentre fails, your backend magic
starts up the failover datacentre and the traffic
then flows over the keepalive socket.
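
A rough sketch of the client side of that idea,
in Python (the hostnames, port, framing and
timeout are invented; this is an illustration of
the two-socket approach, not production code):

import socket

PRIMARY   = ("dc-a.example.net", 9000)   # placeholder endpoints
SECONDARY = ("dc-b.example.net", 9000)

primary   = socket.create_connection(PRIMARY, timeout=5)
secondary = socket.create_connection(SECONDARY, timeout=5)

def send_request(payload: bytes) -> None:
    # Application traffic goes to the primary datacentre; if that socket
    # dies, the already-open keepalive socket to the backup becomes the
    # traffic socket (the backend has presumably failed the service over).
    global primary, secondary
    try:
        primary.sendall(payload)
    except OSError:
        primary, secondary = secondary, primary
        primary.sendall(payload)

def heartbeat() -> None:
    # Called periodically to keep the standby connection warm, so failover
    # does not pay connection-setup latency on top of everything else.
    try:
        secondary.sendall(b"PING\n")
    except OSError:
        pass   # standby down; real code would reconnect in the background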

And if you can't change the clients, you can do
much the same by using two tunnels of some sort,
MPLS LSPs, multicast dual-feed, GRE tunnels.
The Chicago Mercantile Exchange has published
a network guide that covers similar use cases.
In the case of market data, they generally run
both links with duplicate data and the client
chooses whichever packets arrive first. Since
market data applications can win or lose millions
of dollars per hour, they are the most time-sensitive
applications on the planet.
http://www.cme.com/files/NetworkingGuide.pdf

When I desire to migrate hosts to the failover site, B would send a
BGP update advertising that the redundant link should become
preferred,

There is your biggest timing problem, which is
also effectively out of your control. By maintaining
two live connections over two separate paths to
two separate data centers, you have more control
over when to switch and how quickly to switch.

--Michael Dillon

There may be actually... if you don't have to be TOO far apart:

something like (that no one at MCI/VZB seems to want to market :-( as a
product)

2 external connections (isp)
2 internal connections (private network)
2 cities (Washington, DC and NYC for this argument)
2 Metro-Private-Ethernet connections
2 Nokia Firewall devices (IP740 or IP530 ish)
2 catalyst switches
2 copies of equipment in 'datacenter' (one in each location)

Make the Nokias do BGP with the outside world, do state-sync across the
MPLE link, make the MPLE link look like a front-side VLAN, backside VLAN,
and state-sync VLAN (you could do this with a single MPLE connection, of
course), announce all routes out NYC, and if that link goes dark push
routes out the DC link.

State sync on the firewalls will work, Checkpoint/Nokia says, if the link
has less than 10ms latency (or so... they aren't big on hard numbers for
this since the boxes normally sit in the same rack). You could even
(probably) make things work in NYC for NYC users and DC for DC users...
though backside state-sync in the apps might get hairy.

-chris

Hi, guys

Very late reply, but this is a 'hot topic' in my space..

I'm trying to get a more clear understanding as to what is involved in
terms of moving the IPs, and how fast it can potentially be done.

can we presume that separate ip spaces and changing dns, i.e. maybe
ten minutes at worst, is insufficiently fast?

Ten minutes at worst, only if everyone is behaving. Some of the UK's largest (in terms of consumer customer numbers) ISPs disobey short DNS TTLs, and will cache expired or old records for 24(+?) hours.

Popular web browsers running on popular desktop operating systems also display extra-long DNS cache time 'bugs'.
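
If you want to see what a particular resolver actually hands back versus your published TTL, a quick check along these lines works. It assumes the dnspython (2.x) package; the hostname and resolver address below are placeholders:

import dns.resolver   # the dnspython package

NAME     = "www.example.com"   # placeholder record
RESOLVER = "192.0.2.53"        # the resolver you suspect of over-caching

r = dns.resolver.Resolver()
r.nameservers = [RESOLVER]

answer = r.resolve(NAME, "A")
print("TTL as served by", RESOLVER, ":", answer.rrset.ttl,
      "addresses:", [rr.address for rr in answer])

# Change the record at the authoritative servers, wait out the published
# TTL, and run this again: if the old address still comes back, the
# resolver is holding the record longer than the TTL allows.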

A 24+ hour outage whilst stale DNS disappears will never do in internet retail. BGP, two datacentres, both equivalent endpoints for customer traffic, same IP space, and an e-commerce application that will happily run 'active/active' is the holy grail, I think. The problem isn't setting this up in IP; it's getting your commerce application to fit this model (a problem I have today).

Best wishes,
Andy

Popular web browsers running on popular desktop operating systems
also display extra-long DNS cache time 'bugs'.

A well known fact, which leads right into your next comment...

A 24+ hour outage whilst stale DNS disappears will never do in
internet retail.

And yet, with 90% of the net implementing the "will never do" scenario,
we manage to get a lot of internet retail done anyhow. I'm obviously going
to need a *lot* more caffeine to sort through that conundrum....

We do, because we don't wait for DNS to time out in broken browsers; we have actual multi-homing with real networks.

Could you imagine Slashdot, Amazon, or Google going down for 24 hours?
I think there would be panic in the streets.

Uptime might not matter for small hosts that do mom and pop websites
or so-called "beta" blog-toys, but every time Level3 takes a dump,
it's my wallet that feels the pain. It's actually a rather frustrating
situation for people who aren't big enough to justify a /19 and an
AS#, but require geographically dispersed locations answering on the
same IP(s).

Could you imagine Slashdot, Amazon, or Google going down for 24 hours?
I think there would be panic in the streets.

piffle

there would certainly be panic inside of those organizations trying
to fix things. but most people have real lives.

randy

Inside those organizations. Inside their transit providers. Inside the call centers of the big eyeball networks. Etc. It would be a Big Deal on the 'Net.

Of course, the Internet is not "real life", but neither is just about any other business. That doesn't mean they aren't important. Although I do admit "panic in the streets" is hyperbole - but I recognized it as hyperbole. (Not that YOU have ever used hyperbole in your posts, Randy. :-)

All you need is the ASN; you don't need your own IP space. I've happily announced down to /24s (provider-provided and PI) without any issues, where needs required it, and so do many others.

...david