Anycast provider for SMTP?

I have a mail system where there are two MX hosts, one in the US and one in
Europe. Both have a DNS MX record metric of 10 so a bastardized
round-robin takes place. This does not work so well when one site goes
down. My solution will be to place a load balancer in a hosting site
(virtual, of course) and have it provide HA. But what about HA for the
LB? At first glance anycasting would seem to be a great idea but there is
a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

I guess there is no real chance without conntrack ... I'll try to use something like LVS+mysql conntrack (no idea if this even exists ...) ....

Jürgen Jaritsch
Head of Network & Infrastructure

ANEXIA Internetdienstleistungs GmbH

Phone: +43-5-0556-300
Fax: +43-5-0556-500

E-Mail: jj@anexia.at
Web: http://www.anexia.at

Head office address, Klagenfurt: Feldkirchnerstraße 140, 9020 Klagenfurt
Managing Director: Alexander Windbichler
Commercial register: FN 289918a | Place of jurisdiction: Klagenfurt | VAT ID: AT U63216601

Well we, Genuity, used to use Cisco Distributed Director to do this. Basically it was a DNS server that ran on a Cisco router and could use many different metrics, including routing-based metrics, to choose an answer.

Johno

I guess there is no real chance without conntrack ... I'll try to use something like LVS+mysql conntrack (no idea if this even exists ...) ....

not clear how helpful that is?

From: Joe Hamelin [joe@nethead.com]
Received: Monday, 15 June 2015, 19:51
To: NANOG list [nanog@nanog.org]
Subject: Anycast provider for SMTP?

I have a mail system where there are two MX hosts, one in the US and one in
Europe. Both have a DNS MX record metric of 10 so a bastardized
round-robin takes place. This does not work so well when one site goes
down. My solution will be to place a load balancer in a hosting site

'when one site goes down' ... then the other works fine, right? SMTP
is not latency sensitive, in the sense that a 30-second timeout for a
server will mean delivery to the secondary... right?

  My solution will be to place a load balancer in a hosting site
(virtual, of course) and have it provide HA. But what about HA for the
LB? At first glance anycasting would seem to be a great idea but there is
a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

Anycast + TCP = much pain, for reasons which should be obvious. It's
on the near side of impossible, but the far side of impractical. You'd
spend a lot of money with some high-price software developers getting
it to work.

I have a mail system where there are two MX hosts, one in the US and one in
Europe. Both have a DNS MX record metric of 10 so a bastardized
round-robin takes place. This does not work so well when one site goes
down.

Not sure why you'd have problems with this since it's a primary
operating mode that SMTP was explicitly designed for. Can you
elaborate on the kinds of trouble you've experienced?

Regards,
Bill Herrin

It seems like you may be over-thinking this.

You could, in fact, use anycast, in one of two ways:

You could anycast the DNS, with servers in the US and Europe, and different MX metrics between the two, so anyone who’s nearby the European DNS server will see the European MX host as the first-choice, and anyone nearer the US DNS server will see the US MX host as first-choice.

Or you could skip the MX records, and just put both US and European SMTP servers on the same IP address, which would save a lot of steps and simplify the system, but leave you with the _very_ occasional corner-case of someone equal-path-length load-balancing traffic to you such that half of one TCP session goes to Europe, and half to the US. That’s a bogeyman that scares a lot of people into not using anycast for TCP services, particularly long-lived ones, but it’s a theoretical problem rather than an actually-observed-in-the-wild problem. But since it scares people, it’s probably safer just doing the DNS anycast, rather than SMTP anycast, to avoid startling the easily-upset out there. :-)

Either of these is vastly simpler and more reliable than trying to throw a load balancer into the mix. As you note, load balancers aren’t particularly HA. Always replace load balancers with crossconnects. Much more HA.

                                -Bill

The two MX sites are connected via third party MPLS. The problem is when
one MX site loses Internet connectivity the sending MTA may take up to 4
hours to resend and hopefully the DNS coin toss gives it the address of the
site that is still connected. (Read as: French ISPs don't seem as robust
as I'm used to in the US.) Since our mail traffic is international
something like anycast would be nice. Now the other problem is we don't
have an ASN or do external BGP ourselves.

And not that it matters in a network sense, but this is a Domino mail
system. I'm just trying to bring it up to year 2000 standards.

Anycast + TCP = much pain, for reasons which should be obvious.

This was presented at some conference or other a couple of years ago:

https://www.nanog.org/meetings/nanog37/presentations/matt.levine.pdf

Nick

Take a look at a hosted GSLB service, FortiDirector, which I have set up for a customer (for worldwide SMTP, Exchange, and ActiveSync services).

Or you could skip the MX records, and just put both US and European
SMTP servers on the same IP address, which would save a lot of
steps and simplify the system, but leave you with the _very_
occasional corner-case of someone equal-path-length load-balancing
traffic to you such that half of one TCP session goes to Europe, and
half to the US. That’s a bogeyman that scares a lot of people into not
using anycast for TCP services, particularly long-lived ones, but it’s a
theoretical problem rather than an actually-observed-in-the-wild problem.
But since it scares people, it’s probably safer just doing the DNS
anycast, rather than SMTP anycast, to avoid startling the
easily-upset out there. :-)

If I had a dollar for every system that's collapsed from a known but
previously "theoretical" problem... It's only theoretical until a VIP
can't connect. Deploy a system without covering the corner cases and
your comeuppance is assured.

Okay, granted you can probably cover your corner case here with a
priority 20 MX that leads to a unicast address on one of the two
servers. SMTP can let the rare fellow with the bisected packet flow
gracefully fall back.

Nevertheless, I think you've offered some really bad advice here Bill.
Hijackers killing the passengers was a bogeyman too. If you just kept
calm and cooperated, you lived through it. Until you didn't, and
allowed yourself to be an instrument in killing thousands on the
ground as a bonus. Sometimes the math offers really bad advice.

but 'well behaved smtp clients' should already be falling back right?

Hi Joe,

I have a mail system where there are two MX hosts, one in the US and one in
Europe. Both have a DNS MX record metric of 10 so a bastardized
round-robin takes place. This does not work so well when one site goes
down. My solution will be to place a load balancer in a hosting site
(virtual, of course) and have it provide HA. But what about HA for the
LB? At first glance anycasting would seem to be a great idea but there is
a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

If you can give responses to QTYPE=MX queries that match the location of the client, you can approximate this without deploying your SMTP servers using anycast. This feels like a simpler solution to operate; anycast sometimes pits BGP-fearing, syseng people against neteng people when things break at 3am, and if that rings true for you then a solution that avoids it might be of interest.

So, suppose clients in region A could query NETHEAD.COM/IN/MX and get a response that looks like

   NETHEAD.COM. IN MX 10 REGION-A-MX.NETHEAD.COM.
                IN MX 20 REGION-B-MX.NETHEAD.COM.
                IN MX 20 REGION-C-MX.NETHEAD.COM.

whereas clients in region B might see a response that looks more sensible to them:

   NETHEAD.COM. IN MX 10 REGION-B-MX.NETHEAD.COM.
                IN MX 20 REGION-A-MX.NETHEAD.COM.
                IN MX 20 REGION-C-MX.NETHEAD.COM.

etc, etc.

That way you still get a reasonable fallback in the event that one MX target is unreachable for a particular client, but you steer the bulk of your traffic in a way that makes sense (and which your syseng people don't have to understand the details of).
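
A minimal sketch of the per-region answers described above, in Python. The region labels and the `mx_answer` helper are hypothetical, chosen only to mirror the example records; in practice this logic would live inside a geo-aware nameserver or a managed DNS service, not in application code.

```python
# Hypothetical illustration: build the MX answer a client in a given
# region would see. Names mirror the example records above.
REGIONS = ["A", "B", "C"]  # assumed region labels


def mx_answer(client_region, regions=REGIONS, domain="NETHEAD.COM."):
    """Return MX records as (preference, target) tuples.

    The client's own region gets preference 10; every other region
    gets preference 20 so it remains available as a fallback.
    """
    if client_region not in regions:
        raise ValueError(f"unknown region: {client_region!r}")
    records = []
    for region in regions:
        preference = 10 if region == client_region else 20
        records.append((preference, f"REGION-{region}-MX.{domain}"))
    # Sort by preference; sorted() is stable, so ties keep list order.
    return sorted(records, key=lambda rec: rec[0])


if __name__ == "__main__":
    for pref, target in mx_answer("B"):
        print(f"NETHEAD.COM. IN MX {pref} {target}")
```

Every region still appears in every answer, so a client whose nearest MX is unreachable keeps a fallback target.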

You can achieve the above DNS trickery using various load balancers that other people in this thread have already mentioned. You can also install your own geomaps in your own nameservers and handle it yourself, or you can buy managed DNS service from various people that can do this kind of thing.

Disclaimer: Dyn, for whom I work, sells such a service.

Joe

Anycast + TCP = much pain, for reasons which should be obvious.

This was presented at some conference or other a couple of years ago:

https://www.nanog.org/meetings/nanog37/presentations/matt.levine.pdf

From that otherwise encouraging preso:

"What about IPv6? We have a plan! We plan to be dead before customers
demand IPv6".

I am pretty sure the authors are still alive(?).

I have been using anycast at a small scale on mesh networks, for dns,
primarily. Works.

Hi Joe,

Have you been able to document which originating MTA software
misbehaves this way? Correct SMTP behavior is to attempt TCP
connections to all IP addresses at each MX level in turn, and repeat
for each MX level. Only upon failure of all of them, defer the message
for later delivery.
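
The delivery order described above can be sketched roughly as follows. This is an illustration only, not the behavior of any particular MTA; `resolve` and `try_connect` are hypothetical callables supplied by the caller so that the logic can be shown without real DNS lookups or network access.

```python
from itertools import groupby


def pick_server(mx_records, resolve, try_connect):
    """Walk MX levels from lowest preference value to highest.

    mx_records  : list of (preference, hostname) tuples
    resolve     : hypothetical callable, hostname -> list of IP strings
    try_connect : hypothetical callable, ip -> True if TCP connect worked

    Returns the first IP that accepts a connection, or None when every
    address at every level failed and the message should be deferred.
    """
    ordered = sorted(mx_records, key=lambda rec: rec[0])
    for _pref, level in groupby(ordered, key=lambda rec: rec[0]):
        for _, host in level:
            for ip in resolve(host):
                if try_connect(ip):
                    return ip
    return None  # caller defers the message for a later retry
```

The sketch keeps equal-preference hosts in their input order; real senders typically shuffle hosts within a level to spread load.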

Interrupted connections (as opposed to timeouts) may go straight to
deferred, figuring that bulk traffic like email should pause if
congestion exhibits itself in the form of a stalled TCP connection. So
it would make sense for a handful of messages to be delayed. And of
course all bets are off if Internet connectivity is "flapping" instead
of hard down.

Regards,
Bill Herrin

I see no major problem with using anycast for that.

The problem arises in the rare case where the route from a client to one
of your servers changes while a TCP stream is active.

SMTP connections are short. Even if it happens, it will look like just a
broken connection to the client, which will retry shortly.

Am I missing something?

but 'well behaved smtp clients' should already be falling back right?

If you have multiple SMTP servers at the same priority, it's a pretty
broken client that doesn't try them all until one works.

That said, there is a depressing number of pretty broken SMTP clients.

R's,
John

Many of us have been using anycast at Internet scale for DNS for a couple of decades. I would go further than "works" and perhaps say "necessary".

There were some wise words written in RFC 4786 about use of anycast with other protocols (well, I think they are wise, but then I wrote some of them):

    When a service is anycast between two or more nodes, the routing
    system makes the node selection decision on behalf of a client.
    Since it is usually a requirement that a single client-server
    interaction is carried out between a client and the same server node
    for the duration of the transaction, it follows that the routing
    system's node selection decision ought to be stable for substantially
    longer than the expected transaction time, if the service is to be
    provided reliably.

    Some services have very short transaction times, and may even be
    carried out using a single packet request and a single packet reply
    (e.g., DNS transactions over UDP transport). Other services involve
    far longer-lived transactions (e.g., bulk file downloads and audio-
    visual media streaming).

    Services may be anycast within very predictable routing systems,
    which can remain stable for long periods of time (e.g., anycast
    within a well-managed and topologically-simple IGP, where node
    selection changes only occur as a response to node failures). Other
    deployments have far less predictable characteristics (see
    Section 4.4.7).

    The stability of the routing system, together with the transaction
    time of the service, should be carefully compared when deciding
    whether a service is suitable for distribution using anycast. In
    some cases, for new protocols, it may be practical to split large
    transactions into an initialisation phase that is handled by anycast
    servers, and a sustained phase that is provided by non-anycast
    servers, perhaps chosen during the initialisation phase.

    This document deliberately avoids prescribing rules as to which
    protocols or services are suitable for distribution by anycast; to
    attempt to do so would be presumptuous.

    Operators should be aware that, especially for long running flows,
    there are potential failure modes using anycast that are more complex
    than a simple 'destination unreachable' failure using unicast.
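
One rough way to compare routing stability with transaction time, as the quoted text suggests, is a back-of-the-envelope estimate. The sketch below assumes (and this is purely an assumption, not something RFC 4786 states) that route changes affecting a given client arrive as a Poisson process with a known mean interval. The function name and the sample figures are hypothetical.

```python
import math


def interruption_probability(session_seconds, mean_seconds_between_changes):
    """Chance that at least one route change lands inside a session.

    Assumes route changes form a Poisson process, so the probability
    of zero changes during the session is exp(-T / M).
    """
    if session_seconds < 0 or mean_seconds_between_changes <= 0:
        raise ValueError("session must be >= 0 and mean interval > 0")
    return 1.0 - math.exp(-session_seconds / mean_seconds_between_changes)


if __name__ == "__main__":
    # Hypothetical figures: one node-selection change per day on average.
    mean_interval = 24 * 3600
    for label, seconds in [("short SMTP session", 10),
                           ("long bulk transfer", 3600)]:
        p = interruption_probability(seconds, mean_interval)
        print(f"{label}: {p:.4%} chance of a mid-session route change")
```

Under this model the risk grows with session length relative to the mean interval between changes, which is the comparison the quoted text asks operators to make.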

Joe

I could be mistaken, but you might get all of this done with AWS's Route53.
I would read this:
http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html#routing-policy-geo

The other step would be to set up HA at each SMTP node (US and France), such
as an LB or failover. Just an idea.

The mailserver is seldom the problem (it's an AS/400) but the ISP pipe
experiences prolonged outages.

I have been using anycast at a small scale on mesh networks, for dns,
primarily. Works.

Many of us have been using anycast at Internet scale for DNS for a couple of
decades. I would go further than "works" and perhaps say "necessary".

Oh, I agree.

My point was that anycast is also potentially of use in smaller
(corporate/mesh) networks, not just for DNS, but also for SMTP as being
discussed here. Web and other forms of proxy, also. Other cases, like
gittorrent?

I'm pretty sure it's a bad idea for ntp, and for non-fully mirrored
file distribution services.

There were some wise words written in RFC 4786 about use of anycast with
other protocols (well, I think they are wise, but then I wrote some of
them):

a good read.