RE: DNS Based Load Balancers

but it's a perfect example of why GSLB based on DNS ain't perfect.

> What would be a better solution then?

utopia would be for DNS to be enhanced in some manner such that the 'end
user ip-address' became visible in the DNS request.
utopia would have NAT devices which actually updated that in-place so an
authoritative nameserver always authoritatively _knew_ the public ip-address of
where the request was coming from.
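
a rough sketch of what an authoritative server could do if the end-user
address really were visible in the query. everything here is an assumption
for illustration: the function name pick_answer, the client_ip parameter,
and the prefixes and addresses (documentation ranges, not real sites).

```python
import ipaddress

# Hypothetical mapping of client networks to the address of the nearest site.
# All prefixes and addresses are documentation ranges, not real deployments.
SITE_BY_CLIENT_NET = {
    ipaddress.ip_network("198.51.100.0/24"): "192.0.2.10",  # "west" site
    ipaddress.ip_network("203.0.113.0/24"): "192.0.2.20",   # "east" site
}
DEFAULT_SITE = "192.0.2.10"


def pick_answer(resolver_ip, client_ip=None):
    """Return the A record to hand out for one query.

    Prefer the end user's address when the query carries one; otherwise
    fall back to the resolver's address, which is all an authoritative
    server normally sees.
    """
    source = ipaddress.ip_address(client_ip or resolver_ip)
    # Check the longest prefixes first so the most specific match wins.
    for net in sorted(SITE_BY_CLIENT_NET, key=lambda n: n.prefixlen, reverse=True):
        if source in net:
            return SITE_BY_CLIENT_NET[net]
    return DEFAULT_SITE
```

with only the resolver address the answer follows the resolver; with a
client address the same query can be steered by where the user actually is.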

alas, we don't live in utopia and have to settle for alternate solutions.

one such approach is to rely on protocol-specific mechanisms. e.g. if it's
HTTP, then something at the HTTP layer.
oh wait - that won't deal with HTTP proxies either - but at least there is
some standardization on HTTP headers that proxies insert giving a hint of
the original client ip-address.
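
as a minimal sketch of using such a hint, assuming the proxy inserts the
de facto X-Forwarded-For header (the function name client_ip_hint is made
up for illustration). the header can be forged or missing, so treat the
result as a hint only, never as proof of where the client is.

```python
import ipaddress


def client_ip_hint(headers, peer_ip):
    """Best-effort guess at the original client address.

    headers: dict of HTTP request headers (names matched case-insensitively)
    peer_ip: address of the TCP peer, used when no usable hint is present
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    forwarded = lowered.get("x-forwarded-for", "")
    # Proxies append to the list, so earlier entries are closer to the
    # client. Take the first entry that parses as an IP address.
    for candidate in forwarded.split(","):
        candidate = candidate.strip()
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # skip "unknown", hostnames, and empty entries
        return candidate
    return peer_ip
```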

there are other approaches also. a few years back when i spent a fair bit
of time in this area, my experience is that a hybrid system based on
"specific protocol" and "generic solution" (dns) worked best. this simply
isn't an area where "one solution fits all cases".

there are public companies whose business model depends on this being 'hard'
to do right. their being capable of doing something 'better' than nothing at
all is the reason they are still in business.
i did a fair bit of research in this area as part of work i used to do a few
years back. much of that research belongs to my employer - i thought it was
documented publicly in the form of a patent i am a co-inventor of - but
alas, i can't seem to find it on .. perhaps it hasn't been issued
yet .. i haven't tracked these things for years.

in either case, i guess it's an example of where even commercial entities
whose business model depends on 'getting it right' most of the time do
indeed 'get it wrong' also.



Stepping back for a moment…

Many (most) popular services end up in multiple data centers first because they want to get diversity (of data centers, of ISPs, maybe of pricing). All mission critical sites will be designed such that a subset of these data centers can take their entire load if need be.

Once spread out this way - you may need to run some or all of them in an active/active configuration, so you need to balance load between them in some fashion.

If you are going to split the load - a natural desire is to split it such that it actually increases performance for users.

You figure network proximity (of the end user to the serving destination) ought to be a criterion - but the load on your cluster may be more important for personalization intensive sites.

You start with round robin DNS but it leaves you unsatisfied along the way. You play around with souped up DNS servers that are fed by monitoring tools that measure reachability as well as some measure of load. You also discover that the most popular browser will gladly ignore your TTL settings and insist on sending your traffic to the data center that is down. You are frustrated when you find out that users of ISP A are being served out of your data center at ISP B, even though you have a data center connected to ISP A. You think Anycast might be the answer but not everyone is set up to do Anycast. You find that some clever people have been aggregating data and offer to geolocate your callers' IP addresses, and maybe there is a way to use that information to find the nearest server. You realize the accuracy of this data is dubious, the exchange points for several countries may actually be on the coasts of the United States, and you wonder how you would integrate this into your DNS or HTTP redirector while still doing your two-shift day job.
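
A souped up DNS server of the kind described above might be sketched roughly as follows. This is only an illustration under stated assumptions: the function name choose_answers, the shape of the monitoring data ('addr', 'up', 'load') and the addresses are all hypothetical, and a real server would also have to worry about TTLs and stale health data.

```python
def choose_answers(sites, max_answers=2):
    """Pick the addresses to return for one DNS query.

    sites: list of dicts with keys 'addr' (str), 'up' (bool, from a
    reachability monitor) and 'load' (float, 0.0 idle to 1.0 saturated).
    """
    healthy = [s for s in sites if s["up"]]
    if not healthy:
        # Monitoring says everything is down. Hand out every address, as
        # plain round robin would, rather than return an empty answer.
        return [s["addr"] for s in sites]
    # Least loaded sites first; clients tend to try answers in order.
    healthy.sort(key=lambda s: s["load"])
    return [s["addr"] for s in healthy[:max_answers]]
```

For example, with one of three sites marked down, only the two reachable sites are returned, least loaded first.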

You turn to alternatives, and find the shiny boxes and/or services called the GLBS. They perform 2 main services.

First, they hand out answers, which may vary in time and space, to your clients as to where to find the service they are looking for.

Second, they decide what this “right” answer is.

You post to NANOG and you get admonished about their efficacy on both counts. This is initially wrapped in appeals to love of God and country and general harm that might befall mankind but no one says what or why.

On reflection, objections to the first part of this are usually along the “strict constructionist” point of view. No real harm comes from returning changing answers but when the Man who wrote the book jumps in with both feet you take pause. He chides people for using stupid tricks. You wonder if they are stupid in the same way as the “For Dummies” series of books is not really for dummies.

Objections to the determination of what the “right” answer is are more vociferous. Some immediately take the view that since the question was about DNS based load balancers, the inference was that the GLBS must be using DNS logistics to decide what the right answer is, even though DNS may simply be used to communicate the right answer (the first part), not to calculate it (the second part).

The GLBS may indeed be using some measure of server load, or even BGP derived network maps, or some other knowledge of topology or proximity, but that gets drowned out by the objection that “the proximity of the DNS resolver to the GLBS is not a proxy for the actual end user”. The latter is actually strictly true, and it is difficult to argue with given the specific examples of where it fails, but no one is able to say how many times in normal use this technique actually returns a bad answer.

You even hear from a man with one leg in US and one in Europe using a split tunnel VPN who wonders why when he orders Pizza using his tunnel to the HQ back in Europe, he doesn’t get greasy satisfaction back in the US. You wonder what happens when he calls 911 on his VOIP phone, without having manually configured his PSAP in that configuration, but you have other problems to worry about at the moment. You also hear about the “AOL Proxy” effect masking all users behind it. Well actually you don’t hear that, but someone should have chimed in about that.

You hear some mumbling about the use of AS path lengths or a geo-location database of end user IPs not being a true measure. Yet you wonder if the Internet is actually not getting more stable every day and that the nominal topology and the AS Paths for the more heavily trafficked routes may actually not change that rapidly in the normal course.

You also hear from others who have been using variations of GLBS for several years, and have even created large businesses by serving their customers this way. Their web sites are full of gleaming testimonials from these customers. Someone says no one got fired for using the GLBS… You wonder if those customers just bought insurance.

You scratch your head some more. You want to order that pizza on line but you decide against it.

You realize for all its resiliency and elegance at the packet shoveling level, the services architecture on the Internet still leaves a few things to be desired.

You finally realize that most of the objections, well reasoned as they have been, are on specific and narrow grounds and these may or may not actually matter in your situation.

You understand that it is futile to look for the “best” answer every single time - you just want it a large portion of the time, while still meeting your site diversity goal (the failure of which will actually get you fired).

You finally look at your budget, you examine your proclivity to hack and tweak, you consider the other demands on your time, and you finally choose a solution that is somewhere on the nonlinear continuum of time, money and benefit: round robin DNS, special purpose DNS servers that also calculate the “right answer”, HTTP redirectors that are topology aware, Anycast if appropriate, a GLBS appliance, a GLBS service and other assorted glueware tossed in the middle.

Everyone is slightly dissatisfied, but hey, isn't that the hallmark of a successful negotiation?

That would kill all cacheability of DNS.

Split tunnel VPNs do somewhat break the DNS GSLB model, but I don't think that's
as bad as anti-DNS GSLB people claim it is. If you were on a full-tunnel VPN, you
would expect to be sent to nocal, right?

This could also be fixed in split tunnel VPNs with a local DNS proxy that only used
the DNS cache on the other side of the VPN for the "internal" domains, and your ISP's
DNS cache for everything else. That proxy could even be built into your VPN client.
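
A minimal sketch of the decision such a proxy would make, assuming made-up values throughout: the suffix list, the two resolver addresses and the function name resolver_for are hypothetical, and a real proxy would also have to forward the query and relay the answer.

```python
# Hypothetical configuration: names under these suffixes are "internal".
INTERNAL_SUFFIXES = ("corp.example.com", "internal.example")
VPN_RESOLVER = "10.0.0.53"    # DNS cache on the far side of the tunnel
ISP_RESOLVER = "192.0.2.53"   # DNS cache at the local ISP


def resolver_for(qname):
    """Return the address of the cache that should answer this query name."""
    name = qname.rstrip(".").lower()
    for suffix in INTERNAL_SUFFIXES:
        # Match the suffix itself or any name beneath it, on a label boundary.
        if name == suffix or name.endswith("." + suffix):
            return VPN_RESOLVER
    return ISP_RESOLVER
```

Queries for public names then reach the authoritative servers from a cache near where the user really is, which is what the DNS GSLB model assumes.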

With wide open recursive nameservers getting such bad press lately, I would expect
to see client <-> caching nameserver proximity getting a lot closer.

John Payne wrote:

> What would be a better solution then?

multiple A RR's for your web service, each leading to an independent web
server (which might be leased capacity rather than your own hardware),
each having excellent (high bandwidth, low latency, etc) connectivity to
a significant part of the internet. the law of averages is a good friend
to those who can adequately provision, so the likely outcome is that you
won't need anything fancy. but if you need something fancy, use session
level redirects to tell a web browser or sip client that there's a better
and closer place for them to get their service. pundits please note that
the fancy thing i'm recommending sits perfectly on top of the non-fancy
thing i'm recommending.
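
a rough sketch of the session-level redirect for the http case, under
assumptions: the prefix table, the hostnames and the function name
redirect_for are invented for illustration, and how a server learns which
prefixes are closer to which site is left open.

```python
import ipaddress

# Hypothetical table: client networks known to be closer to another site.
CLOSER_SITE = {
    ipaddress.ip_network("198.51.100.0/24"): "eu.www.example.com",
}


def redirect_for(client_ip, served_from, path="/"):
    """Return (status, headers) for a request that arrived at served_from.

    Any server reached through the plain multiple-A-record answer can run
    this check and bounce the client to a closer site when it knows of one.
    """
    addr = ipaddress.ip_address(client_ip)
    for net, host in CLOSER_SITE.items():
        # Never redirect a client to the site it is already on.
        if addr in net and host != served_from:
            return 302, {"Location": "http://" + host + path}
    return 200, {}
```

clients with no entry in the table are simply served where they landed, so
the non-fancy setup keeps working when the fancy part knows nothing.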