Question regarding anycasting in CDN setup

Hello everyone!

I have a small question and was wondering if someone could help me with
that.

My question is: why do companies like Google and Amazon use only partial
anycasting in their CDN setups? E.g. if we pick a random hostname from the
URL of a Picasa picture, lh3.googleusercontent.com, it turns out to be a
CNAME chain, and at the end you find different A records when checked from
different locations.

E.g when checked from my local system (in India):

;; QUESTION SECTION:
;lh3.googleusercontent.com. IN A

;; ANSWER SECTION:
lh3.googleusercontent.com. 86276 IN CNAME googlehosted.l.googleusercontent.com.
googlehosted.l.googleusercontent.com. 176 IN A 209.85.175.132

Next, lookup from a server in Europe:

;; QUESTION SECTION:
;lh3.googleusercontent.com. IN A

;; ANSWER SECTION:
lh3.googleusercontent.com. 86400 IN CNAME googlehosted.l.googleusercontent.com.
googlehosted.l.googleusercontent.com. 300 IN A 209.85.148.132

Thus the two locations get different IPs.

I understand that Google anycasts its core DNS servers, so we always hit
the nearest DNS server. The DNS servers are more or less independent of
each other and carry different A records for the CDN names, each pointing
to local cache server IP addresses. And here's confirmation:

anurag@laptop:~$ dig googleusercontent.com. ns +short
ns2.google.com.
ns3.google.com.
ns4.google.com.
ns1.google.com.

Picking ns1.google.com. and asking it for the IP of
googlehosted.l.googleusercontent.com. from different locations:

anurag@laptop:~$ dig @ns1.google.com googlehosted.l.googleusercontent.com. a +short
209.85.175.132

anurag@server7:~$ dig @ns1.google.com googlehosted.l.googleusercontent.com. a +short
209.85.148.132

As expected, the same server name (which looks like one server but is
actually several) gives different values, so I am hitting different
servers in the two cases.

Now my question is: why this setup, rather than simply having an A
record for googlehosted.l.googleusercontent.com. that comes from
anycasted IP address space? Why not anycast at the CDN itself rather than
only at the DNS layer?

Can someone explain?

Thanks!

The simple answer is that Google cannot be expected to keep a local
cache of every image supplied to them globally on every server.
So they use unicast servers behind a DNS-based geo load balancer
configuration. As for DNS, every anycasted node is expected to be
able to resolve any DNS request that is made.

It's all a matter of disk and acceptable delay in providing the data
from the "closest" disk.

charles

The real answer to this is highly variable based on criteria that are unknown
by many people outside of the operators at these networks.

What is fairly well known:

1) Anycast can be used to provide low latency queries for stateless (UDP) and
   stateful protocols (TCP).
2) Query responses will vary based on node hit and/or source IP address the
   query comes from. Source address is used to attempt traffic localization.

   This can be defeated by using another resolver on purpose, or inadvertently
   (eg: corporate VPN may cause you to use a CDN node that is non-local by using
    corp DNS).
3) CDNs vary the response based upon uptime/load and other unknown policy criteria.
   They don't want to send you to a server that is down, nor one that is overloaded.

The secret is in the sauce here and is complex enough that it's not easy to perfect.

Also, be careful equating Anycast w/ CDN. They are not the same thing but sometimes
are related. (e.g.: cousins)

  - Jared

<snip>

Now my question is: why this setup, rather than simply having an A
record for googlehosted.l.googleusercontent.com. that comes from
anycasted IP address space? Why not anycast at the CDN itself rather than
only at the DNS layer?

You are confusing anycasting with offering different results.

I can have an anycast DNS setup where all my servers give the same
response (example: most DNS providers), I can also have a single DNS
server give 192.0.2.80 out to queries sourced from a US IP Address,
198.51.100.80 for queries sourced from a German IP Address and
203.0.113.80 to queries sourced from a Chinese address (djbdns has a
module for this for example).
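
To make the single-server case concrete, here is a minimal sketch of
answering by query source address. The source prefixes, the fallback answer
and the function name answer_for are all made up for illustration; only the
three A records come from the example above. This is not how djbdns or any
particular provider implements it.

```python
import ipaddress

# Hypothetical source prefixes (placeholders, not real allocations).
# A real deployment would load these from a geo-IP data feed.
REGION_BY_PREFIX = {
    ipaddress.ip_network("10.1.0.0/16"): "US",
    ipaddress.ip_network("10.2.0.0/16"): "DE",
    ipaddress.ip_network("10.3.0.0/16"): "CN",
}

# A records taken from the example above (documentation address space).
ANSWER_BY_REGION = {
    "US": "192.0.2.80",
    "DE": "198.51.100.80",
    "CN": "203.0.113.80",
}

# Assumed fallback for sources that match no prefix.
DEFAULT_ANSWER = "192.0.2.80"


def answer_for(source_ip: str) -> str:
    """Return the A record to hand out for a query from source_ip."""
    addr = ipaddress.ip_address(source_ip)
    # Collect every prefix containing the source, then keep the most specific.
    matches = [net for net in REGION_BY_PREFIX if addr in net]
    if not matches:
        return DEFAULT_ANSWER
    best = max(matches, key=lambda net: net.prefixlen)
    return ANSWER_BY_REGION[REGION_BY_PREFIX[best]]
```

Note that none of this needs anycast: one server at one address can give
all three answers.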

I would guess that Google probably has a highly customised algorithm
which uses a combination of source IP and the node that your query
arrived at as part of the process for deciding what answer to give
you, along with dozens of other internal factors.

Although I do sometimes wonder why they use CNAME chains in cases
where the same servers are authoritative for the target name anyway.

If you were wondering why they direct you to the unicast addresses for
the local datacentre instead of just giving an anycast address which
your nearest datacentre would answer: their algorithm might decide
that it wants to serve you content from the second closest datacentre
because the closest one is near capacity. Anycast can't do that.
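
A rough sketch of that last point, assuming the DNS side knows an estimated
distance and current load for each datacentre. The Datacentre fields, the
0.9 load threshold and the pick_datacentre name are invented for
illustration; the real selection logic is not public.

```python
from dataclasses import dataclass


@dataclass
class Datacentre:
    name: str
    unicast_ip: str
    distance_ms: float  # assumed estimate of latency to the client
    load: float         # fraction of capacity in use, 0.0 to 1.0
    healthy: bool = True


# Assumed threshold above which a site counts as "near capacity".
MAX_LOAD = 0.9


def pick_datacentre(candidates):
    """Pick the closest healthy site that still has spare capacity."""
    healthy = [dc for dc in candidates if dc.healthy]
    usable = [dc for dc in healthy if dc.load < MAX_LOAD]
    # Fall back to any healthy site, then to anything at all, rather than
    # returning no answer. An empty candidate list raises ValueError.
    pool = usable or healthy or list(candidates)
    return min(pool, key=lambda dc: dc.distance_ms)
```

With a nearest site at 97% load, this hands out the unicast address of the
second closest one. A single anycast address leaves that choice to routing
instead.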

- Mike

Nice explanation!

Thanks Mike.

Appreciate it.

Mike

I can also have a single DNS
server give 192.0.2.80 out to queries sourced from a US IP Address,
198.51.100.80 for queries sourced from a German IP Address and
203.0.113.80 to queries sourced from a Chinese address (djbdns has a
module for this for example).

I have never done such a setup, but I assume it works as you say. I wonder how
it quickly works out from the IP that a system is US-based (since it's a DNS
server)?

Thanks.

Here is *one* method if you obtain a feed of geo-ip data from someone like Maxmind:

http://phix.me/geodns/

Several DNS providers have different methods and different geo-ip data vendors.

-b

[snip]

I have never done such a setup, but I assume it works as you say. I wonder how
it quickly works out from the IP that a system is US-based (since it's a DNS
server)?

Drop "ip geolocation" or "internet geolocation" into Your Favorite
Search Engine. Short answer is some folks just refer to databases
published/generated by others, some folks use DNS guesses, and some
folks measure packet arrival. And most often, there is a combination
of methods used.
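
As for the "quickly" part with the database method: one common approach is
to keep the address ranges sorted and binary-search them, so a lookup costs
a handful of comparisons even with a large table. A minimal sketch follows;
the ranges and the country_of name are placeholders, not data from any real
geo-IP vendor.

```python
import bisect
import ipaddress

# Hypothetical (start, end, country) ranges. They must not overlap.
RANGES = [
    ("10.1.0.0", "10.1.255.255", "US"),
    ("10.2.0.0", "10.2.255.255", "DE"),
    ("10.3.0.0", "10.3.255.255", "CN"),
]

# Convert to integers once and sort by range start.
_TABLE = sorted(
    (int(ipaddress.ip_address(start)), int(ipaddress.ip_address(end)), country)
    for start, end, country in RANGES
)
_STARTS = [row[0] for row in _TABLE]


def country_of(ip: str):
    """Return the country code for ip, or None if no range covers it."""
    value = int(ipaddress.ip_address(ip))
    # Find the last range whose start is <= value.
    i = bisect.bisect_right(_STARTS, value) - 1
    if i < 0:
        return None
    _start, end, country = _TABLE[i]
    return country if value <= end else None
```

The table is built once at load time; each query is then a binary search,
which is cheap enough to do inline while answering DNS.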

Great explanation.

Thanks everyone

Anurag Bhatia
http://anuragbhatia.com