FW: DNS TTL adherence

> This behavior is unfortunately not unique.

Alas, what other people's servers do shouldn't be an issue for you. Your
problem is that they can be coerced into a DoS attack, not that the data
is stale.

Actually, DoS attack aside, the interesting thing is that lots of people
(original poster perhaps included) believe that TTLs are adhered to
except in some marginal cases. I think Rodney's point is that they are
not adhered to anywhere near as much as we would all like to believe :(

So, if you, or the original poster, are going to move
${important_resource} around IP-wise, keep in mind that your
${important_thing} may have to answer to more than one IP address for a
period much longer than your tuned TTL :(

Thanks all for the responses. I do understand we may need to support the
old IP addresses for some time. I was hoping someone out there had
performed a study to determine what the ratio might be for us supporting
an old IP address (I know our traffic profile will be unique to us, so
it would only give us a general idea).

For example, if we change IP addresses, will we need to plan on 20% of
traffic at the old site on day 1, 10% on day 2, 5% on day 3, and so on?
There are also issues related to proxy servers and browser caching,
independent of DNS, that we will need to quantify to understand the full
risk. The more data we have, the better it will inform our decisions.

Thanks again,

Steve

(re-sending because I wasn't on nanog-post)

> For example if we change ip addresses will we need to plan on
> 20% traffic at old site on day1, 10% day2, 5%, day3, and so
> on...? There are also issues related to proxy servers and
> browser caching that are independent of DNS we will need to
> quantify to understand full risk. The more data we have will
> drive some of our decisions.

You might consider the following paper from IMC 2004: "On the
Responsiveness of DNS-based Network Control" by Jeffrey Pang, Aditya
Akella, Anees Shaikh, Balachander Krishnamurthy, and Srinivasan Seshan:
http://www.imconf.net/imc-2004/papers/p21-pang.pdf

It sheds some light on how widely DNS TTLs are adhered to. The CDF
graphs on the 4th page suggest that you should be fairly safe after a
day, though I can't see whether the paper states what the largest
recorded violation was.

Sharad.

The results are greatly at odds with my experience.

As they imply, the problem may be specific misconfigured ISP DNS servers,
which might explain why we see fewer violations, if our sites aren't popular
with those ISPs' users.

However I wouldn't trust any report where the control of the authoritative DNS
itself wasn't explicitly monitored and reported. They may think they have
updated the authoritative answers (and TTL), but in my experience when you
find violators you often find that the authoritative DNS servers didn't all
update as, or when, expected, or that earlier records were returned with a
longer TTL from those servers.

Certainly that was our experience moving many sites last week, where you
can check the logs in real time and find which domains we messed up on from
the traffic still arriving at the old server.

Looking at the 4 long term violators for one site....

  Hits  Source IP
     8  198.78.130.68   <--- ??
     1  212.95.252.16   <--- lager.netcraft.com
    15  66.147.154.3    <--- IBM Almaden Research Center
     5  70.42.51.10     <--- Fast Search & Transfer

During this period (starting 3 days after moving a site with a 10-minute
TTL) we saw 27234 hits (okay, not exactly a busy site) for that site on the
correct server. So roughly 1 in 1000 hits during days 3 to 6 went to the old
web server. This domain had the most lost hits; most of the moved domains
don't show in the old server's log at all.

I think we can safely exclude at least 21 of those 29 hits as "non-human"
(sorry IBM Research if you were deeply interested in proofreading); I'm
guessing those sources have made a deliberate effort to cache stale data
for their own reasons.

So I can put an upper estimate on our sites of 1 in 1000 hits of interest
going to the wrong site during days 3 to 6.
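
To make that arithmetic easy to recheck, here is a minimal Python sketch
using only the numbers quoted above. The choice of which three sources
count as "non-human" is my reading of the "21 out of 29" figure
(1 + 15 + 5), not something stated explicitly, and the variable names are
purely illustrative.

```python
# Hits seen on the OLD server during days 3 to 6, from the table above.
old_server_hits = {
    "198.78.130.68": 8,   # unidentified
    "212.95.252.16": 1,   # lager.netcraft.com
    "66.147.154.3": 15,   # IBM Almaden Research Center
    "70.42.51.10": 5,     # Fast Search & Transfer
}

# Assumption: these are the sources excluded as "non-human" (1 + 15 + 5 = 21).
non_human = {"212.95.252.16", "66.147.154.3", "70.42.51.10"}

# Hits for the same site on the correct (new) server in the same period.
new_server_hits = 27234

total_stale = sum(old_server_hits.values())
human_stale = sum(n for ip, n in old_server_hits.items() if ip not in non_human)

# Express each count as "1 in N" relative to hits on the correct server.
one_in_total = new_server_hits / total_stale
one_in_human = new_server_hits / human_stale

print(f"all stale hits:      {total_stale} (about 1 in {one_in_total:.0f})")
print(f"stale hits of interest: {human_stale} (about 1 in {one_in_human:.0f})")
```

With all 29 stale hits counted the ratio is close to 1 in 1000; with the
21 excluded it is considerably lower, which is why 1 in 1000 is an upper
estimate.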

The most popular site we moved had only two DNS violators during days 3 to
6, the most notable being the same "Fast Search & Transfer" IP above.

It may be that popular sites have a far worse problem by dint of exercising
more caching code, but this site is far from being our most popular. And
these sites were moved by reducing the TTL to a low value (10 minutes) and
keeping it there for a long period of time, before we actually performed the
move.
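
The timing behind that approach can be sketched as below. This is only a
rough illustration: the one-day previous TTL and the dates are
hypothetical, the function names are made up for this example, and it
assumes every authoritative server starts serving the lowered TTL at the
same moment (which, as noted above, is exactly what tends to go wrong).

```python
from datetime import datetime, timedelta


def earliest_safe_move(ttl_lowered_at, old_ttl_seconds):
    """Earliest time at which a TTL-respecting cache can no longer hold a
    record fetched with the old, longer TTL."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def compliant_cutover_deadline(moved_at, low_ttl_seconds):
    """Time by which every TTL-respecting cache should have picked up the
    new address after the move."""
    return moved_at + timedelta(seconds=low_ttl_seconds)


lowered = datetime(2005, 4, 1, 12, 0)  # hypothetical: when the TTL was reduced
old_ttl = 86400                        # hypothetical previous TTL (one day)
low_ttl = 600                          # the 10-minute TTL mentioned above

move_ok = earliest_safe_move(lowered, old_ttl)
deadline = compliant_cutover_deadline(move_ok, low_ttl)

print("do not move before:", move_ok)
print("compliant caches switched by:", deadline)
```

Anything still arriving at the old address after the second time is
coming from a cache that ignores the TTL, or from a record served by an
authoritative server that was not updated when expected.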

:: >So, if you, or the original poster, is going to move ${important_resource}
:: >around ip-wise keep in mind that your ${important_thing} may have to
:: >answer to more than 1 ip address for a period much longer than your tuned
:: >TTL :(
::
:: Thanks all for the responses. I do understand we may need to support the
:: old IP addresses for sometime. I was hoping someone had performed a
:: study out there to determine what a ratio maybe for us supporting an old
:: IP address (I know our traffic profile will be unique for us thus it
:: would only give us a general idea).
::
:: For example if we change ip addresses will we need to plan on 20%
:: traffic at old site on day1, 10% day2, 5%, day3, and so on...? There are
:: also issues related to proxy servers and browser caching that are
:: independent of DNS we will need to quantify to understand full risk. The
:: more data we have will drive some of our decisions.

In my not-so-scientific "studies" of changing IPs for a fairly large
volume site, I found that 90% of the people will use the new IP within an
hour of TTL expiration, 99.999% of the people within 3 days, and that
remaining .001% may take years....

As someone said earlier, some parts of the 'net are just broken beyond
your control...

-igor