dns authority changes and lame servers

I find it exceptionally annoying that there is no process whereby the root servers and/or registrars can inform us of new/modified/removed delegations. The end result is that we serve a lot of stale zones long after they leave us. In the past I've hacked out some perl to audit our BIND configs and find the stuff that's moved, but it's ugly. And really, it's only partially dependable. For example, does the lack of root server records mean that:

1) the customer abandoned the zone and no longer wishes us to host it
- or -
2) the customer forgot to pay for the zone today, and tomorrow will bitch like hell if my script removes it overnight

There are sub-problems of this, mostly related to customers who move and change their company names every six months. So now I have a customer whose zone has expired from the roots (no more email to them) and whose phone number has changed (no way to call and find out what their real intentions re: the expired zone are). It's not worth our time to physically drive to their site to answer a question that has little to no real financial implications for us (thanks to the free hosting of up to three domains with order of T1 service).

So questions:

1) Does anyone else find this flaw in the DNS system as annoying as I do? If authority is to be regularly moved around between ISPs (who may be hosting thousands of customer domains), some automated process is needed to allow the ISP to make intelligent choices about when to remove a customer zone (authority transfers to another provider are likely the thing I'd key on, while non-payment removals would probably have a 30 day grace period since aforementioned physical moves are most likely cause of non-payment expiration).

2) Does anyone have a better way of cleaning out the dreck than some home-grown scripts? I've used sleep() judiciously to try not beating on any external servers more than necessary, but the output is less than 100% predictable and often hand audits are required before I can really generate automatic removals.

We used to get bitch notices from someone about zones we were supposed to be authoritative for and weren't. This was even more annoying, since often the whole point was that the customer was "parking" it on our servers but had used their 3 freebies and had no real immediate use for it, so neglected to tell us of it. Fine. But give us some notification, from somebody, so we can stick an empty placeholder in there and be ready when it is deployed.

For extra fun, this week a customer simply added their new provider's DNS servers to their zone, without removing ours, or asking us to remove our config. So things were kinda wacky for them until someone called us and asked WTF was going on.

1) Does anyone else find this flaw in the DNS system
as annoying as I do? If authority is to be regularly
moved around between ISPs (who may be hosting thousands

As an operator of both free and paid DNS services, I wish there was a
quick and easy way to pull a list of all of the zones that were
delegated to a specific IP address. I say IP because people can now
register their own DNS name servers at the registrar and use our IP
addresses, so using the "official" hostname isn't even fool-proof.
Being able to pull such an "official" list for forward DNS zones would
certainly make life easier.

We also have home-grown scripts that figure out whether a domain is
delegated to us or not and flag the ones that aren't. In the case of
the free service we flag them for two weeks and if they still aren't
delegated to us after that period we disable them on the DNS servers but
leave the domain in their account. In the case of the paid service we
make a note of the status in the database but do not make any changes to
the account (they're paying us, after all, to have it there). We don't
do recursive lookups so it's not an issue (even though it's technically
an RFC violation, if I remember correctly).

I suppose the problem with having an official list to query would be
getting all of the various registries to participate and keep it
regularly updated. I personally qualify this as a slight inconvenience,
but I'm not sure I would call it a flaw in the DNS system.

-Justin Scott

This report used to be quite useful in that regard:

http://www.cymru.com/DNS/lame.html

Perhaps Rob needs a coffee injection to get that going again?

(BTW: Need/want some more of our famous "Colo Blend" Mr. Thomas?)

--chuck

Justin Scott wrote:

I suppose the problem with having an official list to query would be
getting all of the various registries to participate and keep it
regularly updated. I personally qualify this as a slight inconvenience,
but I'm not sure I would call it a flaw in the DNS system.

If we just call DNS a distributed database, then it is easy to see that when the keys (glue at root) get updated, the relations to those keys *should* all reflect that change. The flaw is that the system creates cruft almost continuously. I'd love to see a graph of the cruft on a global scale, because I'm positive that over time it is growing (though in ways that are not always operationally impactful since most of it will be dead and abandoned zones still sitting in our named.conf).

And I'll admit, I'm not sure how to properly fix it either. My first thought was a BIND directive to "expire-stale-zones <interval>;" so that every <interval> the server might check to be sure it is still auth, and if it has found authority changed, would stop giving out AAs for it. But I see all kinds of operational issues arising from that too (such as, how do we gracefully set up a new customer's zone before it has transitioned here).

Really, in my ideal Internet, once my server was notified that it was no longer authoritative, it would have an option to do a reverse xfer to the new auth servers (who would then be free to accept/reject the old information as necessary - can't count the number of times I've tried to get customers to provide zone file records in advance and failed because they don't know how/where to get them from). But that's an ideal Internet that will never exist, I know.

Hi, Chuck!

This report used to be quite useful in that regard:

The Team Cymru Weekly Lame Name Server Report - ON HIATUS

Perhaps Rob needs a coffee injection to get that going again?

Oh, my, I'd totally forgotten about that report. I do need to get that going again. I'll dig around now to see what we can produce in short order.

(BTW: Need/want some more of our famous "Colo Blend" Mr. Thomas?)

That was some of the best joe I've had, and I'd welcome another batch! Just don't tell the rest o' Team Cymru about it - it's mine, all mine! Muahaha! :)

Thanks!
Rob.

Justin Scott wrote:

As an operator of both free and paid DNS services, I wish there was a
quick and easy way to pull a list of all of the zones that were
delegated to a specific IP address. I say IP because people can now
register their own DNS name servers at the registrar and use our IP
addresses, so using the "official" hostname isn't even fool-proof.
Being able to pull such an "official" list for forward DNS zones would
certainly make life easier.

How annoying or frustrating is it for people?

Is it so annoying that you'd be willing to pay for a list of every public-facing NS record pointed at a given IP?

I should also mention the related work starting over here:
http://www.nanog.org/mtg-0710/presentations/Vixie-lightning.pdf

-David

I find it exceptionally annoying that there is no process whereby the
root servers and/or registrars can inform us of new/modified/removed
delegations.

Why can't you just query the other side of the zone cut once a
day/week/month/youpick and compare the NS set from the delegating
side to the NS set you have as the presumed authority side? That
combined with a bit of information only you would have about which of
your mismatches are changes you're currently managing, and which are
surprises, would surely give you the data you need?
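A rough sketch of the comparison suggested above, in Python. It only covers the set arithmetic: the two NS sets are passed in as plain dictionaries, and how they are collected (querying the parent's servers, parsing local zone files) is left out. The names `normalize`, `Mismatch`, `audit`, and `in_progress` are made up for illustration and do not come from any existing tool.

```python
from dataclasses import dataclass


def normalize(names):
    """Lower-case NS names and strip the trailing dot so sets compare cleanly."""
    return {n.strip().lower().rstrip(".") for n in names}


@dataclass
class Mismatch:
    zone: str
    only_in_parent: set   # servers the delegation lists that our zone does not
    only_in_child: set    # servers our zone lists that the delegation does not
    expected: bool        # True if we already know a change is in progress


def audit(parent_ns, child_ns, in_progress):
    """Compare delegating-side and authority-side NS sets, zone by zone.

    parent_ns / child_ns: dict mapping zone name -> iterable of NS host names.
    in_progress: set of zones with a change we are knowingly managing.
    A zone missing from parent_ns is treated as having an empty delegation.
    """
    report = []
    for zone, child in sorted(child_ns.items()):
        parent = normalize(parent_ns.get(zone, ()))
        child = normalize(child)
        if parent != child:
            report.append(Mismatch(zone, parent - child, child - parent,
                                   zone in in_progress))
    return report
```

Anything that comes back with `expected` set to False is a surprise worth a human look; the rest are changes already being managed.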

At the same time, I'll point out that registries, at least, are under
some pressure not to release too much information about this sort of
thing. Nevertheless, various third parties are obtaining regular
zone snapshots, and then making some sort of business out of their
conclusions from the zone data. I'd (personally, not speaking for my
employer) love to be able to offer such services, but any time a
registry operator suggests anything of the sort, people get angry.

To answer specific questions:

1) Does anyone else find this flaw in the DNS system as annoying as I
do?

I don't think this is a "flaw in the DNS system" as much as it is a
consequence of the funny economics currently on display among domain
name registrars, DNS operators, and ISPs.

2) Does anyone have a better way of cleaning out the dreck than some
home-grown scripts?

If you pay someone else to operate your DNS, then you get to offload
the dreck-cleaning to them! But other than that, no.

Best regards,
A

How annoying or frustrating is it for people?

Is it so annoying that you'd be willing to pay for
a list of every public-facing NS record pointed at
a given IP?

Nope. As I mentioned earlier, I qualify this as a minor inconvenience
on the servers that I manage. It may be for someone who manages more
zones than I do though.

-Justin Scott

Justin Scott wrote:

We also have home-grown scripts that figure out whether a domain is
delegated to us or not and flag the ones that aren't. In the case of
the free service we flag them for two weeks and if they still aren't
delegated to us after that period we disable them on the DNS servers but
leave the domain in their account. In the case of the paid service we
make a note of the status in the database but do not make any changes to
the account (they're paying us, after all, to have it there). We don't
do recursive lookups so it's not an issue (even though it's technically
an RFC violation, if I remember correctly).

We use home-grown scripts to follow the NS trail and verify that we are listed in some form or fashion. If we aren't, we handle the problem based on the criteria. If the domain is listed elsewhere, we immediately remove and notify. If the domain isn't listed in the TLD, we notify but hold the domain for, I think, 30 days before removing it, unless the status changes.
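For illustration only, a policy like the one described could be sketched in Python as below. This is not Jack's script: the function names, the state labels, and the choice to match on either name or address (because customers can register their own NS names on a provider's IPs, as noted earlier in the thread) are assumptions. The 30-day hold is the figure from the post.

```python
from datetime import date, timedelta

HOLD = timedelta(days=30)  # hold period from the post; adjust to local policy


def classify(delegated_ns, our_names, our_addrs):
    """Return 'ours', 'elsewhere', or 'undelegated' for one zone.

    delegated_ns: dict of NS host name -> set of its IP addresses, as seen
    from the parent (TLD) side; empty if the TLD no longer lists the zone.
    our_names: our NS host names, lower-case, without a trailing dot.
    our_addrs: set of our name server IP addresses.
    """
    if not delegated_ns:
        return "undelegated"
    for name, addrs in delegated_ns.items():
        # A server counts as ours if its name or any of its addresses matches.
        if name.lower().rstrip(".") in our_names or set(addrs) & our_addrs:
            return "ours"
    return "elsewhere"


def decide(state, first_seen_undelegated, today):
    """Map a classification to an action, following the policy described."""
    if state == "ours":
        return "keep"
    if state == "elsewhere":
        return "remove-and-notify"
    # Undelegated: notify first, remove only once the hold period has passed.
    if first_seen_undelegated is None:
        return "notify-and-hold"
    if today - first_seen_undelegated >= HOLD:
        return "remove"
    return "hold"
```

The caller would need to record the date a zone was first seen undelegated and clear it again if the delegation comes back.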

Jack

Andrew Sullivan wrote:

I don't think this is a "flaw in the DNS system" as much as it is a
consequence of the funny economics currently on display among domain
name registrars, DNS operators, and ISPs.

I suppose it is a social problem at the very bottom here. If my users were educated enough to notify me when they moved authority I wouldn't have this problem. Maybe it's not fair to ask the Registrars/Roots to provide updates when it's really incumbent on their customers to do so.

But then I start to balk -- any process that involves duplicate updates of one piece of information in two disparate systems is inefficient at best, and inherently prone to these kinds of errors even with good intentions.

There is an economic factor at play in our smaller scale operation. It's barely worth the time of billing to track all these "free" dns hostings. If we charged for it, the customers might be more attentive and notify us in order to be released from the charges (but likely we can't charge enough to really even make it worth their time either).

At one level this is all a minor nuisance. Then I hear of the customer who, doing business with another former customer in the same building, spent a year printing out and walking over their emails because they were too lazy to call us and find out why they weren't getting through. I can pretty fairly claim that's "not our fault" that no one bothered to ask us to remove the cruft, but the customers on the receiving end of the DNS black hole just know that our DNS server was "broken" and "didn't get an update" and next week they'll be calling me asking me to "update my cache" when they can't get to foobar.com.

I do something similar with a nagios plugin (perl script). It
reports lameness and serial mismatch. I've put it online here:

http://www.life-gone-hazy.com/src/nagios/check_zone_auth

Duane W.
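For readers who have not written a Nagios plugin: the core of such a check is small. The sketch below is not the check_zone_auth script linked above, just a hypothetical Python outline of the same idea. It assumes each server has already been queried for the zone's SOA, and that lameness should be CRITICAL and a serial mismatch WARNING (that mapping is a guess, not taken from the original plugin). The exit codes 0 to 3 are the standard Nagios plugin return values.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def evaluate(zone, answers):
    """Turn per-server SOA results into a Nagios status.

    answers: dict of server name -> (is_authoritative, soa_serial or None),
    where is_authoritative reflects the AA flag on that server's answer.
    Returns (exit_code, one-line status message).
    """
    if not answers:
        return UNKNOWN, f"UNKNOWN: no servers checked for {zone}"
    # A server that does not answer authoritatively for the zone is lame.
    lame = sorted(s for s, (aa, _) in answers.items() if not aa)
    if lame:
        return CRITICAL, f"CRITICAL: {zone} lame on {', '.join(lame)}"
    serials = {serial for _, serial in answers.values()}
    if len(serials) > 1:
        detail = ", ".join(f"{s}={answers[s][1]}" for s in sorted(answers))
        return WARNING, f"WARNING: {zone} serial mismatch ({detail})"
    return OK, f"OK: {zone} serial {serials.pop()} on {len(answers)} servers"
```

A wrapper would print the message on one line and exit with the returned code, which is all Nagios needs from a plugin.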

davidu@everydns.net (David Ulevitch) writes:

I should also mention the related work starting over here:
http://www.nanog.org/mtg-0710/presentations/Vixie-lightning.pdf

indeed. while i don't have even a tenth of the analysis expertise of someone
like robt, wessels, florian, or april, i am most assuredly going to gather the
raw data and make it available to those folks and similar folks. (noting that
i've got 5Mbit/sec now and am hoping for 1000X that much a year from now, and
noting that robt, wessels, florian, april, paul laudanski, and jeff chan have
already got dedicated or shared hosts connected to the rebroadcast switch,
and that more are welcome.)

we may yet publish a top-500-domains web page, since that's a fairly easy
thing to build using this raw data. current zonecuts, and nameserver name
or address deltas, may also come from us, though i think it'll come sooner
from wessels, april, or florian.

if you're not submitting data yet, i hope you'll decide to do so, and drop me
some e-mail (vixie@isc.org) to discuss details.

The correct way to change a delegation is to:

  * add the new servers as stealth servers for the
    current zone.
  * if the old master is to be removed, make it a slave
    of the new master.
  * add the new NS records to the zone.
  * wait for all the slaves to have the new zone.
  * inform the parent zone of the new NS records.
  * wait until all the old NS RRsets have expired from
    caches (implies waiting for the parent's changes to propagate).
  * remove the old NS records from the zone.
  * wait for all the slaves to update.
  * inform the parent zone of the final NS records.
  * wait until all the intermediate NS RRsets have expired from
    caches (implies waiting for the parent's changes to propagate).
  * any slaves that are not being removed and that are still
    using the old master (or slave that is going away) need
    to be configured to use the new master by this point.
  * stop serving the zone on the old servers.

  Note: all through this process the nameservers listed in the
  NS records are answering for the zone in a consistent manner.

  Note: even if the parents informed you that the delegation
  was removed you still have to wait for the records to expire
  from caches *before* you can stop serving the zone.

  One can collapse the above slightly by informing the parent
  of the final NS RRset, rather than the intermediate NS
  RRset, but that won't work with registrars that check the
  child's NS RRset.
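The NS RRset side of the procedure above can be written out as a small Python sketch. It is a simplification under stated assumptions: it leaves out the stealth-server and slave reconfiguration steps, it treats the cache wait as the larger of the two NS TTLs, and it does not model the time the parent takes to publish a change. The function name and step labels are invented for illustration.

```python
def migration_plan(old_ns, new_ns, ns_ttl, parent_ttl):
    """Return the ordered steps as (action, NS set or wait in seconds).

    old_ns / new_ns: sets of NS host names before and after the move.
    ns_ttl: TTL of the zone's own NS RRset, in seconds.
    parent_ttl: TTL of the delegation NS RRset in the parent zone.
    """
    old_ns, new_ns = frozenset(old_ns), frozenset(new_ns)
    # The intermediate RRset lists every server, so all of them answer
    # consistently while caches still hold a mix of old and new data.
    intermediate = old_ns | new_ns
    # Caches may hold either the child's or the parent's copy, so wait for
    # the longer TTL. Treat this as a lower bound, not a guarantee.
    wait = max(ns_ttl, parent_ttl)
    return [
        ("publish-in-zone", intermediate),
        ("inform-parent", intermediate),
        ("wait-seconds", wait),
        ("publish-in-zone", new_ns),
        ("inform-parent", new_ns),
        ("wait-seconds", wait),
        ("stop-serving-on", old_ns - new_ns),
    ]
```

Collapsing the procedure as described would mean dropping the first "inform-parent" step and sending only the final set.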

  One way to get around this would be to charge a cleanup fee
  that only gets charged when the client fails to notify you
  in advance that they are going to change delegations.

  Mark

mike@rockynet.com (Mike Lewinski) writes:

Justin Scott wrote:

> I suppose the problem with having an official list to query would be
> getting all of the various registries to participate and keep it
> regularly updated. I personally qualify this as a slight inconvenience,
> but I'm not sure I would call it a flaw in the DNS system.

If we just call DNS a distributed database, then it is easy to see that
when the keys (glue at root) get updated, the relations to those keys
*should* all reflect that change. ...

And I'll admit, I'm not sure how to properly fix it either. My first
thought was a BIND directive to "expire-stale-zones <interval>;" so that
every <interval> the server might check to be sure it is still auth, and
if it has found authority changed, would stop giving out AAs for it. But
I see all kinds of operational issues arising from that too (such as,
how do we gracefully set up a new customer's zone before it has
transitioned here).

as duane said, it's possible to accomplish this with creative nagios plugins.
however, i agree that it's something BIND should do, to be comprehensive. if
someone is excited enough about this to consider sponsoring the work, please
contact me (vixie@isc.org) to discuss details.

Really, in my ideal Internet, once my server was notified that it was no
longer authoritative, it would have an option to do a reverse xfer to
the new auth servers (who would then be free to accept/reject the old
information as necessary - can't count the number of times I've tried to
get customers to provide zone file records in advance and failed because
they don't know how/where to get them from). But that's an ideal
Internet that will never exist, I know.

it's because we didn't know exactly how to scope this problem that RFC 2136
does not permit the insertion or deletion of authority zones. noting that
the ideal internet you want is within our grasp if we can only define it and
sponsor it, i recommend taking up this thread on namedroppers@ops.ietf.org or
dns-operations@lists.oarci.net.

Sounds like a really bad idea to me.

The original problems sound like management issues mostly. Why are they
letting customers who don't understand DNS update their NS records, and if
they do, why is it a problem for them (and not just the customer who fiddled
and broke stuff)?

Similarly we'll provide authoritative DNS for a zone as instructed (and paid
for), even if it isn't delegated, if that is what the customer wants.

For as long as one doesn't mix authoritative and recursive servers, it matters
not a jot what a server believes it is authoritative for, only what is
delegated. Hence one can't "graph the mistakes" as one would have to be
psychic to find them.

Perhaps they need to provide DNS status reports to clients, so the clients
know if things are misconfigured? Monitoring/measuring is the first step in
managing most things. But I think far more important to find and fix what is
broken, than to try and let the machines prune it down when something is
wrong, although I guess breaking things that are misconfigured is a good way
to get them fixed ;)

Sounds like the real problem is that your authoritative and caching DNS
servers are mixed up.

If they are split then it doesn't really matter if you still host a lame
record because (since it's lame) nobody will ask you about it.

Simon Lyall wrote:

Sounds like the real problem is that your authoritative and caching DNS
servers are mixed up.

Understood. I've worked to turn off recursion to the world and made it through that without too much pain (except for the people who transport statically configured laptops on and off our network). The next step isn't trivial since it's a matter of updating quite a lot of data. It's important and we're working on it for the benefit of the customers, but this will be an operational issue for us for a while.

I'm sure I'll get a response telling me to just change the glue at root for the NS and be done, but that won't help any other externally registered names pointing to my DNS with their own glue at root. Then there are the ARPAs, all with "interesting" pedigrees and various processes (true, they are least likely to be the problem, but now I have to split the zone management onto more than one server so it's not as simple as just changing my glue at root).

And there's the case in the last few years of $REAL_BIG_ILEC who provides DSL service and has the same configuration we do. It took some legalish threats all the way to their CEO to get a stale zone removed, after 9 months of attempting to work through the "regular" channels (even their former customer couldn't get the request processed!). Their policy is apparently to not remove zones, ever.

So no matter how quickly I transition my network, this is still going to affect your customers some day, because there are a lot of other people in the same boat I am - lots of statically configured DNS resolvers aren't going to change themselves and if the same caching servers are also hosting thousands of zones that were added incrementally over the last 12+ years....

We gave up long ago trying to get our technical contacts listed on each customer domain whois / registrar role account, because we couldn't get better than 50% response rate.

If they are split then it doesn't really matter if you still host a lame
record because (since it's lame) nobody will ask you about it.

It's still cruft and ideally should still be cleaned up automatically based on the external authority changing.

Simon Lyall wrote:

Sounds like the real problem is that your authoritative and caching DNS
servers are mixed up.

Understood. I've worked to turn off recursion to the world and made it through that without too much pain (except for the people who transport statically configured laptops on and off our network). The next step isn't trivial since it's a matter of updating quite a lot of data. It's important and we're working on it for the benefit of the customers, but this will be an operational issue for us for a while.

I've yet to try it, but if you're running BIND you should be able to split it up into views:
- View A takes queries from your end users (based on source IP) and acts as a recursive cache.
- View B takes queries from everyone else (catchall) and answers authoritatively.

You'll probably run into a couple of problems where an end user needs an authoritative answer for a name you are authoritative for, but that'll be a small percentage I expect.

Again, I haven't tested this, but I can't see any obvious reason why it wouldn't work.

If they are split then it doesn't really matter if you still host a lame
record because (since it's lame) nobody will ask you about it.

It's still cruft and ideally should still be cleaned up automatically based on the external authority changing.

Maybe. Note that the same is true of MTA and MX servers (i.e. the MX records for domains you host point at the same server your customers use to send mail to domains you don't host).