Detecting a non-existent domain

Getting practical for a minute. What is the optimal way now to see if a given host truly exists? Assume that I can't control the DNS server--I need to have this code run in any (*ix) environment. Assume also that I don't want to run around special-casing specific IP addresses or TLDs--this needs to work reliably no matter what the domain. User gives me a string, and I need to see if the given host is a real machine.

An answer from Verisign would be most appropriate here, since they have done "extensive research" on the impact of their new service, so presumably they figured out the answer to this problem and have code samples available for distribution. However I get the feeling from their press releases that they've forgotten there is more to the internet than just the web.

Look for a SOA record for the domain - this should be the proper way to
check for the existence of a domain, instead of looking for A, NS or MX
records.

  box:~# dig assvsvsdcacdasc.com SOA
  ; <<>> DiG 9.1.3 <<>> assvsvsdcacdasc.com SOA
  ;; global options: printcmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47940
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

  ;; QUESTION SECTION:
  ;assvsvsdcacdasc.com. IN SOA

  ;; AUTHORITY SECTION:
  com. 10800 IN SOA a.gtld-servers.net. nstld.verisign-grs.com. 2003092300 1800 900 604800 86400

  box:~# dig yahoo.com SOA
  ; <<>> DiG 9.1.3 <<>> yahoo.com SOA
  ;; global options: printcmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50827
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 5, ADDITIONAL: 0

  ;; QUESTION SECTION:
  ;yahoo.com. IN SOA

  ;; ANSWER SECTION:
  yahoo.com. 1800 IN SOA hidden-master.yahoo.com. hostmaster.yahoo-inc.com. 2003092304 900 300 604800 600

Note the difference in the ANSWER and AUTHORITY counts for the two: the unregistered name gets no answer of its own, only the com. SOA in the authority section.

- d.
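A minimal sketch of how a script might read the two dig runs above, assuming it already has dig's text output in a string. The function names parse_dig_header and has_own_soa are hypothetical, and the rule "ANSWER of at least 1 means the name has its own SOA" is only the heuristic suggested in this message, not a general test.

```python
import re

# Matches the counts in dig's header, e.g.
# ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0"
_COUNTS = re.compile(
    r"QUERY:\s*(\d+),\s*ANSWER:\s*(\d+),\s*AUTHORITY:\s*(\d+),\s*ADDITIONAL:\s*(\d+)"
)
_STATUS = re.compile(r"status:\s*([A-Z]+)")


def parse_dig_header(output):
    """Return (status, counts) from dig's text output; counts is a dict."""
    status = _STATUS.search(output)
    counts = _COUNTS.search(output)
    if status is None or counts is None:
        raise ValueError("no dig header found")
    keys = ("query", "answer", "authority", "additional")
    return status.group(1), dict(zip(keys, map(int, counts.groups())))


def has_own_soa(output):
    """True when the SOA query was answered directly (heuristic from the post)."""
    status, counts = parse_dig_header(output)
    return status == "NOERROR" and counts["answer"] >= 1
```

Fed the headers shown above, this separates the two cases by the ANSWER count alone; the status is NOERROR in both, so it cannot be used by itself.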

Kee Hinckley wrote:

Getting practical for a minute. What is the optimal way now to see if a given host truly exists? Assume that I can't control the DNS server--I need to have this code run in any (*ix) environment. Assume also that I don't want to run around special-casing specific IP addresses or TLDs--this needs to work reliably no matter what the domain. User gives me a string, and I need to see if the given host is a real machine.

A set comparison between the domain you're interested in and *.TLD will tell you whether the domain points to the same IP addresses as the wildcard. In many cases, this is sufficient and can be made to work dynamically and quickly with most software and scripts.

-Jack
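The set comparison might be sketched like this in Python, using only the standard library. Two things here are assumptions rather than part of the suggestion: the wildcard is probed with a random label instead of a literal "*" query, and the names resolve_ips and points_at_wildcard are hypothetical. The resolver is passed in as a parameter so the comparison can be exercised without network access.

```python
import secrets
import socket


def resolve_ips(name):
    """Return the set of IPv4 addresses for name, or an empty set on failure."""
    try:
        return set(socket.gethostbyname_ex(name)[2])
    except OSError:  # covers socket.gaierror and socket.herror
        return set()


def points_at_wildcard(name, resolve=resolve_ips):
    """True if name resolves to exactly the addresses the TLD wildcard returns."""
    tld = name.rstrip(".").rsplit(".", 1)[-1]
    # Assumption: a random 32-hex-digit label is unregistered, so any
    # answer for it must come from a wildcard in the TLD zone.
    probe = "%s.%s" % (secrets.token_hex(16), tld)
    wildcard = resolve(probe)
    found = resolve(name)
    # No wildcard answer means there is nothing to be confused with.
    return bool(wildcard) and found == wildcard
```

A caller would treat points_at_wildcard("example.com") returning True as "this name is probably not registered", with the usual caveat that it only compares A records.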

He asked for the "optimal way" "to see if a given host truly exists" and
you told him how to confirm or deny the "existence of a domain". He asked
about hosts, you answered about domains.

  DS

No, because there doesn't _have_ to be a SOA RR for a 2nd level
domain. For example, in the .de TLD, there are (many) domains
which have their RRs (A, MX) in the de. zone file, NOT being
delegated to another NS RRset.

Example: fsck.de

Regards,
Daniel

Getting practical for a minute. What is the optimal way now to see
if a given host truly exists?

  You first have to define what you mean by 'exists'. I have a machine here
that I call 'stinky'. It's not on the Internet though. Does the 'host'
'stinky' exist?

Assume that I can't control the DNS
server--I need to have this code run in any (*ix) environment.
Assume also that I don't want to run around special-casing specific IP
addresses or TLDs--this needs to work reliably no matter what the
domain. User gives me a string, and I need to see if the given host
is a real machine.

  How did you do this before? Does an A record for a hostname mean that a
host with that name exists? If so, then all *.com 'hosts' now 'exist'. If
not, what did you mean by exist before?

An answer from Verisign would be most appropriate here, since they
have done "extensive research" on the impact of their new service, so
presumably they figured out the answer to this problem and have code
samples available for distribution. However I get the feeling from
their press releases that they've forgotten there is more to the
internet than just the web.

  Forgive me for defending Verisign, but if you want to know if a given DNS
name corresponds with an A record, you can still determine that. If you want
to determine something else, you can still do that, depending upon what that
something else is.

  As for 'fsck.de', a good argument can be made that this is not really a
legal domain. It's a host. Checking for an SOA is a good way to tell if a
domain is valid, depending upon what you mean by 'domain' and 'valid'.

  I'm reminded of the classic programmer's question, "how do you tell if a
machine is online?". The answer is "define what you mean by a machine being
online and test for that".

  So you aren't asking a comprehensible question yet.

  DS

  As for 'fsck.de', a good argument can be made that this is not really a
legal domain.

It's a perfectly valid domain registered with DE-NIC. DE-NIC offers two
types of domains: delegated and so-called "MX-only" domains, where up
to five (IIRC) RRs reside directly in the TLD zone file. Do a whois
lookup on whois.denic.de for fsck.de to see what this looks like.

It's a host. Checking for an SOA is a good way to tell if a
domain is valid, depending upon what you mean by 'domain' and 'valid'.

Your definition of "domain" is too narrow. A "valid domain" in common
context is a registered second level DNS label. It has no implication
on what is technically being done with this label. Some NICs like
DE-NIC impose restrictions upon registration though (domain has to
be delegated to a working and configured set of name servers or be an "MX-only"
domain).

Regards,
Daniel

Are you sure you are not confusing the terms "domain" and "zone"?

In fairness I was ambiguous. Although Verisign ought to be describing all of these techniques. But depending on the circumstances I primarily need to check for A records or MX and A records.

Kee Hinckley wrote:

> He asked for the "optimal way" "to see if a given host truly exists" and
> you told him how to confirm or deny the "existence of a domain". He asked
> about hosts, you answered about domains.

In fairness I was ambiguous. Although Verisign ought to be
describing all of these techniques. But depending on the
circumstances I primarily need to check for A records or MX and A
records.

  I was both confused and confusing myself. Previously, you could consider
any response other than 'NXDOMAIN' to an 'ANY' query in the COM/NET domain
to mean that the domain was registered, at least. Now you can't.

  There was no way to use DNS to tell for sure that a domain is available.
There still isn't. There was no way to use DNS to tell for sure that a
domain is registered. There still isn't.

  If we're going to challenge Verisign, we have to make sure we ask the right
questions. One question to ask them is how are we supposed to tell whether
an A record was placed by the owner of the domain. If we don't agree to
SiteFinder's terms, how do we avoid using it? How do we use DNS for
commercial purposes? And so on.

  DS

Okay, let's be very specific. I need to know if a given name has either A or MX records which are *not* the same as those provided by a wildcard in the appropriate TLD.

The answer so far seems to be to query *.TLD, nab all the records, and then compare them with all the results you get back from querying the domain. If there is anything that doesn't match, you are in the clear. (Modulo internal networks and localhost and all those fun tricks of course--but that's a different problem.)

The fact that this is a single IP comparison with Verisign today presumably does not preclude the wonders of MX records, CNAME's, multiple A records and all of that in the future.
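To allow for more than a single IP, the comparison could be done per record type. This is only a sketch under assumptions: the records are taken to be already fetched and normalized into sets of strings keyed by type, and the function name non_wildcard_records is hypothetical.

```python
def non_wildcard_records(name_records, wildcard_records):
    """Return the records of a name that the wildcard does not also supply.

    Both arguments map a record type ("A", "MX", ...) to a set of
    record-data strings. An empty result means every record matched
    the wildcard, so the name is indistinguishable from it.
    """
    extra = {}
    for rtype, values in name_records.items():
        # Keep only the records the wildcard does not return for this type.
        diff = set(values) - set(wildcard_records.get(rtype, ()))
        if diff:
            extra[rtype] = diff
    return extra
```

Anything left in the result is a record the wildcard did not supply, which is the "in the clear" case described above.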

Okay, but what does that have to do with Verisign? That question would
apply to .museum as well, for example. It also applies equally to
lower-level domains. It's not Verisign's fault that DNS doesn't provide any
'wildcard' flag in its responses.

  If we're going to challenge Verisign, we should be very careful to
precisely phrase our queries so that Verisign has no wiggle room. For
example, I suggest, "If I don't like SiteFinder, don't agree to its terms,
and don't want to use it, or if I can't comply with its terms because my
usage is commercial, how do I avoid it?"

  DS

That seems like it would work as well. In my case I need to make use of the A and MX records for other things anyway, so I might as well go that path. I'd need to sit down and see which mechanism uses the fewest queries.

I'm concerned though that all these mechanisms could fall apart if Verisign decided to start using a third-party content provider to distribute the load on their server.

The answer so far seems to be to query *.TLD, nab all the records,
and then compare them with all the results you get back from querying the
domain. If there is anything that doesn't match, you are in the
clear. (Modulo internal networks and localhost and all those fun
tricks of course--but that's a different problem.)

The fact that this is a single IP comparison with Verisign today
presumably does not preclude the wonders of MX records, CNAME's,
multiple A records and all of that in the future.

Alg 101

1. Seed the isWildCard probability array.

Generate N random strings. Attach ".NET" or ".COM" to them. Get records for
them. Compare records to each other assigning them probability of being a
wildcard based on the repetitiveness of the data.

2. Query domain name in question.

Compare the result with isWildCard probability array.

Alex
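The two steps above might look roughly like this in Python. Several details are assumptions, since the outline leaves them open: the "probability" is taken to be the fraction of random probes that returned a given answer, an answer is a set of record strings, and resolve is a caller-supplied lookup function. The names seed_wildcard_scores and wildcard_probability are hypothetical.

```python
import secrets
from collections import Counter


def seed_wildcard_scores(tld, resolve, samples=10):
    """Step 1: resolve random names under tld and score each answer.

    Returns a dict mapping each observed answer (a frozenset of records)
    to the fraction of probes that returned it.
    """
    seen = Counter()
    for _ in range(samples):
        answer = frozenset(resolve("%s.%s" % (secrets.token_hex(16), tld)))
        if answer:  # probes with no answer say nothing about a wildcard
            seen[answer] += 1
    return {answer: count / samples for answer, count in seen.items()}


def wildcard_probability(name, resolve, scores):
    """Step 2: look up the answer for name in the seeded scores."""
    return scores.get(frozenset(resolve(name)), 0.0)
```

A score near 1.0 means the name answered exactly as the random probes did; 0.0 means its answer was never seen among them. Matching whole answers exactly is the simplest choice and would need loosening if the wildcard rotated among several address sets.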