Weird DNS issues for domains

I'm hoping someone on the list can help confirm that I'm not going insane.

I have a customer with the domain 'mtrsd.k12.ma.us'. The domain should be handled by our DNS servers (dns-auth1.crocker.com & dns-auth2.crocker.com).

The customer has an A record for www.mtrsd.k12.ma.us pointing to their web server.
The customer has subdomains for each school in the district, which have www records pointing to their web server via CNAME.
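To make the setup concrete, here is a minimal sketch of how a resolver follows a school's www CNAME to the district web server's A record. The record table is copied from the dig output later in this thread; `RECORDS` and `resolve_a` are hypothetical names, and a real resolver queries nameservers over the network instead of reading a dict.

```python
# Records copied from the dig output in this thread (illustration only).
RECORDS = {
    ("www.sanderson.mtrsd.k12.ma.us.", "CNAME"): ["www.mtrsd.k12.ma.us."],
    ("www.mtrsd.k12.ma.us.", "A"): ["159.250.29.161"],
}

def resolve_a(name, max_hops=8):
    """Return (chain, addresses) for name, following CNAMEs the way a resolver would.

    Hypothetical helper: looks names up in RECORDS instead of querying DNS.
    """
    chain = [name]
    for _ in range(max_hops):
        addrs = RECORDS.get((name, "A"))
        if addrs:
            return chain, addrs
        targets = RECORDS.get((name, "CNAME"))
        if not targets:
            return chain, []  # no A and no CNAME: the name has no address
        name = targets[0]  # a name may carry at most one CNAME
        chain.append(name)
    raise RuntimeError("CNAME chain too long (possible loop)")

chain, addrs = resolve_a("www.sanderson.mtrsd.k12.ma.us.")
print(" -> ".join(chain), addrs)
```

If either hop fails at a given ISP's resolver (the CNAME or the A record it points to), the school's site will not load even though the district's main site does.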

Everything looks like it is configured properly on my servers, but the customer is reporting that certain parents (VerizonDSL, Comcast, DirectWAY) can connect to certain websites and not others. At this point I think the problem is with the DNS servers at their ISP.

Can someone confirm my sanity? My zone of control starts at mtrsd.k12.ma.us; I do not have control over k12.ma.us.
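Whether a name falls under mtrsd.k12.ma.us is decided label by label from the right, not by plain string suffix. A minimal sketch with a hypothetical `in_zone` helper; it checks namespace containment only and knows nothing about zone cuts such as the separate delegation of sanderson.mtrsd.k12.ma.us that shows up in the +trace output below.

```python
def labels(name):
    """Split a domain name into lowercase labels, ignoring a trailing root dot."""
    return name.rstrip(".").lower().split(".")

def in_zone(name, zone):
    """True if name is the zone apex or any name below it, compared per label.

    Hypothetical helper; the root zone "." is not handled.
    """
    n, z = labels(name), labels(zone)
    return len(n) >= len(z) and n[-len(z):] == z

print(in_zone("www.sanderson.mtrsd.k12.ma.us.", "mtrsd.k12.ma.us"))
print(in_zone("k12.ma.us", "mtrsd.k12.ma.us"))
```

Comparing labels rather than strings keeps a name like xmtrsd.k12.ma.us from being mistaken for part of the customer's domain.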

What do you all see for sanderson.mtrsd.k12.ma.us & www.sanderson.mtrsd.k12.ma.us?

; <<>> DiG 9.2.2 <<>> @204.97.12.2 mtrsd.k12.ma.us NS
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 522
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;mtrsd.k12.ma.us. IN NS

;; ANSWER SECTION:
mtrsd.k12.ma.us. 258796 IN NS dns-auth2.crocker.com.
mtrsd.k12.ma.us. 258796 IN NS dns-auth1.crocker.com.

;; Query time: 39 msec
;; SERVER: 204.97.12.2#53(204.97.12.2)
;; WHEN: Thu Sep 29 09:29:28 2005
;; MSG SIZE rcvd: 92

; <<>> DiG 9.2.2 <<>> @204.97.12.2 sanderson.mtrsd.k12.ma.us NS
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15880
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;sanderson.mtrsd.k12.ma.us. IN NS

;; ANSWER SECTION:
sanderson.mtrsd.k12.ma.us. 259200 IN NS dns-auth2.crocker.com.
sanderson.mtrsd.k12.ma.us. 259200 IN NS dns-auth1.crocker.com.

;; Query time: 2 msec
;; SERVER: 204.97.12.2#53(204.97.12.2)
;; WHEN: Thu Sep 29 09:31:15 2005
;; MSG SIZE rcvd: 102

; <<>> DiG 9.2.2 <<>> @204.97.12.2 www.sanderson.mtrsd.k12.ma.us A
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52155
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.sanderson.mtrsd.k12.ma.us. IN A

;; ANSWER SECTION:
www.sanderson.mtrsd.k12.ma.us. 86400 IN CNAME www.mtrsd.k12.ma.us.
www.mtrsd.k12.ma.us. 51 IN A 159.250.29.161

;; Query time: 48 msec
;; SERVER: 204.97.12.2#53(204.97.12.2)
;; WHEN: Thu Sep 29 09:31:52 2005
;; MSG SIZE rcvd: 81

I'm hoping someone on the list can help confirm that I'm not going
insane.

How can you be sure it's not the other way around? You're sane and
everyone else is insane? :-)

Can someone confirm my sanity? My zone of control starts at
mtrsd.k12.ma.us; I do not have control over k12.ma.us.

What do you all see for sanderson.mtrsd.k12.ma.us &
www.sanderson.mtrsd.k12.ma.us?

[friz@jake ~]$ dig mtrsd.k12.ma.us ns

; <<>> DiG 9.2.4 <<>> mtrsd.k12.ma.us ns
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23100
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;mtrsd.k12.ma.us. IN NS

;; ANSWER SECTION:
mtrsd.k12.ma.us. 86120 IN NS dns-auth2.crocker.com.
mtrsd.k12.ma.us. 86120 IN NS dns-auth1.crocker.com.

;; Query time: 3 msec
;; SERVER: 204.10.167.4#53(204.10.167.4)
;; WHEN: Thu Sep 29 10:38:50 2005
;; MSG SIZE rcvd: 92

[friz@jake ~]$ dig sanderson.mtrsd.k12.ma.us ns

; <<>> DiG 9.2.4 <<>> sanderson.mtrsd.k12.ma.us ns
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28515
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;sanderson.mtrsd.k12.ma.us. IN NS

;; ANSWER SECTION:
sanderson.mtrsd.k12.ma.us. 259200 IN NS dns-auth2.crocker.com.
sanderson.mtrsd.k12.ma.us. 259200 IN NS dns-auth1.crocker.com.

;; Query time: 33 msec
;; SERVER: 204.10.167.4#53(204.10.167.4)
;; WHEN: Thu Sep 29 10:39:27 2005
;; MSG SIZE rcvd: 102

[friz@jake ~]$ dig www.sanderson.mtrsd.k12.ma.us a

; <<>> DiG 9.2.4 <<>> www.sanderson.mtrsd.k12.ma.us a
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28640
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.sanderson.mtrsd.k12.ma.us. IN A

;; ANSWER SECTION:
www.sanderson.mtrsd.k12.ma.us. 86400 IN CNAME www.mtrsd.k12.ma.us.
www.mtrsd.k12.ma.us. 600 IN A 159.250.29.161

;; Query time: 64 msec
;; SERVER: 204.10.167.4#53(204.10.167.4)
;; WHEN: Thu Sep 29 10:39:33 2005
;; MSG SIZE rcvd: 81

Matthew Crocker <matthew@crocker.com> writes:

Everything looks like it is configured properly on my servers but the
customer is reporting that certain parents (VerizonDSL, Comcast,
DirectWAY) can connect to certain websites and not others. At this
point I think the problem is with the DNS servers at their ISP.

Can someone confirm my sanity? My zone of control starts at
mtrsd.k12.ma.us; I do not have control over k12.ma.us.

I just tested it from a Verizon DSL host and it worked.

You might want to consider reading RFC 2182 though, particularly the
part about geographically diverse nameservers.

Cheers,

                                        ---Rob

For your entertainment, I'm a cox.net customer in No Va...

$ dig +trace sanderson.mtrsd.k12.ma.us ns

; <<>> DiG 9.3.1 <<>> +trace sanderson.mtrsd.k12.ma.us ns
;; global options: printcmd
. 495670 IN NS L.ROOT-SERVERS.NET.
. 495670 IN NS M.ROOT-SERVERS.NET.
. 495670 IN NS A.ROOT-SERVERS.NET.
. 495670 IN NS B.ROOT-SERVERS.NET.
. 495670 IN NS C.ROOT-SERVERS.NET.
. 495670 IN NS D.ROOT-SERVERS.NET.
. 495670 IN NS E.ROOT-SERVERS.NET.
. 495670 IN NS F.ROOT-SERVERS.NET.
. 495670 IN NS G.ROOT-SERVERS.NET.
. 495670 IN NS H.ROOT-SERVERS.NET.
. 495670 IN NS I.ROOT-SERVERS.NET.
. 495670 IN NS J.ROOT-SERVERS.NET.
. 495670 IN NS K.ROOT-SERVERS.NET.
;; Received 436 bytes from 68.100.16.25#53(68.100.16.25) in 34 ms

us. 172800 IN NS A.GTLD.BIZ.
us. 172800 IN NS B.GTLD.BIZ.
us. 172800 IN NS C.GTLD.BIZ.
;; Received 147 bytes from 198.32.64.12#53(L.ROOT-SERVERS.NET) in 129 ms

k12.ma.us. 900 IN NS NS2.PIR.NET.
k12.ma.us. 900 IN NS NS2.XCOM.NET.
k12.ma.us. 900 IN NS LNS0.MA.WORLDNAMES.NET.
k12.ma.us. 900 IN NS SIDEHACK.GWEEP.NET.
k12.ma.us. 900 IN NS NS.WPI.EDU.
k12.ma.us. 900 IN NS NS.AMARANTH.NET.
;; Received 203 bytes from 209.173.53.162#53(A.GTLD.BIZ) in 33 ms

mtrsd.k12.ma.us. 86400 IN NS dns-auth1.crocker.com.
mtrsd.k12.ma.us. 86400 IN NS dns-auth2.crocker.com.
;; Received 102 bytes from 130.64.1.31#53(NS2.PIR.NET) in 39 ms

sanderson.mtrsd.k12.ma.us. 604800 IN NS dns-auth2.crocker.com.
sanderson.mtrsd.k12.ma.us. 604800 IN NS dns-auth1.crocker.com.
;; Received 134 bytes from 204.97.12.58#53(dns-auth1.crocker.com) in 30 ms

$ dig +trace www.sanderson.mtrsd.k12.ma.us. a

; <<>> DiG 9.3.1 <<>> +trace www.sanderson.mtrsd.k12.ma.us. a
;; global options: printcmd
. 495646 IN NS D.ROOT-SERVERS.NET.
. 495646 IN NS E.ROOT-SERVERS.NET.
. 495646 IN NS F.ROOT-SERVERS.NET.
. 495646 IN NS G.ROOT-SERVERS.NET.
. 495646 IN NS H.ROOT-SERVERS.NET.
. 495646 IN NS I.ROOT-SERVERS.NET.
. 495646 IN NS J.ROOT-SERVERS.NET.
. 495646 IN NS K.ROOT-SERVERS.NET.
. 495646 IN NS L.ROOT-SERVERS.NET.
. 495646 IN NS M.ROOT-SERVERS.NET.
. 495646 IN NS A.ROOT-SERVERS.NET.
. 495646 IN NS B.ROOT-SERVERS.NET.
. 495646 IN NS C.ROOT-SERVERS.NET.
;; Received 436 bytes from 68.100.16.25#53(68.100.16.25) in 27 ms

us. 172800 IN NS A.GTLD.BIZ.
us. 172800 IN NS B.GTLD.BIZ.
us. 172800 IN NS C.GTLD.BIZ.
;; Received 151 bytes from 128.8.10.90#53(D.ROOT-SERVERS.NET) in 24 ms

k12.ma.us. 900 IN NS NS.WPI.EDU.
k12.ma.us. 900 IN NS NS.AMARANTH.NET.
k12.ma.us. 900 IN NS NS2.PIR.NET.
k12.ma.us. 900 IN NS NS2.XCOM.NET.
k12.ma.us. 900 IN NS LNS0.MA.WORLDNAMES.NET.
k12.ma.us. 900 IN NS SIDEHACK.GWEEP.NET.
;; Received 207 bytes from 209.173.53.162#53(A.GTLD.BIZ) in 32 ms

mtrsd.k12.ma.us. 86400 IN NS dns-auth2.crocker.com.
mtrsd.k12.ma.us. 86400 IN NS dns-auth1.crocker.com.
;; Received 106 bytes from 130.215.36.18#53(NS.WPI.EDU) in 36 ms

www.sanderson.mtrsd.k12.ma.us. 604800 IN CNAME www.mtrsd.k12.ma.us.
sanderson.mtrsd.k12.ma.us. 604800 IN NS dns-auth2.crocker.com.
sanderson.mtrsd.k12.ma.us. 604800 IN NS dns-auth1.crocker.com.
;; Received 156 bytes from 204.97.12.57#53(dns-auth2.crocker.com) in 42 ms

$ dig sanderson.mtrsd.k12.ma.us ns

; <<>> DiG 9.3.1 <<>> sanderson.mtrsd.k12.ma.us ns
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30698
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2

;; QUESTION SECTION:
;sanderson.mtrsd.k12.ma.us. IN NS

;; ANSWER SECTION:
sanderson.mtrsd.k12.ma.us. 604800 IN NS dns-auth1.crocker.com.
sanderson.mtrsd.k12.ma.us. 604800 IN NS dns-auth2.crocker.com.

;; ADDITIONAL SECTION:
dns-auth1.crocker.com. 151957 IN A 204.97.12.58
dns-auth2.crocker.com. 151957 IN A 204.97.12.57

;; Query time: 119 msec
;; SERVER: 68.100.16.25#53(68.100.16.25)
;; WHEN: Thu Sep 29 11:36:31 2005
;; MSG SIZE rcvd: 134

$ dig www.sanderson.mtrsd.k12.ma.us. a

; <<>> DiG 9.3.1 <<>> www.sanderson.mtrsd.k12.ma.us. a
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4869
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;www.sanderson.mtrsd.k12.ma.us. IN A

;; ANSWER SECTION:
www.sanderson.mtrsd.k12.ma.us. 604800 IN CNAME www.mtrsd.k12.ma.us.
www.mtrsd.k12.ma.us. 604800 IN A 159.250.29.161

;; AUTHORITY SECTION:
mtrsd.k12.ma.us. 604800 IN NS dns-auth1.crocker.com.
mtrsd.k12.ma.us. 604800 IN NS dns-auth2.crocker.com.

;; ADDITIONAL SECTION:
dns-auth1.crocker.com. 151950 IN A 204.97.12.58
dns-auth2.crocker.com. 151950 IN A 204.97.12.57

;; Query time: 58 msec
;; SERVER: 68.100.16.25#53(68.100.16.25)
;; WHEN: Thu Sep 29 11:36:38 2005
;; MSG SIZE rcvd: 172

whoops...sorry for the extraneous data...

I just tested it from a Verizon DSL host and it worked.

You might want to consider reading RFC 2182 though, particularly the
part about geographically diverse nameservers.

Yeah, yeah, that is overrated. If my site goes dark and my DNS goes down it doesn't really matter, as the bandwidth and the web server will also be down. Having a live DNS server in another part of the country won't help if the access routers handling the traffic for the T1 to the school are also down.

Geographically diverse name servers sound great in theory, but for this application they won't gain any redundancy.

If you are talking about strictly http, then you are probably right. If you are hosting any email, then this isn't the case. A live DNS but dead mail server will cause your mail to queue up for a later resend on the originating mail servers. A dead DNS will cause the mail to bounce as undeliverable. (Oh, and if any of your subs are on mailing lists, they will be unsubscribed en masse. A nice way to challenge your call center...)

John

John Dupuy wrote:

If you are talking about strictly http, then you are probably right. If you are hosting any email, then this isn't the case. A live DNS but dead mail server will cause your mail to queue up for a later resend on the originating mail servers. A dead DNS will cause the mail to bounce as undeliverable. (Oh, and if any of your subs are on mailing lists, they will be unsubscribed en masse. A nice way to challenge your call center...)

An MTA bouncing mail on temporary DNS failure would be out of spec, horribly.

Pete

If a mail server is bouncing immediately on a DNS SERVFAIL (which is what
you'll get when a remote DNS server is down), then that mail server is badly
broken and will break quite a bit during tier-1 failure situations.

Failure to resolve != resolves to NXDOMAIN/empty. A failure to resolve
(SERVFAIL) should result in the same queueing behavior that the remote SMTP
server uses for failure to establish a TCP connection.

Todd Vierling wrote:

I’ll defer to you on this. Clearly a failure to resolve is not the same thing as an NXDOMAIN RCODE.

And yet, personal experience has shown that the failure of all a customer’s DNS servers for a domain does cause swifter mail bouncing than would occur otherwise. I do not know if it was due to the other providers having broken MTAs or broken DNS servers/resolvers… Or maybe they were all flukes. I now wish I had investigated them more thoroughly for the few times I’ve seen it.

John

You might want to consider reading RFC 2182 though, particularly the
part about geographically diverse nameservers.

Yeah, yeah, that is overrated. If my site goes dark and my DNS goes
down it doesn't really matter as the bandwidth and the web server
will also be down.

and folk who would otherwise spool mail for you will throw it
on the floor. enjoy.

randy

An MTA bouncing mail on temporary DNS failure would be out of spec,
horribly.

luckily no mail servers are out of spec.

randy

Matthew Crocker <matthew@crocker.com> writes:

I just tested it from a Verizon DSL host and it worked.

You might want to consider reading RFC 2182 though, particularly the
part about geographically diverse nameservers.

Yeah, yeah, that is overrated. If my site goes dark and my DNS goes
down it doesn't really matter as the bandwidth and the web server
will also be down. Having a live DNS server in another part of the
country won't help if the access routers handling the traffic for the
T1 to the school are also down.

Geographically diverse name servers sound great in theory, but for
this application it won't gain any redundancy.

I wonder what that application could be... Single server with two
addresses? Two servers behind a failing firewall? Well, if you don't
care then why should we?

There's definitely something seriously wrong with your configuration,
and it is related to the two colocated servers. I sometimes get the
result below. Works once, and then it fails because of answers from
the wrong address:

bjorn@canardo:~$ dig www.mtrsd.k12.ma.us @dns-auth1.crocker.com

; <<>> DiG 9.2.4 <<>> www.mtrsd.k12.ma.us @dns-auth1.crocker.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34405
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;www.mtrsd.k12.ma.us. IN A

;; ANSWER SECTION:
www.mtrsd.k12.ma.us. 604800 IN A 159.250.29.161

;; AUTHORITY SECTION:
mtrsd.k12.ma.us. 604800 IN NS dns-auth2.crocker.com.
mtrsd.k12.ma.us. 604800 IN NS dns-auth1.crocker.com.

;; ADDITIONAL SECTION:
dns-auth2.crocker.com. 600 IN A 204.97.12.57
dns-auth1.crocker.com. 600 IN A 204.97.12.58

;; Query time: 279 msec
;; SERVER: 204.97.12.58#53(dns-auth1.crocker.com)
;; WHEN: Thu Sep 29 21:11:17 2005
;; MSG SIZE rcvd: 144

bjorn@canardo:~$ dig www.mtrsd.k12.ma.us @dns-auth2.crocker.com

; <<>> DiG 9.2.4 <<>> www.mtrsd.k12.ma.us @dns-auth2.crocker.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44398
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;www.mtrsd.k12.ma.us. IN A

;; ANSWER SECTION:
www.mtrsd.k12.ma.us. 604800 IN A 159.250.29.161

;; AUTHORITY SECTION:
mtrsd.k12.ma.us. 604800 IN NS dns-auth2.crocker.com.
mtrsd.k12.ma.us. 604800 IN NS dns-auth1.crocker.com.

;; ADDITIONAL SECTION:
dns-auth2.crocker.com. 600 IN A 204.97.12.57
dns-auth1.crocker.com. 600 IN A 204.97.12.58

;; Query time: 255 msec
;; SERVER: 204.97.12.57#53(dns-auth2.crocker.com)
;; WHEN: Thu Sep 29 21:11:21 2005
;; MSG SIZE rcvd: 144

bjorn@canardo:~$ dig www.mtrsd.k12.ma.us @dns-auth1.crocker.com
;; reply from unexpected source: 204.97.12.57#53, expected 204.97.12.58#53
;; reply from unexpected source: 204.97.12.57#53, expected 204.97.12.58#53

; <<>> DiG 9.2.4 <<>> www.mtrsd.k12.ma.us @dns-auth1.crocker.com
;; global options: printcmd
;; connection timed out; no servers could be reached

After a while the session seems to time out and things will work
again. Once, before the same shit happens again.
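The "reply from unexpected source" lines are dig refusing answers that arrive from 204.97.12.57 when it sent the query to 204.97.12.58. The thread doesn't establish why the wrong address answers, but the check itself is simple. Below is a minimal sketch of a stub client that only accepts replies from the exact address and port it queried and with the matching query ID; `build_query`, `accept_reply` and `query` are hypothetical names, and real resolvers also validate the question section.

```python
import secrets
import socket
import struct

def build_query(qname, qid, qtype=1):
    """Build a minimal DNS query (RFC 1035): 12-byte header plus one question.

    Flags 0x0100 set only RD (recursion desired); qtype 1 is A, class 1 is IN.
    """
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    question = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    ) + b"\x00"
    return header + question + struct.pack("!HH", qtype, 1)

def accept_reply(packet, source, server, qid):
    """Accept a reply only if it comes from the address/port we queried and
    echoes our query ID; anything else is treated as misrouted or spoofed."""
    if source[:2] != server[:2]:
        return False
    return len(packet) >= 12 and struct.unpack("!H", packet[:2])[0] == qid

def query(qname, server_ip, timeout=5.0):
    """Send one UDP query to server_ip:53 and return the first acceptable reply.

    Raises socket.timeout if no packet arrives within `timeout` seconds of
    the previous one (a simplification of a real resolver's timer).
    """
    server = (server_ip, 53)
    qid = secrets.randbelow(65536)  # random ID makes blind spoofing harder
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(qname, qid), server)
        while True:
            packet, source = s.recvfrom(512)
            if accept_reply(packet, source, server, qid):
                return packet
            print(f";; reply from unexpected source: {source[0]}#{source[1]}, "
                  f"expected {server[0]}#{server[1]}")
```

Calling `query("www.mtrsd.k12.ma.us", "204.97.12.58")` while the wrong server is answering would print the same complaint as dig and end in a timeout, which matches the failure Bjørn sees.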

Bjørn

unnamed MTAs, then a simple tier-1 outage (which is not all that uncommon)
or a site under packet flood attacks would cause immediate bounces due to
DNS timeouts. The same thing applies to a site whose DNS is simply
unreachable because its link is down.

When an MTA gets a failed lookup response, it should retry. When the domain
*does* resolve, but resolves to *empty or nonexistent*, then the mail should
bounce. When a DNS server is unreachable, it can hardly return a NXDOMAIN
back to the requestor. 8-P

Matthew Crocker <matthew@crocker.com> writes:

Yeah, yeah, that is overrated. If my site goes dark and my DNS goes
down it doesn't really matter as the bandwidth and the web server
will also be down. Having a live DNS server in another part of the
country won't help if the access routers handling the traffic for the
T1 to the school are also down.

Geographically diverse name servers sound great in theory, but for
this application it won't gain any redundancy.

Whether you consider "traceroute works and I can see the packets fall
off the map at $LOCATION" better than a nameserver timeout is, I
suppose, a matter of personal taste.

In any event, it's my personal opinion that even if the nameservers
aren't in the same building ("geographically diverse" per the RFC),
sharing the same prefix or even the same origin AS represents a step
away from goodness. In fact, what you're seeing right now *just might* be due
to some kind of routing nastiness. The failure mode would be much
easier to talk some enduser through debugging if the domain name at
least resolved.
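The "same prefix" concern is easy to check mechanically. A minimal sketch using the standard `ipaddress` module; `shared_prefixes` is a hypothetical helper, /24 is an assumed stand-in for "same prefix", and the origin-AS part of the argument needs routing data (such as a BGP table), which this does not attempt.

```python
import ipaddress
from itertools import combinations

def shared_prefixes(addrs, prefixlen=24):
    """Return pairs of nameserver addresses that fall in the same /prefixlen."""
    nets = {a: ipaddress.ip_network(f"{a}/{prefixlen}", strict=False) for a in addrs}
    return [(a, b) for a, b in combinations(addrs, 2) if nets[a] == nets[b]]

# Addresses of dns-auth1 and dns-auth2 as returned in this thread.
ns = ["204.97.12.58", "204.97.12.57"]
print(shared_prefixes(ns))
```

For this domain both servers land in 204.97.12.0/24, so a single routing problem for that prefix takes out every nameserver at once.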

Me, I have nameservers in Ashburn and Palo Alto, with additional ones
coming online in London and Montreal (and maybe Tokyo) one of these
years as time permits.

Your mileage may vary, naturally; as you can see from this photograph,
I really *am* a belt-and-suspenders sort of guy:


                                        ---Rob

In article <A310E761-5459-440B-BA92-E160A45550AB@crocker.com> you write:

I just tested it from a Verizon DSL host and it worked.

You might want to consider reading RFC 2182 though, particularly the
part about geographically diverse nameservers.

Yeah, yeah, that is overrated. If my site goes dark and my DNS goes
down it doesn't really matter as the bandwidth and the web server
will also be down. Having a live DNS server in another part of the
country won't help if the access routers handling the traffic for the
T1 to the school are also down.

Geographically diverse name servers sound great in theory, but for
this application it won't gain any redundancy.

  People say this, but then they don't see the impact of not
  having DNS servers available.

  The DNS was designed with the idea that at least one of the
  nameservers for a zone would always be reachable. A zone
  that is unreachable results in the caching servers using
  up resources at 1000 times the normal rate: milliseconds
  versus tens of seconds.
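The "milliseconds versus tens of seconds" comparison can be made concrete with a back-of-the-envelope model. Every number here is an assumption rather than the behavior of any particular resolver: a 30 ms round trip (similar to the query times in the +trace output above), a 2 s timeout per try, servers tried one after another, and the server list cycled three times before giving up.

```python
def resolution_time(num_servers, rtt, reachable, timeout=2.0, rounds=3):
    """Rough seconds a caching resolver spends on one lookup.

    Assumed model: servers are tried in turn, each unanswered try costs
    `timeout` seconds, and the list is cycled `rounds` times before failing.
    """
    if reachable:
        return rtt  # the first server answers after one round trip
    return num_servers * rounds * timeout

ok = resolution_time(2, rtt=0.030, reachable=True)
dead = resolution_time(2, rtt=0.030, reachable=False)
print(f"reachable: {ok * 1000:.0f} ms, unreachable: {dead:.0f} s, ratio ~{dead / ok:.0f}x")
```

With these numbers an unreachable zone holds each pending lookup roughly 400 times longer; longer timeouts or more retries push the ratio toward Mark's figure.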

  Mark

[...]

The problem I've seen is when an SMTP server does not accept emails
which have a non-resolvable MAIL FROM domain. When the sender is a
dumb SMTP client, not an MTA, this can cause problems.

Well, that "dumb SMTP client" should stop pretending to be an MTA then.
If it can't queue and retry, it shouldn't even *think* about looking
for MX records.

Besides, what sort of "dumb SMTP client" did you have in mind?
Formmail scripts? Worms? Outlook Express? I can't say I'd miss mail
from any of those.

(I noticed this happen to a high traffic customer who had both of
their DNS servers in the same /24 located in Slidell, LA. Needless
to say, they were down for more than a few hours when Katrina rolled
through.)

Having reachable DNS isn't going to help anyway if the MX host is also
unreachable for an extended period. Mail is still going to bounce
after a few days if somebody doesn't fiddle with DNS.
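The "bounce after a few days" point comes from the queue lifetime every MTA enforces on deferred mail. A minimal sketch, assuming a 5-day limit (Postfix's `maximal_queue_lifetime`, for example, defaults to 5d); `queue_run` is a hypothetical helper, and real MTAs also schedule retries between queue runs.

```python
from datetime import datetime, timedelta

# Assumed limit: many MTAs give up on a deferred message after about 5 days.
MAX_QUEUE_LIFETIME = timedelta(days=5)

def queue_run(queued_at, now, delivered):
    """Outcome for one deferred message on a single queue run."""
    if delivered:
        return "delivered"
    if now - queued_at >= MAX_QUEUE_LIFETIME:
        return "bounce: queue lifetime exceeded"
    return "defer: try again on the next queue run"

start = datetime(2005, 9, 29, 12, 0)
for days in (1, 3, 5):
    print(days, queue_run(start, start + timedelta(days=days), delivered=False))
```

Reachable DNS only buys the message time in the queue; if the MX host stays down past the limit, the mail bounces anyway.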

Peter wrote:

[...]

The problem I've seen is when an SMTP server does not accept emails
which have a non-resolvable MAIL FROM domain. When the sender is a
dumb SMTP client, not an MTA, this can cause problems.

Well, that "dumb SMTP client" should stop pretending to be an MTA then.
If it can't queue and retry, it shouldn't even *think* about looking
for MX records.

Sorry, I guess I was not clear. The dumb client is not pretending
to be an MTA. The dumb client is sending to its "smart host." The
MTA, the smart server for the dumb clients, does a "reality check"
on the envelope sender. (This is not unusual.) A dumb client tries
to send,

  MAIL FROM:<joebillybob@down-dns.org>

via the MTA, but the MTA rejects this because it cannot resolve the
domain. Now even if our MTA does the right thing and rejects with
a 4xx error, a dumb client may not be equipped to handle this well.
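The sender-domain "reality check" described above, done the right way (4xx on a lookup failure, 5xx only when the domain really does not exist), might look like this. A minimal sketch: `sender_domain`, `reality_check` and the `lookup` callable are hypothetical, the reply texts are illustrative, and a production MTA parses MAIL FROM more strictly.

```python
import re

MAIL_FROM = re.compile(r"^MAIL FROM:\s*<([^>]*)>", re.IGNORECASE)

def sender_domain(command):
    """Extract the domain from a MAIL FROM command; None for the null sender <>."""
    m = MAIL_FROM.match(command)
    if not m:
        raise ValueError(f"not a MAIL FROM command: {command!r}")
    addr = m.group(1)
    if not addr:
        return None  # bounces use MAIL FROM:<>, which must be accepted
    return addr.rpartition("@")[2].lower()

def reality_check(command, lookup):
    """SMTP reply after checking that the sender domain resolves.

    `lookup` is a hypothetical callable returning "answer", "nxdomain" or
    "servfail" for a domain; a real MTA would query DNS here.
    """
    domain = sender_domain(command)
    if domain is None or lookup(domain) == "answer":
        return "250 OK"
    if lookup(domain) == "nxdomain":
        return "550 Sender domain does not exist"  # permanent: do not retry
    return "451 Temporary failure resolving sender domain"  # retry later

print(reality_check("MAIL FROM:<joebillybob@down-dns.org>", lambda d: "servfail"))
```

The MTA side is then in spec; whether the message eventually gets through depends entirely on the client treating that 451 as "try again".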

Besides, what sort of "dumb SMTP client" did you have in mind?
Formmail scripts? Worms? Outlook Express? I can't say I'd miss mail
from any of those.

Well, the reality check on the sender domain is meant to stop a lot
of traffic from some of those sources, so I won't miss that either.
However, due to the nature of our business, we have lots of people
with very, uh, "interesting" SMTP clients. I know of a few who have
integrated PPP/IP/TCP/SMTP stacks for custom hardware, i.e., they wrote
network code for a device with less CPU and RAM horsepower than your
modern wristwatch, only to send email. They tend not to handle
exceptional conditions well (and sometimes have cool features like a
hardcoded sender address, a sender address hardcoded in NVRAM, or a
hardcoded IP address for the smart host, which is fun when we move
those or bring one down for maintenance).
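What these small clients usually lack is the basic distinction SMTP draws by the first digit of each reply: 4yz means "keep the message and try again", 5yz means "give up". A minimal sketch of that classification plus a retry schedule; `classify_reply` and `retry_delays` are hypothetical names, and the backoff numbers are assumptions, not values from any standard.

```python
def classify_reply(line):
    """Classify an SMTP reply by its first digit (RFC 5321, section 4.2.1)."""
    code = line[:3]
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"malformed SMTP reply: {line!r}")
    return {
        "2": "ok",           # positive completion
        "3": "continue",     # positive intermediate, e.g. 354 after DATA
        "4": "retry later",  # transient failure: keep the message, try again
        "5": "give up",      # permanent failure: report it, do not retry
    }.get(code[0], "protocol error")  # 1yz and other digits are not expected

def retry_delays(base=60, factor=2, limit=3600, attempts=6):
    """Hypothetical exponential backoff schedule (seconds) for 4xx replies."""
    delay = base
    for _ in range(attempts):
        yield delay
        delay = min(delay * factor, limit)

reply = "451 Temporary failure resolving sender domain"  # illustrative text
if classify_reply(reply) == "retry later":
    print("will retry after", list(retry_delays()), "seconds")
```

Even a device with a tiny RAM budget can afford this much logic; the hard part is having somewhere to keep the message between attempts.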

(I noticed this happen to a high traffic customer who had both of
their DNS servers in the same /24 located in Slidell, LA. Needless
to say, they were down for more than a few hours when Katrina rolled
through.)

Having reachable DNS isn't going to help anyway if the MX host is also
unreachable for an extended period. Mail is still going to bounce
after a few days if somebody doesn't fiddle with DNS.

But even if the destination MTA is reachable, the mail was not going
through since the MAIL FROM domain was unresolvable. The mail would
have been delivered promptly had the sender's DNS been available. The
sender's MX MTA never enters into the picture.