Is Hotmail in the habit of ignoring MX records?

One of my users has reported incoming mail failures, which I finally
tracked down. It turned out that Hotmail has seen fit to send the mail
to his domain's A record machine, despite the fact that he has valid MX records.

The A record points to my webserver, which does not normally accept mail
for anyone. The mail server MX records are to an entirely different machine.

Comments?

Do I need more valium?

-=[L]=-

If the MX records are not responsive / timing out, they might be falling
back to the A record.

> One of my users has reported incoming mail failures, which I finally
> tracked down. It turned out that Hotmail has seen fit to send the mail
> to his domain's A record machine, despite the fact that he has valid MX
> records.

Did you look in the mail headers and see Hotmail's mail server do that,
or does the From address/return path just happen to be Hotmail?
I would ask for a specific example of a domain name in which that
seems to happen, and the exact DNS zone contents.

I am sure that Hotmail does not ignore MX in general, unless they
just broke something; many domains require the MX to be processed and
the A record to be ignored for mail to be accepted. But there may be
something else going on with a specific domain, or with the DNS
queries/responses from its nameservers, that results in MX being
ignored or unavailable, and so in a fallback to 'lookup A'.

An example could be some DNS issue such as a slow response to the MX
query, 'MX to a CNAME', 'MX to an invalid label that looks like an IP',
an MX DNS response packet that is too large,
....

--
-JH

Unfortunately, all I get from my user is a snippet, and it took me a while
to realize that I had to look at the mail logs of my web server, not my
mail server, to find the transaction. The domain is cookephoto.com - and
here is my zone file:

plaid# dig cookephoto.com any

; <<>> DiG 9.3.3 <<>> cookephoto.com any
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55698
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 8

;; QUESTION SECTION:
;cookephoto.com. IN ANY

;; ANSWER SECTION:
cookephoto.com. 172800 IN SOA ns.metron.com. hostmeister.metron.com. 2012011900 21600 3600 345600 345600
cookephoto.com. 172800 IN NS ns2.metron.com.
cookephoto.com. 172800 IN NS ns1.metron.com.
cookephoto.com. 172800 IN NS ns3.metron.com.
cookephoto.com. 172800 IN MX 12 mail2.metron.com.
cookephoto.com. 172800 IN MX 15 mail.katz.com.
cookephoto.com. 172800 IN MX 10 mail.metron.com.
cookephoto.com. 172800 IN A 192.160.193.89

;; ADDITIONAL SECTION:
ns1.metron.com. 3600 IN A 192.160.193.34
ns2.metron.com. 3600 IN A 209.204.189.89
ns2.metron.com. 3600 IN AAAA 2001:470:838d::89
ns3.metron.com. 3600 IN A 192.160.193.55
ns3.metron.com. 3600 IN AAAA 2001:470:838d::55
mail.metron.com. 3600 IN A 192.160.193.14
mail2.metron.com. 3600 IN A 209.204.189.91
mail.katz.com. 28800 IN A 192.160.193.14

and here is the maillog for the transaction, slightly redacted:

Jul 25 13:13:07 plaid sm-mta[5121]: NOQUEUE: connect from blu0-omc2-s2.blu0.hotmail.com [65.55.111.77]
Jul 25 13:13:07 plaid sm-mta[5121]: q6PKD7bH005121: --- 220 plaid.metron.com ESMTP Sendmail 8.13.8/8.13.8; Wed, 25 Jul 2012 13:13:07 -0700 (PDT)
Jul 25 13:13:07 plaid sm-mta[5121]: q6PKD7bH005121: <-- EHLO blu0-omc2-s2.blu0.hotmail.com
Jul 25 13:13:07 plaid sm-mta[5121]: q6PKD7bH005121: --- 250-plaid.metron.com Hello blu0-omc2-s2.blu0.hotmail.com [65.55.111.77], pleased to meet you

No, they do. The exact same thing has happened to me twice, in two
separate but fundamentally similar scenarios. The MX is ignored and the
non-host A record is tried; if it accepts connections on port 25,
Hotmail uses it instead.
This behavior forced me to set up the mail server on the same box as a
webserver I administer to act as a secondary MX for another domain I
administer (mail is elsewhere), in one case.
In the other, I had to simply write off the option of having
http://domain working, and live with just http://www.domain, due to the
use of a third party web host that also had an MTA on their machine that
was rejecting my email.

Like all the behemoth service providers, it's impossible to find someone
useful to talk to about these things. I posted on Mailop about it a few
months ago, but it's not new behavior - the first instance I came across
was more than 2 years ago.

Mark.

> The domain is cookephoto.com

Why does mail.metron.com have MX records?
And they're different.

  $ host cookephoto.com
  cookephoto.com has address 192.160.193.89
  cookephoto.com mail is handled by 10 mail.metron.com.
  cookephoto.com mail is handled by 12 mail2.metron.com.
  cookephoto.com mail is handled by 15 mail.katz.com.

  $ host mail.metron.com
  mail.metron.com has address 192.160.193.14
  mail.metron.com mail is handled by 10 mail.metron.com.
  mail.metron.com mail is handled by 20 mail.katz.com.

  $ host mail.katz.com
  mail.katz.com has address 192.160.193.14

  $ host mail2.metron.com
  mail2.metron.com has address 209.204.189.91

  $ host plaid.metron.com
  plaid.metron.com has address 192.160.193.135

Normally, in my experience, the actual mail server doesn't have MX records as such, but...
Just seems odd.

Also, you say …

> At the time of the transaction, nothing special was happening here, ...

Was anything strange happening with any of the DNS records for any of these domains in the past two days?

Aloha,
Michael.

> One of my users has reported incoming mail failures, which I finally
> tracked down. It turned out that Hotmail has seen fit to send the mail
> to his domain's A record machine, despite the fact that he has valid MX records.
>
> The A record points to my webserver, which does not normally accept mail
> for anyone. The mail server MX records are to an entirely different machine.
>
> Comments?
>
> Do I need more valium?

If you subscribe to http://mailop.org and look in the archives, you'll see a thread named '[mailop] Hotmail ignoring MX, going direct to @ IN A? ' from March of this year (which carries over into April). In this thread Mark Foster encounters the same issue, and upon investigation others (including myself) see it as well.

I found that we were having the same issue after users on Hotmail forwarded us DSNs for messages that our mail server had never seen. When we checked the web servers for that hostname, however, we found connections and delivery attempts from Hotmail.

Additionally, quoted from Tony Finch in the mailop thread regarding 'what if your MXes are broken and it is just failing back to A':

   If one or more MX RRs are found for a given name, SMTP systems MUST
   NOT utilize any address RRs associated with that name unless they are
   located using the MX RRs; the "implicit MX" rule above applies only
   if there are no MX records present. If MX records are present, but
   none of them are usable, this situation MUST be reported as an error.

No solution to the issue was found in the various forks of that thread. However, one individual afflicted by it (the OP) seems to have resolved his specific problem with Hotmail by bringing his MX records into stricter compliance with RFCs and best practices (he removed a CNAME). That said, per the quote above, Hotmail should not have been falling back to the A records or any other RRs for the hostname.

The matter is still unresolved for us and presumably for others on the list, except for the OP.

> If the MX records are not responsive / timing out, they might be falling
> back to the A record.

Per RFC2821 (and later RFC5321):

   If one or more MX RRs are found for a given name, SMTP systems MUST
   NOT utilize any address RRs associated with that name unless they are
   located using the MX RRs; the "implicit MX" rule above applies only
   if there are no MX records present. If MX records are present, but
   none of them are usable, this situation MUST be reported as an error.

So while it is possible they are doing this, they should not be.
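
To make the rule in that quote concrete, here is a minimal Python sketch of the target selection it describes. It is not any real MTA's code: the MX class, NoUsableMX and delivery_hosts are names made up for illustration, the DNS answers are passed in as plain Python data rather than fetched, and the sample records are the cookephoto.com ones posted earlier in this thread. Ordering among equal preferences is not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MX:
    preference: int
    exchange: str


class NoUsableMX(Exception):
    """MX records exist, but none of their targets could be resolved."""


def delivery_hosts(domain, mx_records, addresses):
    """Return the hostnames to try, in order, for mail addressed to `domain`.

    mx_records: list of MX for `domain` (empty if the MX query returned no data).
    addresses:  dict of hostname -> list of IP strings (stand-in for A/AAAA data).
    """
    if not mx_records:
        # Implicit MX: allowed only when there are no MX records at all.
        return [domain] if addresses.get(domain) else []
    usable = [mx for mx in sorted(mx_records, key=lambda r: r.preference)
              if addresses.get(mx.exchange)]
    if not usable:
        # MX present but unusable: an error, never a fallback to the A record.
        raise NoUsableMX(domain)
    return [mx.exchange for mx in usable]


addresses = {
    "cookephoto.com": ["192.160.193.89"],
    "mail.metron.com": ["192.160.193.14"],
    "mail2.metron.com": ["209.204.189.91"],
    "mail.katz.com": ["192.160.193.14"],
}
mx = [MX(12, "mail2.metron.com"), MX(15, "mail.katz.com"), MX(10, "mail.metron.com")]

print(delivery_hosts("cookephoto.com", mx, addresses))
# ['mail.metron.com', 'mail2.metron.com', 'mail.katz.com']
```

Under this reading the web server at 192.160.193.89 is never a candidate while any MX record for the domain exists, which is the behavior the RFC text asks for.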

Ryan

From: Ryan Rawdon
Sent: Thursday, July 26, 2012 7:06 AM
To: nanog@nanog.org
Subject: Re: Is Hotmail in the habit of ignoring MX records?

No solution to the issue was found in the various forks of that thread,
however one individual afflicted by this issue (the OP) seems to have
resolved his specific issue with Hotmail by fixing his MX records to be
in stricter compliance with RFCs and best practices (removed a CNAME) -
that said, per the quote above Hotmail should not have been falling
back to the A records or any other RRs for the hostname.

I would say MX pointing to a CNAME instead of pointing to an A record is the #1 cause of intermittent mail delivery problems I have seen. Some MTAs seem to tolerate it, some don't.
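
For anyone who wants to check a zone for the two problems mentioned in this thread (an MX target that is a CNAME, and an MX target that looks like an IP address), a rough Python sketch follows. It works on data you supply and does no DNS lookups; lint_mx_target, cname_owners and the name mail.example.net are hypothetical and only for illustration.

```python
import ipaddress


def lint_mx_target(target, cname_owners):
    """Return a list of problems found with one MX target (exchange) name.

    cname_owners: set of names that own a CNAME record in the zone data
                  being checked (supplied by the caller, not looked up).
    """
    problems = []
    name = target.rstrip(".")
    try:
        ipaddress.ip_address(name)
        problems.append("target is an IP address literal, not a hostname")
    except ValueError:
        pass  # not an IP literal, which is what we want
    if name in cname_owners:
        problems.append("target is a CNAME alias")
    return problems


print(lint_mx_target("mail.metron.com.", {"mail.example.net"}))
# []
print(lint_mx_target("192.160.193.14.", set()))
# ['target is an IP address literal, not a hostname']
```

A clean result here only says the zone data avoids these two pitfalls; it says nothing about how a particular sender such as Hotmail will behave.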

G

Ahh - I knew I had seen this before, but thought it was here (nanog) rather
than on mailops. I think I may try setting the A record for the domain to
my mailserver, and letting the webserver there redirect the http requests.
I dislike putting a webserver on the unadorned domain, but out there in the
'real' world, folks seem to have become accustomed to leaving off the 'www'.

Thanks for the replies; I'll take this over to mailops if there is any more
to say. The funny thing is that this behavior with respect to Hotmail has not
affected any of the other couple of dozen domains with similar or identical
configurations here.

Oh, well.

-=[L]=-

In message <A9A5C64B-831D-42BF-8A38-56CC3B9BAF48@kapu.net>, Michael J Wise writes:

>> The domain is cookephoto.com
>
> Why does mail.metron.com have MX records?

Why do you care? There is nothing wrong with having explicit MX
records, and they generally take up less room in a DNS cache than
the negative response does, especially if it is DNSSEC signed.

> And they're different.

Again, why do you care?

> Normally, in my experience, the actual mail server doesn't have MX
> records as such, but...
> Just seems odd.

All address records (A and AAAA) have MX records. Some may be
implicit, but as far as SMTP is concerned they all have MX records.

> In message <A9A5C64B-831D-42BF-8A38-56CC3B9BAF48@kapu.net>, Michael J Wise writes:
>
>>> The domain is cookephoto.com
>>
>> Why does mail.metron.com have MX records?
>
> Why do you care? There is nothing wrong with having explicit MX
> records, and they generally take up less room in a DNS cache than
> the negative response does, especially if it is DNSSEC signed.
>
>> And they're different.
>
> Again, why do you care?

Why do *I* care?
I don't.

I'm just trying to find the weird bit that maybe is causing hotmail to stumble.
And maybe an endless loop for an MX lookup might be what is causing hotmail to panic and throw out the MX records.

>> Aloha,
>> Michael.
>> --
>> "Please have your Internet License
>> and Usenet Registration handy..."
>
> --
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742 INTERNET: marka@isc.org

Aloha,
Michael.

In message <B59A4092-CE2F-44E4-84F9-77C18493AD95@kapu.net>, Michael J Wise writes:

> Why do *I* care?
> I don't.
>
> I'm just trying to find the weird bit that maybe is causing hotmail to
> stumble.
> And maybe an endless loop for an MX lookup might be what is causing
> hotmail to panic and throw out the MX records.

You don't look up MX records for MX targets. This is basic MTA
processing.

If the MX lookup fails, as opposed to returning nodata, you don't
look up the A/AAAA records and synthesize an MX record. You treat it
as a soft error and queue for retry later. Again, this is basic MTA
processing.

You don't depend on ALL (ANY) returning MX records, as they may not
be in the cache. You need to make an explicit MX query if no
MX records are returned in response to an ALL query.
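
The distinction between "lookup failed" and "nodata" can be written down as a small decision table. This is only a sketch in Python: the Outcome names and next_step are invented for illustration, and grouping SERVFAIL and timeouts together under FAILURE is an assumption about what counts as a failed lookup.

```python
from enum import Enum


class Outcome(Enum):
    ANSWER = "answer"    # explicit MX query returned one or more MX records
    NODATA = "nodata"    # name exists, but it has no MX records
    FAILURE = "failure"  # assumed to cover SERVFAIL, timeouts and the like


def next_step(outcome):
    """Map the result of an explicit MX query to what the sender does next."""
    if outcome is Outcome.ANSWER:
        return "deliver via the MX targets"
    if outcome is Outcome.NODATA:
        return "look up A/AAAA for the domain (implicit MX)"
    # The lookup failed: a soft error, so keep the message queued and retry.
    return "defer and retry later"


for outcome in Outcome:
    print(outcome.name, "->", next_step(outcome))
```

Only the NODATA row ever leads to the domain's own address records; a failed or slow MX query lands in the retry queue instead.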

Mark

And yet, Hotmail apparently is doing the exact opposite of that. Which
means what 'should' happen or what 'should' be done isn't as relevant as we
would all like it to be. Given this, considering "unusual" things like the
target of an MX record having an MX record itself, whilst completely
irrelevant for a well-behaved mail server, might actually be relevant
here...

  Scott.

That would be a seriously broken violation of the SMTP specification.

Tony.

> That would be a seriously broken violation of the SMTP specification.

I would definitely agree it would be quite broken behavior, but you
know, I never said Hotmail's processing wasn't broken, only that
they seem to honor MX records in the common case. If you are doing
something unusual like "mail MX bla bla",
I would say you can't rule that out as a possible cause just because
some RFC suggests it should be OK.

The spec does say that you're not allowed to chain MX records. But
I'm not so sure that the specification actually prohibits an SMTP
server from doing that, if someone does try to chain MX records.

It may also be out of spec to have an "MX" record point to a
DNS label that an MX record exists for in the first place.

MX records don't "chain".

If they did, then

example.com. 1800 IN MX 10 example.com.

would be an infinite loop. This isn't an infinite loop and is instead a
perfectly valid configuration.

If you made a DNS query for the MX records for example.com you would get
back an answer that might include:

;; ANSWER SECTION:
example.com. 1800 IN MX 10 example.com.

;; ADDITIONAL SECTION:
example.com. 1800 IN A 10.10.10.10
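
As a rough illustration of why this does not loop, here is a Python sketch that resolves a mail route against a toy in-memory zone. The ZONE table, lookup and resolve_mail_route are hypothetical names; the cookephoto.com and metron.com data are the records posted earlier in the thread. Only A records are consulted, and the no-MX (implicit MX) case is left out to keep it short.

```python
# Toy zone data keyed by (name, record type).
ZONE = {
    ("example.com", "MX"): [(10, "example.com")],
    ("example.com", "A"): ["10.10.10.10"],
    ("cookephoto.com", "MX"): [
        (10, "mail.metron.com"),
        (12, "mail2.metron.com"),
        (15, "mail.katz.com"),
    ],
    ("cookephoto.com", "A"): ["192.160.193.89"],
    ("mail.metron.com", "MX"): [(10, "mail.metron.com"), (20, "mail.katz.com")],
    ("mail.metron.com", "A"): ["192.160.193.14"],
    ("mail2.metron.com", "A"): ["209.204.189.91"],
    ("mail.katz.com", "A"): ["192.160.193.14"],
}

QUERIES = []  # log of (name, type) lookups, to show what actually gets asked


def lookup(name, rtype):
    QUERIES.append((name, rtype))
    return ZONE.get((name, rtype), [])


def resolve_mail_route(domain):
    """One MX query for the domain, then address queries for each target."""
    route = []
    for preference, exchange in sorted(lookup(domain, "MX")):
        # Targets are only ever looked up for addresses, never for MX.
        for address in lookup(exchange, "A"):
            route.append((preference, exchange, address))
    return route


print(resolve_mail_route("example.com"))
# [(10, 'example.com', '10.10.10.10')]
print(QUERIES)
# [('example.com', 'MX'), ('example.com', 'A')]
```

Running the same function on cookephoto.com never issues an MX query for mail.metron.com, so that host's own MX records play no part in delivery for a sender that works this way.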

From RFC 1035:

3.3.9. MX RDATA format

But they do, "Expand".
And I can think of a way whereby if an MX record referenced itself, *AND* included something extra … (did you see the something extra?)

That it would be possible (and I'm not saying this is what is happening, but … it could be) …
That an internal process could go resolving MX records, and adds them all to an internal table, until it figures it's got 'em all…

  "Gotta Get 'Em All!"

… and maybe, just maybe … it exhausts the table space, and gives up, and tries the A record.

I'm not saying this would be "Standard".
I'm not saying this is the best, or perhaps even an acceptable way to do it.
Or that it is in fact what is happening.

But the config looked weird, and I can imagine … a system being written as described … and breaking just this way given that MX configuration.
I can imagine Test … not catching it.

Aloha,
Michael.

In message <25F0B21A-0319-45E3-9DBF-9906CB77AC6C@kapu.net>, Michael J Wise writes:

> I'm not saying this would be "Standard".

It would be broken. MX records say which machines are set up to receive
email for a domain. Delivering it elsewhere, unless explicitly overridden
(e.g. smarthost), is a security flaw in the MTA.

I'm not disputing it.
I'm also not saying it is, or it isn't, because I don't know.
What I am saying is, what I do know is, that you probably can't open a Sev A DCR ticket with HotMail, and neither can I.

That, and … it would seem there may be two things broken.
And that fixing the MX "recursion" may re-cloak the apparent bug in HotMail.
Maybe.

Which one can be fixed faster?

Aloha,
Michael.