Underscores in host names

In article <1116377042.592906.137650@g44g2000cwa.googlegroups.com> you write:

Hello all.
We have a client containing an underscore in the email address domain
name. Our email server rejects it because of its violation of the RFC
standard. This individual's claim is that he doesn't have problems
anywhere else and if this is going to be a problem he's "going to take
his business elsewhere"!

I understand it's a violation of the standard, but does it pose a
security hole to the email server to allow this sort of mail?

Thanks

  RFC 952 and RFC 1123 describe what is currently legal
  in hostnames.

  Underscore is NOT a legal character in a hostname.

  Before anyone points out that domain names allow underscores
  (which they do), see RFC 1034, Section 3.3:

For hosts, the mapping depends on the existing syntax for host names
which is a subset of the usual text representation for domain names,
together with RR formats for describing host addresses, etc. Because we
need a reliable inverse mapping from address to host name, a special
mapping for addresses into the IN-ADDR.ARPA domain is also defined.
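
As a rough illustration of the inverse mapping that passage describes, the sketch below uses Python's standard `ipaddress` module to build the IN-ADDR.ARPA owner name for an address. The helper name `ptr_name` is made up for this example, and the address is simply the one shown in the ping output later in this thread.

```python
import ipaddress


def ptr_name(addr: str) -> str:
    """Return the reverse-lookup owner name for an IP address."""
    # reverse_pointer reverses the octets and appends the
    # in-addr.arpa (or ip6.arpa) suffix.
    return ipaddress.ip_address(addr).reverse_pointer


print(ptr_name("82.127.31.191"))  # 191.31.127.82.in-addr.arpa
```

The PTR record at that owner name is what gethostbyaddr() ends up retrieving, which is why the host name rules matter for reverse zones as well.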

  Mail domains follow the same rules as for hostnames. RFC
  821 and its replacement RFC 2821 haven't extended the syntax
  to include underscores.
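
A minimal sketch of the letters-digits-hyphen rule described above, for anyone who wants to test names against it. The function name `is_ldh_hostname` is hypothetical, and the overall cap of 255 characters is an assumption taken from the "SHOULD handle host names of up to 255 characters" wording in RFC 1123; treat this as an illustration rather than a reference implementation.

```python
import re

# One label: letters, digits and hyphen only, beginning and ending with
# a letter or digit (RFC 1123 relaxed RFC 952 to allow a leading digit),
# and at most 63 characters long.
_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_hostname(name: str) -> bool:
    """Return True if every dot-separated label follows the LDH rule."""
    if not name or len(name) > 255:
        return False
    # An empty label (leading, trailing or doubled dot) fails fullmatch.
    return all(_LABEL.fullmatch(label) for label in name.split("."))


for candidate in ("mail.example.com", "3com.com", "super_bikes.tripod.com"):
    print(candidate, is_ldh_hostname(candidate))
```

Only the last of the three candidates is rejected, because of the underscore in its first label.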

  Mark

In article <1116377042.592906.137650@g44g2000cwa.googlegroups.com> you write:

Hello all.
We have a client containing an underscore in the email address domain
name. Our email server rejects it because of its violation of the RFC
standard. This individual's claim is that he doesn't have problems
anywhere else and if this is going to be a problem he's "going to take
his business elsewhere"!

I understand it's a violation of the standard, but does it pose a
security hole to the email server to allow this sort of mail?

No *security* hole as such, other than you need to make sure that if you're
going to accept such cruft, you make *damned* sure that you never leak it
back out and have some *other* standard-conformant site get on *your* case
about it....

Oh, and make sure that none of *your* automated tools that summarize maillogs
and the like choke on it. And that your e-mail admin is using software that
doesn't choke on it (otherwise if they send you e-mail, you can't reply.. ;)

You may want to balance the costs of making sure that *all* your stuff is
underscore-ready (don't forget ongoing maintenance costs, as you'll probably
have to re-patch each new release of any tools) against what this customer is
willing to pay you.

  One should note that COM and other TLDs stopped giving out
  domains outside of LDH (letters, digits, hyphen) to prevent these
  sorts of interoperability issues. COM actually retrieved the ones
  they had delegated.

Those with long memories will remember when Apple got strict on this
years ago, and lots of websites became unreachable to their users...

Cheers,
-- jra

So, these are *all* non-compliant? Perhaps someone should tell them that.
Certainly would have been nice not to get spammed by them, or to have an
even easier reason to reject same.

003_150.pool-clientes.gilat.com.pe
131_202.btc-net.bg
153_199_103_66-wifi_hotspots.eng.telusmobility.com
154_ras_01.dial-ip.plugon.com.br
194_30_119_112_maca0001.lpp_za_bi.ips.sarenet.es
200.126.99.247.block7_dsl.surnet.cl
200_13_215_210.colomsat.net.co
200_63_222_138.uio.satnet.net
203_221_178_213.easynet.net.au
208_218_35_14.huntsville6.56k.cvalley.net
208_75.compnet.com.pl
212_218.bytom.compnet.com.pl
212_81_214_10_peni0000.gignu_adsl_ma_ma.ips.sarenet.es
229.usuarios_dhcp-195-219-18.gemytel.net
63_224_210_245.spkn.uswest.net
64_192_75_146.wcg.net
82_119_148_246.stv.ru
Laubervilliers-151_12-16-191.w82-127.abo.wanadoo.fr
adm_node207.ral.esu3.k12.ne.us
adsl_basico_1196-170.etb.net.co
adsl_lav178_218.datastream.com.mt
adsl_pool_20_standard93137-133.etb.net.co
adsl_pool_22_standard93139-190.etb.net.co
adsl_standard_2450-46.etb.net.co
c_178_237.tv-naruto.ne.jp
clientes_corpor_7549-2.etb.net.co
clientes_corporativos69100-82.etb.net.co
corporativo_16780-201.pool.etb.net.co.80.167.65.in-addr.arpa
customer125_200.grm.net
d7_annex_palu_a.lac.telkom.net.id
dean_rm135_2xp.business.colostate.edu
dhcp-210_169_160_191.ttn.ne.jp
dialup_67-36-145-125.ndemand.com
dsl_61_161_30_212.turbonet.com
extremo_pool_11934-63.etb.net.co
extremo_pool_11943-164.etb.net.co
h107_17.u.datacomsa.pl
hfc3-9_32.melitaonline.net
host-195_87_69_26-koc.net
host-200-75-132-202.cliente_202_net-uno.net
host85_14_64_224.galileusz.3s.pl
host_169_253.compower.pl
host_88-hra.susice-net.cz
igld-83_130_117_32.inter.net.il
igld-83_130_130_243.inter.net.il
igld-83_130_141_197.inter.net.il
ip_167_68.omni-tech.net
ip_199.directservices.com
maroochydore_client185.hypermax.net.au
neterra139_250.neterra.net
nev_dial_11.stv.ru
p165_223.knu.ac.kr
pc_163_209.smrw.lodz.pl
pool_245224-151.etb.net.co
potter_313.caasdphb.brown.edu
price3_highspeed-109.preciscom.com
ras56_196.ppppun.vsnl.net.in
red_200.32.64_customer_7.static.impsat.net.ve
red_200.41.118_cust_17.static.impsat.net.ve
sistemas__s21278-010__slv-son-001.man.newskies.net
slerpool4_69121-134.etb.net.co
slerpool5_69122-26.007mundo.com
slerpool8_93159-211.etb.net.co
sp.200_155_13_3.8x.com.br
sp.200_155_9_57.datacenter1.com.br
sp_200_219_192_94.datacenter1.com.br
st00_162.dorm.depaul.edu
sun_b035.doggy.com.au
tnt_norman_int493149-194.etb.net.co
tnt_pool_11979-199.etb.net.co
tntcuisdnixd_169106-123.007mundo.com
tntmuzuixd_169105-36.etb.net.co
tv_cable_bmga7546-72.etb.net.co
ubr2-5_38.onvol.net
user_155_208.kutztown.edu
wks_177_10.dom_bci_prod.cl
ws_541a.ff.uni-lj.si

Mark,

Grump.

I used to be in the 952/1123 sect, but I have since reformed and continue to do penance for my sins.

The "hostname is not a domain name" dodge is simply wrong. If you like, I can get a signed affidavit from the author of the DNS specifications (assuming he's in the office tomorrow) to the effect that it was always his intent that domain names be composed of any 8-bit value. That's the whole reason for length-encoding the labels. RFC 2181, for all its other warts, explicitly clarified this particular issue.
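
To make the length-encoding point concrete, here is a small sketch of how a domain name is laid out on the wire: each label is preceded by a one-octet length, and a zero octet ends the name. The function name `encode_name` is invented for this example; the 63-octet label limit and 255-octet name limit are the ones from RFC 1035.

```python
from typing import Iterable


def encode_name(labels: Iterable[bytes]) -> bytes:
    """Encode labels in DNS wire format (length octet, then raw octets)."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("a label must be 1 to 63 octets long")
        out.append(len(label))  # the length prefix, not a delimiter
        out += label            # octets are copied as-is, whatever they are
    out.append(0)               # the zero-length root label ends the name
    if len(out) > 255:
        raise ValueError("encoded name exceeds 255 octets")
    return bytes(out)


print(encode_name([b"super_bikes", b"tripod", b"com"]))
```

Because the lengths are explicit, nothing in the encoding itself needs to treat "_" (or a dot, or a space) as special inside a label.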

The whole reason for check-names was because of very seriously broken software that would allow shell meta-characters in in-addr.arpa labels to do bad things. I have come to the opinion that if such software still exists, then the people who run that software deserve what they get. Check-names was a bad idea that might have been justified at the time, but pretending it remains justified by 952/1123 has got to stop sometime.

However, that rant was mostly irrelevant. Can you point to _ANY_ application, operating system, or anything else that has any issues whatsoever with an "_" of all characters?

Rgds,
-drc

a message of 92 lines which said:

So, these are *all* non-compliant?

Yes, and you can easily check that the FreeBSD resolver, for instance,
cannot retrieve them (the GNU libc resolver on Linux can).

notux:~ % uname
FreeBSD
notux:~ % ping Laubervilliers-151_12-16-191.w82-127.abo.wanadoo.fr
ping: cannot resolve Laubervilliers-151_12-16-191.w82-127.abo.wanadoo.fr: Unknown server error

myriam:~ % uname
Linux
myriam:~ % ping Laubervilliers-151_12-16-191.w82-127.abo.wanadoo.fr
PING Laubervilliers-151_12-16-191.w82-127.abo.wanadoo.fr (82.127.31.191) 56(84) bytes of data.
64 bytes from Laubervilliers-151_12-16-191.w82-127.abo.wanadoo.fr (82.127.31.191): icmp_seq=1 ttl=118 time=49.0 ms

There are also mail domains to consider. They have superficially the same
syntax as host names (they cannot have a trailing dot) but they are
generally checked much more strictly for conformance to that syntax. I'm
not sure whether the original post was about a mail domain or the name of
a mail host, but if it was the former I would be surprised if the customer
could claim that it works most of the time.

Names of mail hosts are another matter, especially the names they declare
in HELO. When I analysed this back in December, I found that about 1/3 of
legitimate mail hosts declared invalid hostnames. This is orthogonal to
the issue of host name syntax, but it does show that being excessively
strict will cause you pain. However it is worth checking mail host names
for gross syntax violations since some fairly common spamware puts all
sorts of binary junk in its HELO command.
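
One way to read "gross syntax violations" is sketched below: reject a HELO argument only if it is empty, absurdly long, or contains bytes outside printable ASCII, and let merely sloppy names (underscores included) through. The function name, the 255-byte cap and the exact byte ranges are assumptions made for this illustration, not anything taken from a particular MTA.

```python
def helo_is_grossly_invalid(arg: bytes) -> bool:
    """Return True only for HELO arguments no plausible host would send."""
    if not arg or len(arg) > 255:
        return True
    # Control characters, spaces, DEL and 8-bit bytes count as junk.
    # Everything else, including "_", is tolerated at this stage.
    return any(b <= 0x20 or b >= 0x7F for b in arg)


print(helo_is_grossly_invalid(b"mail_1.example.com"))
print(helo_is_grossly_invalid(b"\x00\xff\x13junk"))
```

A check this loose should not trip over the roughly one third of legitimate hosts that announce technically invalid names, while still catching binary garbage.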

Tony.

David Conrad wrote:

I used to be in the 952/1123 sect, but I have since reformed and
continue to do penance for my sins.

Your personal pendulum has no bearing on the relevance of 952/1123.
Hostnames still have their own rules, apart from the media used to
represent those hostnames (eg, hosts or DNS is irrelevant--a hostname is
still a hostname is still a hostname).

The "hostname is not a domain name" dodge is simply wrong. If you
like, I can get a signed affidavit from the author of the DNS
specifications (assuming he's in the office tomorrow) to the effect
that it was always his intent that domain names be composed of any
8-bit value.

There's absolutely no correlation between those two points. Or at least,
the latter point has nothing to do with the former.

As for the latter point in particular, anybody is perfectly free to use
any 8-bit value they want for any label in any domain name, and that point
is hardly in dispute. The point for this thread however, is that 952/1123
defines its own rules for the syntax that can be used to represent a
connection target on the Internet (aka "hosts"). Those rules are quite
clear: letters, digits and hyphen only, length restrictions, etc.

However, that rant was mostly irrelevant. Can you point to _ANY_
application, operating system, or anything else that has any issues
whatsoever with an "_" of all characters?

Just one?

Squid.

By default Squid complains if it finds an underscore in a URL
hostname. It returns an "Invalid URL" error message and explains
that underscores are not allowed in hostnames. Of course you can
make Squid accept underscores if you prefer.

We felt this was better than returning a "the domain name does not
exist" error message. It sucks for the user when a name can be
resolved by one machine or by one application, but not by another.

Even on FreeBSD you get different answers from different apps:

         chef-wessels ~ 8> host super_bikes.tripod.com
         super_bikes.tripod.com has address 209.202.240.100
         chef-wessels ~ 9> ping super_bikes.tripod.com
         ping: cannot resolve super_bikes.tripod.com: Unknown server error

Duane W.

gethostbyaddr (and maybe other functions) will return NULL under at
least FreeBSD/NetBSD for ANY PTR having the "_" character.

There must be many applications impacted.

(why are we talking about this on NANOG rather than NAMEDROPPERS?)

The whole reason for check-names was because of very seriously broken
software that would allow shell meta-characters in in-addr.arpa
labels to do bad things.

yes. mea culpa, i let CERT twist my arm into paving over a hole with
BIND that should have been patched in Sendmail.

I have come to the opinion that if such software still exists, then the
people who run that software deserve what they get.

me too.

Check-names was a bad idea that might have been justified at the time,
but pretending it remains justified by 952/1123 has got to stop sometime.

However, that rant was mostly irrelevant. Can you point to _ANY_
application, operating system, or anything else that has any issues
whatsoever with an "_" of all characters?

at the time of check-names, i outlawed _ as a side effect of punting. in
order to strip/prevent newline characters in PTR targets, i had to be able
to refer to an RFC (lest people come to me with many individual sob stories
about this or that special character that either should or should not be
stripped/prevented in gethostbyaddr().) the only RFC i found that had any
remote chance of getting me off this hook was #952. ergo, _ had to die in
order that my inbox might live.

but it was wrong, and the need for it is past, and it's time for redress.

  does this mean that i can get my .com delegation back?
  it's to support ADA-act compliant web servers.

--bill

Paul Vixie wrote:

(why are we talking about this on NANOG rather than NAMEDROPPERS?)

because it's not relevant to the underlying rules

Check-names was a bad idea that might have been justified at the time,
but pretending it remains justified by 952/1123 has got to stop sometime.

at the time of check-names, i outlawed _ as a side effect of punting. in
order to strip/prevent newline characters in PTR targets, i had to be able
to refer to an RFC (lest people come to me with many individual sob stories
about this or that special character that either should or should not be
stripped/prevented in gethostbyaddr().) the only RFC i found that had any
remote chance of getting me off this hook was #952. ergo, _ had to die in
order that my inbox might live.

but it was wrong, and the need for it is past, and it's time for redress.

So, you found some pre-existing rules, used them as cover for your
problem, and now that your ~problem is fixed the pre-existing rules
shouldn't matter to anybody anymore? Come on now, isn't it slightly
possible that those rules were pre-existing for reasons that have nothing
to do with you?

Consider the code-point value of "$" as it is used in iso-646-us versus
iso-646-de or any of the other ECMA derivatives, or any of the other ISO-*
derivatives that don't have direct ASCII character mappings. That
character (and many others) can have different and distinct code-point
values in multiple character sets, but it has to be identical everywhere
in order for it to have meaning. Thus, allowing the "character" to be used
means mandating a specific code-point value for that character.
Alternatively (and what we have in the pre-existing rules) is to forbid
those characters entirely, so that nobody is forced to kowtow to a
specific nationalized character set. While that may be feasible in protocol
commands and such, it's not feasible to mandate that /etc/hosts MUST
always use US-ASCII code-point values for characters that may not even
exist in the local nationalized charset. Really, spend some time with the
ECMA derivative sets and you'll see what I mean--there are characters in
some of them that aren't in the others, or they are misplaced, or they are
defined as alternates, and so forth.
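
A small sketch of the point about national variants: the same bytes read as different characters depending on which ISO 646 variant the local system assumes. The table below is a hand transcription of the positions where the German variant (DIN 66003) differs from US-ASCII, and `decode_iso646_de` is a made-up helper; Python ships no codec for this variant, so treat both as assumptions of this example.

```python
# Positions where DIN 66003 (German ISO 646) differs from US-ASCII.
DE_VARIANT = {
    0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
    0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß",
}


def decode_iso646_de(data: bytes) -> str:
    """Decode 7-bit data as the German ISO 646 variant."""
    if any(b > 0x7F for b in data):
        raise ValueError("ISO 646 is a 7-bit code")
    return "".join(DE_VARIANT.get(b, chr(b)) for b in data)


raw = b"host[1]"
print(raw.decode("ascii"))    # host[1]
print(decode_iso646_de(raw))  # hostÄ1Ü
```

Letters, digits, the hyphen and the dot sit at positions that do not vary between these two tables, so a name built only from them reads the same under either.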

I'm glad you fixed your problem, but really, this isn't about DNS, it is
about universal representation of hostnames despite the media that is used
to convey those names.

but it was wrong, and the need for it is past, and it's time for redress.

So, you found some pre-existing rules, used them as cover for your
problem, and now that your ~problem is fixed the pre-existing rules
shouldn't matter to anybody anymore?

  Who said the problem is fixed?

                                       Come on now, isn't it slightly
possible that those rules were pre-existing for reasons that have nothing
to do with you?

  Man, did you get up on the wrong side of the world? Everything I've seen from you lately seems to be very acidic and bordering on intentionally insulting.

  Can we try to have a decent intelligent discussion? More importantly, can't we have this discussion in a more appropriate place?

Alternatively (and what we have in the pre-existing rules) is to forbid
those characters entirely, so that nobody is forced to kowtow to a
specific nationalized character set.

  There is a solution for this problem. Use 32-bit character sets which are defined to include the entire collection of known character sets in all other languages on the planet.

  But this means you have to have a flag day, unless you can come up with some way to also be backwards compatible. And so long as you're backwards compatible, you can't get rid of the legacy problems. So, you're right back where you started.

                                       While that may be feasible in protocol
commands and such, it's not feasible to mandate that /etc/hosts MUST
always use US-ASCII code-point values for characters that may not even
exist in the local nationalized charset.

  The problem is that /etc/hosts is a 30 year old solution, and we knew twenty years ago that it didn't properly solve the problem, and didn't solve it in the right way. So long as you're going to call it /etc/hosts, I don't see how you can change the character set.

                                           Really, spend some time with the
ECMA derivative sets and you'll see what I mean--there are characters in
some of them that aren't in the others, or they are misplaced, or they are
defined as alternates, and so forth.

  I live in Belgium. Been there, seen that. Exchanging one country-specific character set for another is not a solution. You need a more over-arching solution that is equally applicable everywhere.

I'm glad you fixed your problem, but really, this isn't about DNS, it is
about universal representation of hostnames despite the media that is used
to convey those names.

  A standalone machine is worthless. In fact, the definition of a truly secure machine is one that is completely isolated from every other machine on the planet. And if that machine is going to be connected to others, you have to talk about representational issues, which means the DNS.

  Like it or not, when you talk about hostnames, you must also talk about DNS.

  Now, can we please take this discussion to a more appropriate place?

So, you found some pre-existing rules, used them as cover for your
problem, and now that your ~problem is fixed the pre-existing rules
shouldn't matter to anybody anymore? Come on now, isn't it slightly
possible that those rules were pre-existing for reasons that have nothing
to do with you?

here's the stretchy part that makes me want to undo what was done.

gethostbyname() knows it's dealing with hostnames. also gethostbyaddr()
and the modern equivalents (getaddrinfo/getnameinfo/whatever). also, these
library calls can get their host name/address data from sources other than
dns. it is in my view perfectly reasonable for these library calls to
demand RFC952-compliance, or compliance with a later specification for "host"
names, if there ever is such.

however, inside BIND4 named.boot and BIND8/BIND9 named.conf you will find
that the server is capable of enforcing hostname (RFC952) and mailname (RFC821)
rules on DNS data like "owner of A RRset" or "owner or target of MX RRset",
on the very stretchy supposition that these names, because they are being
used as part of A-RR or MX-RR sets, must be getting used as "hostnames" or
"mailnames". that might often be the case, or always-to-date be the case,
but it ain't NECESSARILY the case.

putting these checks in for master zones, slave zones, and response data was
a significant over-reach on my part. THAT is what i'm apologizing for here.
(and THAT is what CERT had asked me to do, since changing gethostbyaddr()
would not, by itself, have protected Sendmail from newlines in its qf* files.)

...
I'm glad you fixed your problem, but really, this isn't about DNS, it is
about universal representation of hostnames despite the media that is used
to convey those names.

and i'd agree if you said "logic that's meant to support hostnames/mailnames
ought to enforce the known rules about those names." by which i'd be thinking
of the library calls gethostbyname(), gethostbyaddr(), and so on. and by which
i would expressly not be referring to anything in the DNS.

just because you own an A RR doesn't make you a hostname.

just because you're pointed to by an MX RR doesn't make you a mailname.

(what a relief to finally be able to say that.)
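
The layering argued for here, where the host-lookup call enforces hostname syntax and the DNS itself carries whatever it is given, might look roughly like the sketch below. `lookup_host` is a hypothetical wrapper, not an existing library function; the resolver is passed in as a parameter so the syntax check can be exercised without touching the network, and it defaults to `socket.gethostbyname`.

```python
import re
import socket

# Letters, digits and hyphen; no leading or trailing hyphen; 1 to 63 chars.
_LDH_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def lookup_host(name, resolver=socket.gethostbyname):
    """Resolve `name` only if it is syntactically a host name."""
    if not all(_LDH_LABEL.fullmatch(label) for label in name.split(".")):
        # The refusal happens in the host-name API, not in the DNS data.
        raise ValueError(f"not a valid host name: {name!r}")
    return resolver(name)


def fake_resolver(name):
    """Stand-in resolver returning a documentation address."""
    return "192.0.2.1"


print(lookup_host("www.example.com", resolver=fake_resolver))
```

Code that wants the raw DNS data for a name like super_bikes.tripod.com would bypass this wrapper and query the DNS directly, which is the distinction being drawn between owning an A RR and being a hostname.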

Paul Vixie wrote:

putting these checks in for master zones, slave zones, and response
data was a significant over-reach on my part. THAT is what i'm
apologizing for here. (and THAT is what CERT had asked me to do, since
changing gethostbyaddr() would not, by itself, have protected Sendmail
from newlines in its qf* files.)

Alright then. Personally I've found them useful at different times in
different places but that's some hair-splitting neither of us is
particularly interested in.

just because you own an A RR doesn't make you a hostname.

just because you're pointed to by an MX RR doesn't make you a mailname.

(what a relief to finally be able to say that.)

At the risk of hair-splitting that I've already disclaimed, I'll halfway
agree (a host that doesn't accept connections arguably isn't a host) and
halfway disagree (the target of an MX must be a valid hostname). To ensure
that this thread dies now, I'll point out that I categorized some of this
as part of my second stab at the great white whale of i18n DNS [see
http://www.ehsco.com/misc/I-Ds/draft-hall-dns-datatypes-00.txt which
ensures nobody comes back]

In article <Pine.LNX.4.60.0505180849230.24969@hermes-1.csi.cam.ac.uk> you write:

There are also mail domains to consider. They have superficially the same
syntax as host names (they cannot have a trailing dot) but they are
generally checked much more strictly for conformance to that syntax. I'm
not sure whether the original post was about a mail domain or the name of
a mail host, but if it was the former I would be surprised if the customer
could claim that it works most of the time.

  Hostnames can't have a dot at the end either. The dot at the
  end is a local resolver indication to not use the search list.

  Mark

Actually it's been discussed, and we may yet see a trailing dot in mail addresses.

The reason why it's been considered is that the SMTP spec (RFC 2821) is flawed
as it requires at least one "." in the hostname (unlike DNS specs that
do not have this requirement for hostname) and that means that you can
not accept as valid email address something like "postmaster@tv", i.e.
if TLD is also a valid host you can not have an email address there.
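
The constraint comes from the RFC 2821 grammar, where a Domain is a sub-domain followed by one or more "." sub-domain parts. A rough sketch of just that dot requirement follows; the function name is invented, and the character syntax of each sub-domain is deliberately not checked.

```python
def has_rfc2821_dotted_form(domain: str) -> bool:
    """Check only the 'at least two non-empty labels' part of the grammar."""
    labels = domain.split(".")
    # Domain = sub-domain 1*("." sub-domain): two or more labels, none empty.
    return len(labels) >= 2 and all(labels)


for d in ("example.tv", "tv", "tv.", ".tv"):
    print(repr(d), has_rfc2821_dotted_form(d))
```

Only "example.tv" passes. The bare "tv" fails for lack of a dot, and the forms with a leading or trailing dot fail because they contain an empty label, so either would need a change to the grammar.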

Since changing RFC 2821 and waiting until everyone complies and accepts
email addresses with no "." is not an option, the solutions proposed are
to either have an address like "postmaster@.tv" or "postmaster@tv."

The only reason it has not been discussed more actively is that no TLD
operator has yet come forward and said that they are going to use a
TLD host for email, but as soon as one does, this would have to be
accommodated, and quickly (otherwise it will remain an open issue for a
future update to SMTP, probably RFC 4821 if this numbering continues :)

<quote who="william(at)elan.net">

Since changing SMTP2821 and waiting until everyone complies and accepts
email addresses with no "." is not an option, the solutions proposed are
to either have address like "postmaster@.tv" or "postmaster@tv."

The only reason it has not been discussed more actively is that no TLD
operator has yet come forward and said that they are going to use
TLD host for emails, but as soon as one does this would have to be
accommodated and quickly (otherwise it will remain as an open issue for
future update to SMTP - probably RFC4821 if this numbering continues :)

.ws has an MX record.
host -t mx ws. ==> mail.worldsite.ws

Most MUAs (unix ones tended to work, not surprisingly) complain or break
on "send" but technically it works. :)

Thanks,
David Ulevitch