COM/NET informational message

This message explains an upcoming change in certain behavior of the
com and net authoritative name servers related to internationalized
domain names (IDNs).

VeriSign Global Registry Services (VGRS) has been a longtime advocate
of IDNs. Our IDN Test Bed has been active for over two years and we
have followed and supported IETF developments in the IDN area. The
protocol for IDNs developed by the IETF's IDN Working Group has been
approved by the IESG and we anticipate that RFCs will be published
soon. That protocol, Internationalizing Domain Names in Applications
(IDNA), calls for changes to individual applications to support IDNs.
VGRS has developed a plug-in, called i-Nav, for Microsoft's Internet
Explorer browser to support IDNs in a manner consistent with IDNA.
i-Nav is free and more information about it is available at
<http://www.idnnow.com>.

Before IDNA, some application developers had developed proprietary
mechanisms designed to support IDNs. The Internet Explorer browser,
for example, sends a DNS query in UTF-8 or another, local encoding
when a user types a domain name with characters other than letters,
digits and the hyphen in the address bar. These efforts, however, were
not entirely successful. For example, if such a domain name ends in
com or net these queries reach the com/net name servers and fail.

Our research indicates that the average user expects IDNs to work but
does not understand the need for additional software to support this
functionality. Such users attempt to enter IDNs in their browsers,
but when the queries fail, they become frustrated and do not know
what action to take to enable IDNs. They are unaware that downloading
a browser plug-in such as i-Nav would enable IDN resolution.

To improve this user experience and to encourage the adoption of an
application that supports IDNA, VGRS is announcing a measure intended
to stimulate widespread distribution of the i-Nav plug-in. Starting
on January 3, 2003, some queries to the com/net name servers that
previously failed with a DNS Name Error (NXDOMAIN) response will
instead return an address (A) record. Any queries for A records with
at least one octet greater than decimal 127 in the second-level label
will trigger this A record response. For example, a query for the A
record for "foo?.com", where "?" represents an octet with a value
greater than 127, would return an A record rather than an NXDOMAIN
response. The goal is to match unrecognized domain names generated by
browsers attempting to resolve IDNs. Since browsers construct DNS
queries for such IDNs using UTF-8 or a local encoding, and since
these encodings use octets with all possible values (i.e., from 0
through 255), the presence of octets with values greater than 127 as
described above can indicate a web browser's failed IDN resolution
attempt.
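
The matching rule described above can be sketched in a few lines of
Python (a hypothetical helper; the function name and the Latin-1
modelling of octets are illustrative, not from VGRS):

```python
def gets_synthesized_a(qname: str) -> bool:
    """Sketch of the announced rule: an A query under com/net whose
    second-level label contains at least one octet greater than
    decimal 127 gets the synthesized A record instead of NXDOMAIN.
    Octets are modelled as Latin-1 characters, one octet per char."""
    labels = qname.rstrip(".").split(".")
    if len(labels) < 2 or labels[-1].lower() not in ("com", "net"):
        return False
    second_level = labels[-2].encode("latin-1")
    return any(octet > 127 for octet in second_level)

print(gets_synthesized_a("foo\xfe.com"))   # True: 0xFE > 127
print(gets_synthesized_a("example.com"))   # False: plain ASCII
```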

The A record that will be returned by VGRS points to a farm of web
servers that will attempt to resolve the query. The browser that sent
the original DNS query will connect to one of these web servers and
its HTTP request will contain a Host header with the representation
of the IDN originally entered by the user in the address bar. The web
servers will attempt to interpret the contents of the Host header. If
the Host header corresponds to an IDN registered in VeriSign's IDN
Test Bed, the web server will return a page that gives the user an
opportunity to download the free i-Nav plug-in. The page will also
allow the user to navigate to the corresponding IDN web site via an
HTTP redirect. If the contents of the Host header cannot be matched
to an IDN registered in the Test Bed, the web server will return an
HTTP 404 response.
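
To make the Host-header step concrete, here is a minimal sketch
(hypothetical domain name, and assuming the browser used UTF-8 in its
query) of the request the web farm would receive:

```python
# The browser connects to the returned address and sends an ordinary
# HTTP request; the raw IDN bytes travel in the Host header.
idn = "bücher.com"  # hypothetical IDN typed into the address bar
request = (
    b"GET / HTTP/1.1\r\n"
    b"Host: " + idn.encode("utf-8") + b"\r\n"
    b"\r\n"
)
print(request)  # the u-umlaut appears as the UTF-8 bytes 0xC3 0xBC
```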

If a user downloads and installs the i-Nav plug-in, his or her
browser will convert any IDNs entered to ASCII compatible encoding
(ACE) format, according to the method described in IDNA. As a result,
subsequent DNS queries will use ASCII characters only.
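
That conversion can be demonstrated with Python's built-in "idna"
codec, which implements the same ToASCII mapping (the "xn--" ACE
prefix shown here was settled on shortly after this announcement; the
domain is a made-up example):

```python
# What the plug-in does, in essence: turn the Unicode name the user
# typed into an ASCII-only name before any DNS query leaves the host.
name = "bücher.com"          # hypothetical IDN
ace = name.encode("idna")    # ToASCII per label: Punycode + ACE prefix
print(ace)                   # b'xn--bcher-kva.com'
```
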

The user experience for web browsing will change only slightly from
the current experience if the contents of the Host header cannot be
interpreted. If the web farm cannot match the Host header to an IDN,
the user will see an error page resulting from the HTTP 404 error
returned, rather than an error page resulting from a DNS NXDOMAIN
response. The web servers refuse connections on all other UDP and TCP
ports, so other network services are minimally affected.

The overriding goal is to improve Internet navigation by encouraging
widespread adoption of software supporting the emerging IETF
standards for IDNs. These measures allow distribution of such
software.

--------
Brad Verd
Resolution Systems Operations Manager
VeriSign Global Registry Services
<http://www.verisign-grs.com>
Email: bverd@verisign.com
--------

Date: Fri, 3 Jan 2003 12:49:06 -0500
From: "Verd, Brad"

[ At the risk of going OT... ]

> Before IDNA, some application developers had developed
> proprietary mechanisms designed to support IDNs. The Internet

UTF-8 is a standard. MS products have used two-octet chars to
support Unicode for a long time. Any reason to add yet another
encoding?

> The A record that will be returned by VGRS points to a farm
> of web servers that will attempt to resolve the query.

Going to proxy SMTP as well?

> If a user downloads and installs the i-Nav plug-in, his or
> her browser will convert any IDNs entered to ASCII compatible
> encoding (ACE) format, according to the method described in
> IDNA. As a result, subsequent DNS queries will use ASCII
> characters only.

Why? Programmers already are (or should be) supporting UTF-8.
Searching RFC1035 for "binary" indicates a nameserver should be
able to handle chars >= 0x80. All that's left is deciding on an
encoding and handling case.

> The web servers refuse connections on all other UDP and TCP
> ports, so other network services are minimally affected.

Uhhhh.... more like the ugly kludge only addresses HTTP, and
other network services just won't work.

> The overriding goal is to improve Internet navigation by
> encouraging widespread adoption of software supporting the
> emerging IETF standards for IDNs. These measures allow
> distribution of such software.

How about encouraging widespread adoption of EXISTING standards
instead of adding more cruft? UTF-8 is standard. Proper DNS
implementations are eight-bit safe. People upgraded browsers
due to SSL, Year 2000, Javascript...

Eddy

> UTF-8 is a standard. MS products have used two-octet chars to
> support Unicode for a long time. Any reason to add yet another
> encoding?

Sounds like a question to ask of the IETF.

> How about encouraging widespread adoption of EXISTING standards
> instead of adding more cruft? UTF-8 is standard. Proper DNS
> implementations are eight-bit safe. People upgraded browsers
> due to SSL, Year 2000, Javascript...

The DNS protocol is not 8-bit safe, much less any implementations
of it. This is because ASCII upper case characters are down cased
in comparisons. I.e., the following are equivalent label values in
DNS: ABCDEF and abcdef and AbCdEf. Each has distinct binary
encodings, but DNS comparisons treat them as equal.
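
The case-folding point can be illustrated with a short sketch (an
illustrative helper, not from any real implementation): RFC 1035
comparisons fold only ASCII A-Z, so octets above 0x7F compare
verbatim:

```python
def dns_label_equal(a: bytes, b: bytes) -> bool:
    """Compare two DNS labels the way RFC 1035 comparisons do:
    octet-by-octet, folding ASCII A-Z to a-z. Octets >= 0x80 are
    NOT folded -- the protocol says nothing about their case."""
    def fold(octet: int) -> int:
        return octet + 32 if 0x41 <= octet <= 0x5A else octet
    return len(a) == len(b) and all(fold(x) == fold(y) for x, y in zip(a, b))

print(dns_label_equal(b"AbCdEf", b"abcdef"))        # True: ASCII folds
print(dns_label_equal(b"\xdcmlaut", b"\xfcmlaut"))  # False: Latin-1
                                                    # U-umlaut vs u-umlaut
```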

Date: Fri, 3 Jan 2003 13:44:53 -0500
From: Edward Lewis

> The DNS protocol is not 8-bit safe, much less any
> implementations of it. This is because ASCII upper case
> characters are down cased in comparisons. I.e., the

My point is there's no need to force chars <= 0x7f if DNS servers
are properly implemented. If they're not properly implemented,
why not, and whose fault is that? Catering to bad or broken
implementations instead of following standards is not a good way
to ensure interoperability.

DNS labels are encoded by a one-octet length representation
followed by that number of octets, with no restrictions on the
content of the octets. Show me where an RFC says something to
the extent of "labels and <any type of> RR MUST NOT contain
characters >= 0x7f" that rescinds 1035.

Yes, comparisons are case-insensitive. So what? strcasecmp()
works on ASCII strings. Now it must work on <new encoding x>.
Why not let <new encoding x> be UTF-8, something programmers
should support already? Maybe MS-style Unicode encoding? Why
add yet another encoding?!

I fear I may be straying OT, for this is layers 6/7...

Eddy

>> Before IDNA, some application developers had developed
>> proprietary mechanisms designed to support IDNs. The Internet

> UTF-8 is a standard. MS products have used two-octet chars to
> support Unicode for a long time. Any reason to add yet another
> encoding?

UTF-8 is a character encoding standard, not a DNS-standard. DNS is not, and
has not ever been 8-bit clean, despite the fact that many, if not most,
implementations will survive UTF-8 labels.

IDN(A) is an effort to encode unicode into 7-bit DNS-labels, without
breaking backward compatibility (too hard). While there originally were a
few voices arguing for UTF-8 over the wire, they were few and the consensus
today is that IDN(A) is a Good Way to Go(tm).

> How about encouraging widespread adoption of EXISTING standards
> instead of adding more cruft? UTF-8 is standard. Proper DNS
> implementations are eight-bit safe. People upgraded browsers
> due to SSL, Year 2000, Javascript...

Or, how about encouraging widespread adoption of upcoming standards, such
as IDN?

http://www.ietf.org/html.charters/idn-charter.html

Remember, DNS implementations may be 8-bit safe, but that doesn't prevent
anything else from not being so. Domains are used in so much more than DNS,
you know. =)

Best regards,
Kandra Nygards

> Yes, comparisons are case-insensitive. So what? strcasecmp()
> works on ASCII strings. Now it must work on <new encoding x>.
> Why not let <new encoding x> be UTF-8, something programmers
> should support already? Maybe MS-style Unicode encoding? Why
> add yet another encoding?!

Even the current MS encoding does not work. Check out 130.161.180.1, which I
think runs VMS. It does not even pass >127 characters to the root-servers.
It is the nameserver for a /16.

dig www.abcþ.com A @130.161.180.1 <- www.abc\xfe.com

> I fear I may be straying OT, for this is layers 6/7...

Hoping for all nameservers to magically break RFC compliance because that
is how you think a 'properly coded nameserver' should behave is naive, to
say the least.

PowerDNS may well lowercase your query using functions not guaranteed to do
anything useful on >127 characters. Perhaps they are being helpful and
change capital-U-umlaut to lowercase-U-umlaut. Who knows.

Regards,

bert

In a message written on Fri, Jan 03, 2003 at 08:22:11PM +0100, Kandra Nygårds wrote:

> IDN(A) is an effort to encode unicode into 7-bit DNS-labels, without
> breaking backward compatibility (too hard). While there originally were a
> few voices arguing for UTF-8 over the wire, they were few and the consensus
> today is that IDN(A) is a Good Way to Go(tm).

The problem here is that the working groups for different services
are going different directions. E-mail base64 encodes Unicode in
MIME. Usenet seems to be moving to UTF-8 directly. DNS is using
IDN.

Woe be the ISP who must provide all these services to their customers,
and whose Perl scripts must now be able to convert
base64<->UTF-8<->IDN<->whatever else is out there just to be able
to cobble together all the simple things we do every day.

Most (all?) RFC type standards today specify US-ASCII and/or
ISO-8859-1 encoding. This is part of what has made the Internet
so popular. I understand the need to support more characters, but
let's do that by supporting some base encoding scheme and layering
everything on top of that, rather than creating hundreds of new
encoding schemes, one for each higher level application.

Am I the only one that finds this perversion of the DNS protocol
abhorrent and scary? This is straight up hijacking.

It is quite disturbing, you would think that the folks responsible
for two of the biggest TLDs on the net would appreciate that not
everything is about people typing things into web browsers and that
their smart-assed scheme has a variety of possible nasty consequences.

It is presumably all about them being able to market a whole bunch
of internationalized variants of domain names to earn more $$$,
regardless of the technical consequences.

And the plugin for IE they are peddling... take a look at the license
agreement. http://www.idnnow.com/license.jsp No use for commercial
purposes? Automatic updates to allow them to take even more control
from users for their own commercial purposes at a later date? No
thanks.

It's scary but I'm not sure it's abhorrent.

The DNS is hit by a lot of bad traffic. E.g., a presentation at the
previous NANOG (http://www.nanog.org/mtg-0210/wessels.html) mentioned
that just about 2% of traffic at the roots is "healthy" traffic. Over
the years, there have been servers for 10.in-addr.arpa just to suck
up queries that should have never leaked out the source networks.

It's encouraging that there is an effort to try to clean up the
reasons for bad traffic. It's scary because in some sense the
response is not true (I wouldn't call it hijacking), but when you are
trying to cull out incompatible older editions of software, there's
no safe route (no 'fail safe' method).

And yes, the approach mentioned is optimized for DNS resolution for
web access. Hopefully this doesn't trap, for example, unwary SSH
connections.

I find Microsoft blatantly sending out UTF-8 and 'another local encoding' to
nameservers interesting too.

The real question is why they don't move to the proposed 7-bit clean
mappings themselves. Microsoft are supposed to have quite warm relations
with Verisign, even after the certificate spat.

Wrt the stunt that Verisign has employed today, well, they are in this
thing to make money, we all know that, and it isn't that bad. They capture
wrong queries and fix them up so they can sell more domains. Sure, it looks
suspicious and like something that should've been discussed more (I really
like announcements about something that will happen on January 3rd on
January 3rd). But downright evil?

Any query with a >127 character in it is bogus after all. Furthermore, it is
a query for '.COM' which they host anyhow. It's not like this is about
queries that would otherwise have not ended up at them. No new.net-style
tricks.

Evil would've been to just start selling UTF-8 domains and force flag day
upon the nameserver and mailserver world.

Reiterating, the real issue is that this needs a plugin. What happens in
that plugin is also very interesting. I suspect source isn't available,
who knows what is going on in there. Potentially, the i-Nav plugin hands
Verisign the keys of the internet, or at least the keys of Internet
Explorer, which is a slightly different thing.

Regards,

bert

And you find this unusual for Verisign/Network Solutions?

> UTF-8 is a standard. MS products have used two-octet chars to
> support Unicode for a long time. Any reason to add yet another
> encoding?

(Sorry, moderator, I have to use upper case here.)

PLEASE.

This (ie. IDN) has been discussed (and finally decided) in the IETF IDN wg
for AGES now. If you are so concerned, why did you not engage yourself
there? It is no secret what has been decided there.

As to technical merit, the others who responded have outlined pretty well
why UTF-8 is a Bad Idea For DNS.

That Verisign are taking this forward is, in the way they have chosen to
do, not really elegant, but I do understand their reasoning, and to some
extent appreciate that things are happening. Keep in mind that they are not
breaking standards, they are extending one application.

The other, earlier attempts to do things like this (especially NuNames)
have been way more rogue than this.

--
Måns Nilsson Systems Specialist
+46 70 681 7204 KTHNOC MN1334-RIPE

We're sysadmins. To us, data is a protocol-overhead.

In a message written on Fri, Jan 03, 2003 at 12:49:06PM -0500, Verd, Brad wrote:

> response. The web servers refuse connections on all other UDP and TCP
> ports, so other network services are minimally affected.

In a message written on Sat, Jan 04, 2003 at 11:04:08AM +0100, Måns Nilsson wrote:

> That Verisign are taking this forward is, in the way they have chosen to
> do, not really elegant, but I do understand their reasoning, and to some
> extent appreciate that things are happening. Keep in mind that they are not
> breaking standards, they are extending one application.

The first bit from the original announcement caught my attention.
The ongoing defense of this as not "breaking" things makes me want
to point out something that I think could occur:

A mail server in .COM or .NET gets an e-mail, say korean spam, that
has an 8 bit high character in one or more addresses. The mail
server, while not 8 bit clean, is 8 bit clean enough to pass this
on to standard DNS routines. They get back no MX, but an A record,
pointing to this farm. Most mail servers will go ahead and try
the A record, getting connection refused. The mailer will keep
retrying for several days, all the while these messages backing up
in the queue.
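
The failure mode described above follows from SMTP's implicit-MX
rule: a domain with no MX records falls back to its A record. A rough
sketch (the resolver callbacks and names here are hypothetical
stand-ins, not any real mailer's code):

```python
import socket

def deliver(domain, resolve_mx, resolve_a, port=25):
    """Sketch of the implicit-MX fallback. With the synthesized A
    record, a name that used to fail fast with NXDOMAIN (a permanent
    error) now resolves to the web farm, which refuses port 25 --
    a temporary error, so the mailer requeues and keeps retrying."""
    hosts = resolve_mx(domain) or resolve_a(domain)  # implicit-MX fallback
    if not hosts:
        return "bounce"  # NXDOMAIN: permanent error, rejected at once
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=10):
                return "delivered"
        except OSError:
            continue
    return "requeue"  # temporary error: retried for days
```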

That's just mail. I can see a half dozen other situations where
something might get one of these names and have to timeout, probably
at best making a user wait longer to get an error message, at worst
backing up all sorts of services if they are accidentally given one
of these "special" names.

Was this problem discussed in the working group?