COM/NET informational message

I'm sorry, but this is incorrect in many different dimensions. The
subject was discussed exhaustively in the IETF's IDN working group; I
refer you to its archive for detailed discussions. Among many other
things, your assertion about the simplicity of name comparisons is
wrong; see draft-hoffman-stringprep-07.txt for a discussion of that
issue. As for 8-bit clean DNS -- well, apart from the many possible
ways to encode things, there's the issue of the many applications that
aren't 8-bit clean, including (per the RFC 822 spec) SMTP. If "just
use 8-bit clean DNS" were sufficient, we'd have been there several
years ago. See http://www.ietf.org/html.charters/idn-charter.html
for many more pointers.

    --Steve Bellovin, http://www.research.att.com/~smb (me)
    http://www.wilyhacker.com (2nd edition of "Firewalls" book)

Date: Fri, 03 Jan 2003 14:41:45 -0500
From: Steven M. Bellovin

> I'm sorry, but this is incorrect in many different dimensions. The
> subject was discussed exhaustively in the IETF's IDN working group; I
> refer you to its archive for detailed discussions. Among many other
> things, your assertion about the simplicity of name comparisons is
> wrong; see draft-hoffman-stringprep-07.txt for a discussion of that

Will check these.

> issue. As for 8-bit clean DNS -- well, apart from the many possible
> ways to encode things, there's the issue of the many applications that
> aren't 8-bit clean, including (per the RFC 822 spec) SMTP. If "just

Good point.

> use 8-bit clean DNS" were sufficient, we'd have been there several
> years ago. See http://www.ietf.org/html.charters/idn-charter.html
> for many more pointers.

So, if I understand correctly:

* RFC 2045/2047 for MIME-aware apps (SMTP, etc.)
* UTF-8 for SNMP
* IDNA for DNS and the FQHN part of an HTTP request?

Yuck.

Oh well. I guess this is just another encoding to implement. It
would be a shame to try using the same encoding for everything...
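
To make the three-way split above concrete, here is a rough sketch that
pushes one string through each encoding. It assumes Python's standard
library (email.header for RFC 2047 encoded-words, the built-in "idna"
codec for IDNA as specified in RFC 3490); the sample label is made up,
and none of this comes from the thread itself.

```python
from email.header import Header, decode_header

# Hypothetical sample label: "bücher" (u-umlaut written as an escape).
name = "b\u00fccher"

# RFC 2047 encoded-word, as a MIME-aware mail header would carry it.
mime_word = Header(name, "utf-8").encode()

# Raw UTF-8 octets, as a UTF-8 protocol field would carry it.
utf8_bytes = name.encode("utf-8")

# IDNA ASCII-compatible form, as a DNS label would carry it.
idna_label = name.encode("idna")

# The encoded-word round-trips back to the original text.
raw, charset = decode_header(mime_word)[0]
round_trip = raw.decode(charset)

print(mime_word)
print(utf8_bytes)
print(idna_label)
```

Same text, three different wire forms; only the UTF-8 form needs an
8-bit clean path.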

I'll check out the IDNA spec. Hopefully it at least encodes
extended characters in such a way that strcasecmp() works without
modification, i.e., upper- and lowercase chars are mapped to
'A'-'Z' and 'a'-'z'...
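
A quick way to test that hope, sketched with Python's built-in "idna"
codec (an assumption; it implements the RFC 3490 flavor of IDNA, and
same_hostname is a made-up helper name). The codec case-folds labels
containing non-ASCII characters but passes pure-ASCII labels through
untouched, so the final comparison still has to ignore ASCII case,
which is what strcasecmp() would do on the encoded names.

```python
def same_hostname(a: str, b: str) -> bool:
    """Compare two hostnames by their IDNA ASCII forms, ignoring ASCII case.

    Hypothetical helper for illustration only.
    """
    # encode("idna") applies nameprep (including case folding) to labels
    # containing non-ASCII characters and emits plain ASCII bytes.
    # Pure-ASCII labels keep their original case, hence the .lower().
    return a.encode("idna").lower() == b.encode("idna").lower()


# Upper- and lowercase u-umlaut end up in the same encoded label.
print(same_hostname("B\u00dcCHER.Example", "b\u00fccher.example"))

# A different name stays different.
print(same_hostname("b\u00fccher.example", "bucher.example"))
```

So, under these assumptions, the encoded form is plain ASCII and a
case-insensitive byte comparison is enough once both names have been
through the encoder.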

Eddy