Re: [idn] WG last call summary



"Eric A. Hall" <ehall@ehsco.com> wrote:

> The ToUnicode step changes well-known and widely-used data-types in
> such a way that the data no longer conforms to the rules which govern
> that data.

ToUnicode does not change any data type.  ToUnicode does not apply to
email addresses, or URIs, or message-IDs.  It applies to IDNs (including
traditional ASCII domain names).  In order to get any interaction
between a message-ID and ToUnicode, you first need to extract the
domain name from the message-ID.  Then when you apply ToUnicode, you
get an IDN, not an internationalized-message-ID.  If you want to have
a message-ID again, you need to insert the IDN back into a message-ID
structure, which invokes rule 1 and requires the IDN to be forced back
to ASCII (via ToASCII).
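
To make the round trip concrete, here is a minimal sketch in Python.  It
assumes the standard library's "idna" codec (an IDNA 2003 implementation)
stands in for ToUnicode and ToASCII, and the helper names
split_message_id, to_display, and to_wire are made up for illustration;
they are not from the IDNA draft or from any mail program.

```python
def split_message_id(msg_id):
    """Split '<local@domain>' into (local, domain).  Hypothetical helper."""
    text = msg_id.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError("not a message-ID: %r" % msg_id)
    local, at, domain = text[1:-1].rpartition("@")
    if not at or not local or not domain:
        raise ValueError("not a message-ID: %r" % msg_id)
    return local, domain


def to_display(msg_id):
    """Render an ASCII message-ID for a human.

    Only the extracted domain name goes through ToUnicode (approximated
    here by the stdlib "idna" codec).  The result is a rendering, not a
    message-ID.
    """
    local, domain = split_message_id(msg_id)
    idn = domain.encode("ascii").decode("idna")
    return "<%s@%s>" % (local, idn)


def to_wire(rendered):
    """Turn a rendering (or user input) back into a real message-ID.

    The domain is forced back to ASCII via ToASCII (again approximated
    by the "idna" codec) before it is put back into the structure.
    """
    local, domain = split_message_id(rendered)
    ace = domain.encode("idna").decode("ascii")
    wire = "<%s@%s>" % (local, ace)
    wire.encode("ascii")  # raises UnicodeEncodeError if anything non-ASCII is left
    return wire
```

With this sketch, to_display("<1234@xn--bcher-kva.example>") yields
"<1234@bücher.example>", and to_wire() of that rendering yields the
original ASCII message-ID again.  What is stored or sent is the ASCII
form in both directions.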

That's the story when machines are talking to machines.  When it's
machines talking to humans, it's a different story:

When a mail program displays a message header, it's not necessarily a
real RFC 822 message header on the screen.  The mail program doesn't
promise to speak RFC 822 to the human user.  What's on the screen is a
rendering.  The program might omit certain fields that are required, and
might include non-ASCII characters (converted from RFC 2047 encoding).
It might let me type non-ASCII characters into the Subject line of an
outgoing message.  When I instruct the program to send a message, the
program will take care to send an actual RFC 822 message, doing whatever
encoding is necessary.  I might try to cut & paste the message from the
screen into sendmail, but I wouldn't be very surprised if that didn't
work.  The program might let me type non-ASCII characters into a search
query, and it will find messages with matching headers or matching
bodies, even if they use quoted-printable or base64 encoding.  (Mutt
does all this correctly, by the way.)
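
The same split between rendering and wire format can be sketched for
RFC 2047 headers.  This is only an illustration using Python's
email.header module; the function names render, encode_for_sending, and
matches are hypothetical, and the choice of UTF-8 for outgoing headers
is an assumption, not something any particular mail program is claimed
to do.

```python
from email.header import Header, decode_header, make_header


def render(raw_value):
    """Decode RFC 2047 encoded-words into text suitable for the screen."""
    return str(make_header(decode_header(raw_value)))


def encode_for_sending(typed_value):
    """Encode what the user typed into an ASCII-only header value.

    UTF-8 is an assumed charset choice for this sketch.
    """
    return Header(typed_value, "utf-8").encode()


def matches(query, raw_value):
    """Compare a (possibly non-ASCII) search query against the rendering
    of a stored header, not against its encoded bytes."""
    return query.casefold() in render(raw_value).casefold()
```

Here matches("café", "=?iso-8859-1?q?Caf=E9_menu?=") is true even though
the stored header is pure ASCII, and encode_for_sending("Café") produces
an ASCII string that render() turns back into "Café".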

> An example I have already given is Message-ID.  Basic functionality
> will be broken if the structure of the well-known and widely-used
> Message-ID data-type as defined in STD11 is extended beyond the scope
> of the governing spec.  The extended, STD11-incompatible form will
> break on search inputs that don't allow those values, it will break
> if the search input accepts it and passes it to a remote system
> via an IMAP SEARCH or NNTP XPAT operation, it will break if a user
> puts it into a news or http URL which gets transliterated and then
> percent-hacked, and it will break if a user types it into a web URL as
> a parameter to a server-based search function.  That's just search and
> fetch, nevermind other problems like damage to threading that results
> from an extended Message-ID which is manually added to See-Also or
> References, corrupted spam complaints, and the dozens of other common
> uses for this well-known, widely-used, STANDARDIZED data-type.

IDNA does not extend the message-ID type.  Message-IDs continue to be
ASCII-only, just as they always have been.  IDNA-aware applications,
as a user-interface courtesy, will usually render them using non-ASCII
characters (when they contain ACE labels), and allow users to enter
them using non-ASCII characters.  But an IDNA-aware program knows that
message-IDs are really ASCII-only, and will apply ToASCII as needed.

Everything you and I have said about message-IDs applies equally to
email addresses (which is not surprising, since message-IDs and email
addresses are syntactically the same).  Email addresses continue to be
ASCII-only, but for email addresses containing ACE labels, it will be
helpful for user interfaces to display/accept non-ASCII characters.  So
any internationalized program that deals with mail and human users is
going to have to deal with the conversion.  If it can cope with email
addresses, then it can cope with message-IDs in exactly the same way.

> Implicitly extending all such data-types currently in use on the
> Internet (as the current draft does)

We have found the crux of our disagreement.  I maintain that IDNA does
*not* extend or alter any data types.  It defines one new data type
(IDN), and defines a correspondence between IDNs and ASCII domain names,
to be used for user interfaces and for gatewaying between old & new
protocols/interfaces.  But it does not extend any existing types.  Any
existing type that contains ASCII domain names continues to contain
ASCII domain names.

AMC