Re: [idn] IDNA interoperability failures, once again



[This message replies to both D. J. Bernstein and Eric A. Hall regarding
similar issues.]

"D. J. Bernstein" <djb@cr.yp.to> wrote:

> > When an application is displaying header fields to the user,
> 
> Define your terminology.  How does a UNIX program figure out whether it
> is displaying something ``to the user''?

(To anyone who doesn't use UNIX:  We're talking about a program that
outputs text to standard output.  Standard output usually gets displayed
on a terminal emulator unless the user has redirected it to a file or
to the input of another program.  Some programs are intended to be
redirected and have human-unfriendly, machine-friendly output.  Some are
intended to be viewed and have human-friendly, machine-unfriendly output.
Some are somewhere in between, which motivates the question here.)

> Yes or no:  Should ``dig'' convert its results from your 7-bit
> encoding to the local character set? (Assume the LANG=en_US.UTF-8
> locale, so that the conversion is at least theoretically doable.)

Yes.  The second sentence of the man page for dig is "It performs DNS
lookups and displays the answers..."  Furthermore, the man page does
not specify the output format of dig, so the output does not contain
any domain name slots.  Clearly its primary purpose is to display
information to humans, and any program that attempts to parse the output
of dig is assuming the risk that someday this undocumented output format
might change.

Any user clever enough to use dig and to use programs that
parse its output is probably also clever enough to prepend "env
LANG=en_US.US-ASCII" to the command line if necessary, or to use the
--no-idna option of dig that would almost surely have been added at the
same time IDNA support was added.

Another option for any application would be to have IDNA support
completely disabled by default, and enabled only at the user's request.
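
To make the choices above concrete, here is a minimal Python sketch of
how a display-oriented tool might pick which form of a name to print.
It is an illustration only, not a description of dig: the function names
and the idna_enabled switch (standing in for an option like --no-idna)
are hypothetical, and it relies on Python's built-in "idna" codec, which
implements the ToUnicode operation of IDNA as published in RFC 3490.

```python
def can_display(name: str, encoding: str) -> bool:
    """Return True if every character of name exists in the given charset."""
    try:
        name.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        return False


def display_form(ace_name: str, encoding: str, idna_enabled: bool = True) -> str:
    """Choose the form of a domain name to show to the user.

    ace_name     -- the name as it appears on the wire (ASCII, maybe xn-- labels)
    encoding     -- the user's local character set, e.g. taken from the locale
    idna_enabled -- hypothetical switch; False behaves like a --no-idna option
    """
    if not idna_enabled:
        return ace_name
    try:
        # The stdlib "idna" codec applies ToUnicode to each label.
        unicode_name = ace_name.encode("ascii").decode("idna")
    except UnicodeError:
        # ToUnicode never fails in the spec; here we fall back to the input.
        return ace_name
    # Only show the non-ASCII form if the local charset can represent it.
    return unicode_name if can_display(unicode_name, encoding) else ace_name
```

A caller could pass locale.getpreferredencoding(False) as the encoding,
so that running under an ASCII locale yields the ASCII form without any
extra option.  Defaulting idna_enabled to False instead would model the
"disabled by default" alternative.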

"Eric A. Hall" <ehall@ehsco.com> wrote:

> > When an application is displaying header fields to the user, it should
> > apply ToUnicode to domain labels found in domain name slots.  RFC 822
> > defines the syntax of every header field, and some of those fields have
> > <domain> in their definition.  Those <domain>s are the domain name
> > slots.  Anything else in the header is not a domain name slot.
> 
> Are you SURE you want to do this with Received?

Yes.

> or rather, are you SURE you want to mandate this behavior
> with Received on behalf of the other communities, even though
> copy-and-paste with transliterated Received headers will get ugly.

Novice users never even see Received headers.  Users who know enough to
request to see Received headers will also know enough to request ASCII
if they need it.

As for "mandate", keep in mind this sentence from the IDNA introduction:

    This document does not require any applications to conform to IDNA,
    but applications can elect to use IDNA in order to support IDN while
    maintaining interoperability with existing infrastructure.

If IDNA is more trouble than it's worth for particular applications,
those applications can just ignore it.

> Does your list include all of those data-types which have domain names
> as a subordinate element?

Yes.

> I will assume that your list also includes "slots" that hold email
> addresses

The list you requested was a list of header fields.  The list includes
header fields that contain <mailbox>, because <mailbox> ultimately
contains a <domain>.  The <domain> is the domain name slot.
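
As a rough illustration of "the <domain> is the slot", the following
Python sketch converts only the domain part of a simple local-part@domain
string for display and leaves the local part alone.  The function name is
hypothetical, the parsing is deliberately naive (it does not handle the
full RFC 822 <mailbox> grammar), and it assumes Python's built-in "idna"
codec for the ToUnicode step.

```python
def display_addr_spec(addr_spec: str) -> str:
    """Apply ToUnicode to the <domain> of local-part@domain for display.

    Hypothetical helper: anything before the last "@" is treated as the
    local part and is never converted, even if it looks like an ACE label.
    """
    local, sep, domain = addr_spec.rpartition("@")
    if not sep:
        # No "@" found, so there is no domain name slot to convert.
        return addr_spec
    try:
        shown = domain.encode("ascii").decode("idna")
    except UnicodeError:
        # Leave the domain unchanged if it cannot be converted.
        shown = domain
    return local + "@" + shown
```

With this sketch, "user@xn--bcher-kva.example" is shown with the domain
"bücher.example", while "xn--bcher-kva@example.com" is shown unchanged,
because the xn-- string there is not in a domain name slot.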

> Are URLs in header fields therefore also candidates?

If a header field is defined to contain a URL, and the URL scheme
indicates that it contains a domain name, then that is a domain name
slot.
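
The same idea can be sketched for URLs.  In the Python example below, the
set of schemes treated as containing a domain name is an assumption made
for illustration, as is the function name; only the host component is
passed through ToUnicode (via the built-in "idna" codec), and nothing
else in the URL is touched.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: schemes whose authority part holds a domain name.
DOMAIN_SCHEMES = {"http", "https", "ftp"}


def url_domain_for_display(url: str):
    """Return the displayable domain name of a URL, or None if it has no slot."""
    parts = urlsplit(url)
    if parts.scheme not in DOMAIN_SCHEMES or parts.hostname is None:
        # The scheme does not indicate a domain name, so there is no slot.
        return None
    try:
        return parts.hostname.encode("ascii").decode("idna")
    except UnicodeError:
        # Leave the host unchanged if it cannot be converted.
        return parts.hostname
```

For "http://xn--bcher-kva.example/index.html" this returns
"bücher.example"; for a URL whose scheme is not in the assumed set it
returns None and the text would be displayed as is.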

> Are you SURE you want to do this with List-* and Content-Location
> header fields?

Content-Location is an HTTP header field, and HTTP responses are not
normally displayed, but if an application wanted to display an HTTP
response, then it should display the non-ASCII form of the domain name
if it can (unless the user requests otherwise or special circumstances
make this problematic).

I don't know what List-* fields are, but if they are defined to contain
domain names, then it's the same story as Content-Location.

> Are message identifiers also candidates, since they are structured
> the same as email addresses?  Are you SURE you want to do this with
> Message-ID?

Same story as Received.

AMC