Re: [idn] IDNA: is the specification proper, adequate, and complete?



Doh, I got bitten by my own Reply-To header, and accidentally advised
John to send his reply to the mailing list (which he did), even though
my message (which he was replying to) did not go to the list.  Paul is
surely laughing at me now.  :)

Sorry for the extra list traffic.  Anyone who wants to move this thread
back off the list should override the Reply-To field.

[For anyone who's curious, I composed a single reply to two messages,
one from the mailing list, and another (a response to that one) from
off the list.  Since the mailing list address initially appeared in the
recipient list, the Reply-To field was auto-generated.  When I manually
removed the mailing list from the recipient list, I forgot to remove the
Reply-To.]

I wrote:

> So applications can indeed decide not to use IDNA, in which case they
> can't use non-ASCII characters in domain names.  If you want to use
> non-ASCII characters in domain names, IDNA is your only option.

To clarify for the list readers:  Applications have always been able
to use 8-bit names, and still can, and IDNA will not change that.
Applications might have their own private interpretation of bytes
80..FF as characters, but there is no standard interpretation of those
bytes as characters; the only characters that can currently be used
unambiguously in domain names are the ASCII characters.  If IDNA (as
currently written) is adopted, it will be the standard definition
of internationalized domain names.  The only way to use non-ASCII
characters in domain names in a standard unambiguous way will be to
observe the rules of IDNA (which do allow for non-ASCII representation
in new protocols).

I wrote:

> We can't stop applications from using custom mappings from non-ASCII
> text onto domain names, but the rest of the world would never see the
> non-ASCII characters.  The rest of the world would simply see RFC 1035
> domain names, which contain only ASCII characters (and possibly octets
> 80..FF, which are not defined to represent any characters at all).

John C Klensin <klensin@jck.com> responded:

> But we can say, clearly, that any such applications or activities are
> non-conforming to IETF standards.  And I believe we should do so.

I don't think they are non-conforming.  Consider this analogous
situation:  For some reason I want to map geographic locations
onto domain labels.  There is no standard for using domain labels
to represent geographic locations, but I make one up, for example
n037d37m05s-w122d23m14s.  No one else will understand that this is
a geographic location; to them it's just a domain label.  But it is a
conformant domain label, and it doesn't hurt anyone.
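
To make the analogy concrete, here is a minimal Python sketch of such a
private mapping.  The function name, the tuple layout, and the reading
of the example label as hemisphere/degrees/minutes/seconds are my own
assumptions for illustration; the only thing taken from existing
standards is the letter-digit-hyphen shape and 63-octet limit of a
host-name label.

```python
import re

# Letter-digit-hyphen label shape: 1 to 63 characters, no leading or
# trailing hyphen.
LDH_LABEL = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$", re.IGNORECASE)


def geo_label(lat, lon):
    """Build a label from two (hemisphere, degrees, minutes, seconds) tuples.

    This is a made-up private convention, not any standard.
    """
    def part(hemi, deg, minutes, seconds):
        return f"{hemi}{deg:03d}d{minutes:02d}m{seconds:02d}s"

    return f"{part(*lat)}-{part(*lon)}"


label = geo_label(("n", 37, 37, 5), ("w", 122, 23, 14))
print(label)                          # n037d37m05s-w122d23m14s
print(bool(LDH_LABEL.match(label)))   # True: an ordinary ASCII label
```

Nothing in the result tells a resolver or another application that the
label encodes a location; it is simply a valid label.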

Now suppose my data type is something that contains non-ASCII
characters, but it would get hopelessly damaged by Nameprep, so I cannot
use IDNA.  Just as in the previous case, I can make up my own mapping
and use it without bothering anyone else; they just won't understand it.

I wrote:

> It's still true that IDNA is optional; if you don't want to support
> it, then don't, but then you have no way to enter non-ASCII names into
> zones.

As before, I meant "no standard unambiguous way".

I wrote:

>     a primary master name server MUST NOT contain an ACE-encoded label
>     that decodes to an ASCII label.
>
> This is a vacuous requirement.  There is no such thing as an ACE label
> that decodes to an ASCII label, because of the design of the ToASCII
> operation.  Therefore, this is not really requiring anything.  I think
> I once asked for its removal, but since it's harmless, I didn't press
> it.

John responded:

> Recommendation: Leave it there, since it is harmless and may be
> helpful to someone.  But rephrase it, not as a "MUST" (or any other
> form of requirement), but as an observation.  E.g., "note that,
> since the design of the ToASCII operation prevents any ACE label
> from decoding to an ASCII label (i.e., one without any non-ASCII
> characters), a primary master name server will never...".  Or
> something like that.

I like that.  Now that I think about it, I should have said "because of
the design of ToASCII and ToUnicode", not just ToASCII.
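
The point can be checked with a short Python sketch.  It assumes that
the standard library's encodings.idna module, which implements the
operations as published in RFC 3490 with the "xn--" prefix, is an
acceptable stand-in for the draft under discussion; details may differ.
The helper decode_label is hypothetical.  It is needed because Python's
ToUnicode raises an error where RFC 3490 says the input is returned
unchanged.

```python
from encodings import idna  # stdlib nameprep / ToASCII / ToUnicode


def decode_label(label):
    """ToUnicode with the 'return the input on failure' rule restored."""
    try:
        return idna.ToUnicode(label)
    except UnicodeError:
        return label


# An all-ASCII label passes through ToASCII untouched, so it never
# acquires the ACE prefix.
print(idna.ToASCII("example"))        # b'example'

# Only a label containing non-ASCII characters becomes an ACE label.
ace = idna.ToASCII("b\u00fccher").decode("ascii")
print(ace)                            # xn--bcher-kva
print(decode_label(ace) == "b\u00fccher")

# A hand-made label whose Punycode payload is pure ASCII fails the
# round-trip check inside ToUnicode, so it is never decoded to "abc".
print(decode_label("xn--abc-") != "abc")
```

The last case is the one the MUST was guarding against: the label
carries the prefix, but it is not something ToASCII could ever have
produced, so ToUnicode does not treat it as the encoding of an ASCII
label.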

> But I'd still prefer to see either enumerations of where IDNA can be
> used, or some much more clear language about its applicability.

First sentence of section 1.1:

    Applications can use IDNA to support internationalized domain names
    anywhere that ASCII domain names are already supported

> I believe that we should confine IDNA application and interpretation
> to what we can reasonably claim we know about today and that we should
> _explicitly_ leave future use and applicability for decisions to be
> made in the future.

I suppose future specs that define new places for domain names to appear
could explicitly say "IDNA shall not apply to domain names appearing in
these places".  In the absence of such a statement, the first sentence
of section 1.1 remains as a simple fact.

> Now, suppose someone comes along and invents a new RR type. ...the
> characters that should be permitted in labels for that RR type
> should be... a few hundreds or, at most, a few thousand, of select
> characters.
>
> Now, to accomplish that goal... the best path, given our current
> framework, would be to modify stringprep to add a table of the
> characters permitted by that RR.  _Then_ the right thing to do is to
> create a profile that uses that table instead of some of the maps and
> prohibitions of nameprep.

The specification does not require any particular implementation; it
just defines the right answer.  IDNA requires applications to convert
non-ASCII labels to equivalent ASCII labels in certain situations, and
"equivalent" is defined in terms of ToASCII, which is defined in terms
of Nameprep.  Applications are not prevented from imposing additional
restrictions on labels.  IDNA says that anything that doesn't survive
ToASCII is not allowed, but it doesn't say that anything that does
survive ToASCII must be allowed.

The application is under no obligation to implement ToASCII exactly
as it is shown in the spec, as long as it computes the right answer.
If, instead of performing Nameprep and then checking some additional
restrictions, the application can use another Stringprep profile that
always returns the same result as Nameprep-plus-additional-restrictions,
it is welcome to do that.  That would just be an optimization.
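
As a rough illustration of "Nameprep plus additional restrictions",
here is a Python sketch.  It again assumes the standard library's
encodings.idna module as a stand-in for the spec's operations.  The
ALLOWED table and the function name are hypothetical; they represent
whatever narrower repertoire an application or a new RR type might
choose.

```python
from encodings import idna

# Hypothetical application policy: a small repertoire, checked against
# the label as it looks after Nameprep.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789-") | set("\u00e4\u00f6\u00fc")


def to_ascii_restricted(label):
    """ToASCII plus an extra, application-specific character check."""
    # Anything that does not survive ToASCII is not allowed at all;
    # this call raises UnicodeError in that case.
    ace = idna.ToASCII(label)

    # Surviving ToASCII does not oblige the application to accept the
    # label, so apply the narrower table to the Nameprep output.
    extra = set(idna.nameprep(label)) - ALLOWED
    if extra:
        raise ValueError(f"characters outside the local table: {sorted(extra)}")
    return ace


print(to_ascii_restricted("B\u00fccher"))   # b'xn--bcher-kva'

try:
    to_ascii_restricted("caf\u00e9")        # valid for IDNA, refused locally
except ValueError as err:
    print("rejected:", err)
```

Every label this function accepts gets the same ASCII form that plain
ToASCII would give it; the function only accepts fewer labels.  A
dedicated Stringprep profile that produced the same accept/reject
decisions and the same output would be an equivalent implementation.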

AMC