
Re: [idn] WG last call summary




"Adam M. Costello" wrote:

> Are you saying that ToASCII is good, and ToUnicode is bad?

I am saying that legacy applications need access to IDN namespaces, but
that modifying well-known and widely-used data-types in order to render
domain names in Unicode form is foolish. We have to separate domain names
from the data-types that also use them; they do not need to be conjoined.

This means that legacy applications, protocols and data-types which use
STD13 names must only be presented with the STD13 form of the IDNs. The
i18n form of those names must only be presented to the applications,
protocols and data-types which can make use of an i18n domain name.

Incorporating this distinction into the current concepts isn't all that
easy because of the cross-breeding of ideas and objectives in the docs.

What I would like to see is for the current IDNA spec to be made into a
codec definition with guidance on implementation. This means deleting (or
re-scoping) section "3. Requirements" and deleting section "6.4 Avoiding
exposing users to the raw ACE encoding", and adding a new section for
"Implementation considerations".

The new text should essentially state that domain names which are used by
applications, protocol messages and data-formats MUST be passed and
displayed in LDH form, except where the governing specification has
explicitly defined an IDN behavior for the affected domain name, and that
the use of ToUnicode is expressly prohibited if the governing
specification has not defined how and where that function will be
deployed. It should also state that the "governing specification" will
often be a local software specification, such as the man page for
named.conf or ping (these govern domain names which are used as
connection identifiers, and which are not used in protocol messages or
standardized data-types).
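
As a rough illustration of that rule (a sketch only, not proposed spec
text): the context names and the IDN_AWARE_CONTEXTS registry below are
hypothetical, and Python's built-in "idna" codec (which implements the
later RFC 3490 form of IDNA) stands in for ToUnicode.

```python
# Hypothetical registry of contexts whose governing specification has
# explicitly defined an IDN behavior. The entry is an assumption made
# purely for illustration.
IDN_AWARE_CONTEXTS = {"new-idn-protocol"}


def present_name(ace_name: str, context: str) -> str:
    """Return the form of a domain name that `context` should be given.

    `ace_name` is the LDH (ACE) form of the name. It is converted with a
    ToUnicode-like step only when the governing specification for
    `context` defines IDN behavior; otherwise it is passed unchanged.
    """
    if context in IDN_AWARE_CONTEXTS:
        # Stand-in for ToUnicode: Python's built-in "idna" codec.
        return ace_name.encode("ascii").decode("idna")
    # Legacy application, protocol or data-type: LDH form only.
    return ace_name
```

With this sketch, a legacy context such as a hypothetical
"smtp-envelope" always receives the ACE string untouched, and only the
contexts listed in the registry ever see the Unicode form.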

> I can imagine a world with ToASCII but without ToUnicode.  If a
> non-ASCII name came to you via new protocols that support non-ASCII
> names directly, then you'd see the human-friendly form.  But if the name
> traversed an old protocol (at any point), you'd see the ACE.

That would be true for new protocol messages and/or data-types that had to
traverse the old namespace, yes. It would also assume that the new
messages and/or data-types provide a mechanism for storing the IDNs in
some kind of raw form (e.g., UTF-8), and that a conversion point is
defined which says "do the conversion here".
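
A minimal sketch of such a conversion point, under stated assumptions:
the I18nName type and its to_legacy() method are hypothetical names, and
Python's built-in "idna" codec (RFC 3490) stands in for ToASCII.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class I18nName:
    """Hypothetical new data-type that stores the IDN in raw UTF-8 form."""

    raw_utf8: bytes

    def to_legacy(self) -> str:
        """The one defined conversion point ("do the conversion here").

        Applies a ToASCII-like step so that anything traversing the old
        namespace only ever sees the LDH (ACE) form of the name.
        """
        unicode_name = self.raw_utf8.decode("utf-8")
        # Stand-in for ToASCII: Python's built-in "idna" codec.
        return unicode_name.encode("idna").decode("ascii")
```

The point of the design is that the raw form lives only inside the new
data-type, and the ACE form is produced in exactly one place, at the
boundary with the old namespace.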

> I don't see how doing away with ToUnicode would solve any problem.  The
> main danger is non-ASCII names getting accidentally fed to old software.
> Even without ToUnicode, non-ASCII names would still be out there (being
> carried around by the new protocols and new applications), and it would
> still be possible for them to get pasted or piped into old programs.

If the sanctity of the existing data-types is preserved, that won't break
anything (or at least not as much), since only the new data-types will be
able to use native IDNs.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/