
Re: [idn] IDNA problem statement



on 10/15/2002 7:39 AM Erik Nordmark wrote:

> 1.1 Problem Statement
>
> The IDNA specification solves the problem of extending the repertoire
> of characters that can be used in domain names to include the Unicode
> repertoire.

This is broader than the actual solution. IDNA allows applications to make
use of characters from the Unicode repertoire by providing a standardized
encoding of i18n domain names which can be used to represent and interpret
extended characters consistently. It does not extend the repertoire of
characters which are usable in domain names, nor does it provide
applications with access to these characters in their unencoded form.
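To make the distinction concrete, here is a minimal sketch using Python's built-in "idna" codec, which implements the algorithm as it was later published in RFC 3490. The sample name and the helper names `to_ascii` and `to_unicode` are illustrative assumptions, not anything taken from the draft. The point is only that the DNS sees the ASCII form, and an application has to decode it explicitly to get the characters back.

```python
# Sketch only: relies on Python's built-in "idna" codec (IDNA as later
# published in RFC 3490). The helper names are hypothetical.

def to_ascii(name: str) -> bytes:
    """Return the ASCII-compatible form that is actually sent to the DNS."""
    return name.encode("idna")


def to_unicode(wire: bytes) -> str:
    """Decode the ASCII-compatible form back into Unicode for display."""
    return wire.decode("idna")


if __name__ == "__main__":
    wire = to_ascii("b\u00fccher.example")  # "bücher.example"
    print(wire)              # the DNS never sees anything but ASCII
    print(to_unicode(wire))  # only an IDNA-aware application shows this
```

An application that skips the decoding step simply sees and displays the ASCII string.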

> IDNA does not extend the service offered by DNS to the applications.

Correct.

> Instead, the applications (and, by implication, the users) continue to
> see an exact-match lookup service. Either there is a single
> exactly-matching name or there is no match. This model has served the
> existing applications well, but it requires that users know the exact
> spelling of the domain names that the users type into applications such
> as web browsers and mail user agents.

This is irrelevant. It's interesting to about five people here, but it is a
complete side-track from the functionality offered by IDNA. Suggest you drop
this entirely.

> The introduction of the larger
> repertoire of characters potentially makes the set of misspellings
> larger, especially given that in some cases the same appearance, for
> example on a business card, might visually match several Unicode code
> points or several sequences of code points.

Move that to the security section, or to the 1.2 section below.
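For whichever section it lands in, a small illustration of the visual-match problem may help. This is a rough sketch using Python's built-in "idna" codec; the two names and the helper `ace` are made up for the example. Both names look the same in most fonts, but one uses U+0430 CYRILLIC SMALL LETTER A in place of the Latin "a", so they are different names as far as an exact-match lookup is concerned.

```python
# Sketch only: two names that look alike on a business card but differ in
# code points. Names and the helper "ace" are hypothetical.
import unicodedata

LATIN = "data.example"
MIXED = "d\u0430t\u0430.example"  # U+0430 CYRILLIC SMALL LETTER A, twice


def ace(name: str) -> bytes:
    """Return the ASCII-compatible form via Python's built-in idna codec."""
    return name.encode("idna")


if __name__ == "__main__":
    for name in (LATIN, MIXED):
        # Show the second character's Unicode name next to the encoded form.
        print(unicodedata.name(name[1]), ace(name))
```

The all-Latin name passes through unchanged, while the mixed one becomes a distinct encoded label.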

> IDNA allows the graceful introduction of IDNs not only by avoiding
> upgrades to existing DNS infrastructure (servers, caches, stub
> resolvers),

Put it another way. IDNA doesn't "break" ANY node in the infrastructure,
including DNS servers/resolvers/blah, but ALSO including protocols and
their applications. You are limiting the discussion to certain elements of
the infrastructure and casually omitting the other 99% of the
infrastructure, implying that the 99% doesn't matter. You are talking
about the 1% getting a free-ride like it is a big deal.

Look, I have never understood the fascination with the 1% that doesn't
have to pay. This continued focus on "we saved the 1%" is intellectually
dishonest all the way around. Not only is it a minuscule portion of the
infrastructure that actually USES domain names, but it completely ignores
the fact that 99% DOES HAVE TO PAY.

All I'm asking here is that you provide the full analysis on this point,
rather than telling the 1% story again. If we are so embarrassed by the
truth, then we should rethink it. If we are comfortable with the truth,
then let's tell it.

> but also by allowing some rudimentary use of IDNs in
> applications by using the ASCII representation of the non-ASCII name
> labels.

Yeah, that's it

> While such names are very user-unfriendly to read and type, and
> hence are not suitable for user input, they allow (for instance)
> replying to email and clicking on URLs even though the domain name
> displayed is incomprehensible to the user. In order to allow
> user-friendly input and output of the IDNs, the applications need to be
> modified to conform to this specification.

^every application that wants to display them

> IDNA uses the Unicode character repertoire which avoids the significant
> delays that would be inherent in waiting for a different and
> specific character set to be defined for IDN purposes by some other
> standards developing organization.

No justification is required here, IMO.

> 1.2 Limitations of IDNA
>
> <EXISTING section 6.6 moved to here> The IDNA protocol does not solve
> all linguistic issues with users inputting names in different scripts.
> Many important language-based and script-based mappings are not covered
> in IDNA and need to be handled outside the protocol. For example, names
> that are entered in a mix of traditional and simplified Chinese
> characters will not be mapped to a single canonical name. Another
> example is Scandinavian names that are entered with U+00F6 (LATIN SMALL
> LETTER O WITH DIAERESIS) will not be mapped to U+00F8 (LATIN SMALL
> LETTER O WITH STROKE).
>
> <ADDED> An example of an important issue that is not considered in
> detail in IDNA is how to provide a high probability that a user who is
> entering a domain name based on visual information (such as from a
> business card or billboard) or aural information (such as from a
> telephone or radio) would correctly enter the IDN. This is a complex issue
> relating to languages, input methods on computers, and so on.
> Furthermore, the kind of matching and searching necessary for a high
> probability of success would not fit the role of the DNS and its exact
> matching function.

There are other hot topics which might be well-served here, specifically
including the issues with local functions (clipboards, piping, etc.) and
the dangers from auto-conversion (message-id, searching), etc.
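As a rough illustration of the Scandinavian example in the quoted 1.2 text, the sketch below uses Python's built-in "idna" codec (IDNA as later published in RFC 3490). The names and the helper `ace` are assumptions made up for the example. Case differences are folded during preparation, but U+00F6 and U+00F8 remain two distinct names.

```python
# Sketch only: shows which mappings the protocol does and does not perform.
# The sample names and the helper "ace" are hypothetical.

O_DIAERESIS = "\u00f6l.example"        # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
O_STROKE = "\u00f8l.example"           # U+00F8 LATIN SMALL LETTER O WITH STROKE
UPPER_O_DIAERESIS = "\u00d6L.example"  # same label as O_DIAERESIS, upper case


def ace(name: str) -> bytes:
    """Return the ASCII-compatible form via Python's built-in idna codec."""
    return name.encode("idna")


if __name__ == "__main__":
    # Case is folded, so these two produce the same encoded name.
    print(ace(UPPER_O_DIAERESIS) == ace(O_DIAERESIS))
    # No language-based mapping is applied, so these two do not.
    print(ace(O_DIAERESIS) == ace(O_STROKE))
```

Anything beyond that, such as treating the two letters as equivalent for a given language, has to happen outside the protocol.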

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/