[idn] The purpose of IDN
At 12:28 AM 9/1/2002 -0400, John C Klensin wrote:
I believe that IDNA (and the supporting documents) are... a reasonably
well-understood solution to _some_ problem. I'm not sure I know what that
problem is, who cares about it, and whether it is important enough to
justify changes to the way the DNS works and is interpreted.
We should all be *very* worried that this major IETF effort has gone on for
nearly 3 years, yet such a question can be seriously raised. (Or rather,
that such a question legitimately reflects a concern among serious
participants, as it clearly does.)
This suggests strongly that the charter and the working group process
adequately characterized *neither* the problem to solve nor the beneficiaries
of the solution.
Here is my own effort at doing both:
IDN Scope and Goals
1. The set of characters available for use in domain names is problematic
for large portions of the global Internet user population. Many users do
not use Latin characters *ever*. Current technology standards permit them
to use their local set of non-Latin characters for all their Internet
activities, EXCEPT domain names! Hence, the set of characters that can be
used for domain names needs to be increased.
2. The most immediate need is for support of this increased set of
characters in email and web domain names.
3. Domain names are used both by humans and by computers. The human uses
occur within free text and even non-computer contexts, such as business
cards and advertising. The human benefit of domain names, over such
alternatives as numeric strings, is mnemonic salience. The darn things are
simply easier to remember, though not necessarily easy to guess.
4. A long history of making changes to Internet infrastructure highly
recommends finding a way to do this enhancement so that it a) does not
disturb the installed base, b) interworks with that installed base, and c)
permits incremental benefit when there is incremental adoption.
Hence, the immediate functional description of IDN is that users will be
able to register and use non-Latin characters in domain names that are part
of email and web addresses. Further, the method of transmitting those
characters, in the DNS protocol environment, needs to use a layered
encoding scheme, along the lines of content-transfer-encoding for binary
data in MIME.
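To make the layering idea concrete, here is a minimal sketch of a non-ASCII name being wrapped in an ASCII-only form for transport and unwrapped again for display, much as content-transfer-encoding does for binary data. It assumes Python's built-in "idna" codec, which implements the IDNA 2003 rules (RFC 3490); the domain name and the helper names to_wire and from_wire are made up for illustration and do not come from this thread.

```python
# Sketch: a non-ASCII name carried through the DNS as a plain ASCII label.
# Assumes Python's built-in "idna" codec (IDNA 2003 rules, RFC 3490).
# The domain and the helper names are hypothetical, chosen for illustration.

def to_wire(name: str) -> bytes:
    """Encode a Unicode domain name into its ASCII-compatible form."""
    return name.encode("idna")


def from_wire(wire: bytes) -> str:
    """Decode the ASCII-compatible form back to Unicode for display."""
    return wire.decode("idna")


unicode_name = "bücher.example"
wire_name = to_wire(unicode_name)

print(wire_name)             # ASCII only, so existing DNS software can carry it
print(from_wire(wire_name))  # the Unicode name, restored for the user
```

Nothing in the DNS path has to change for this to work: only the endpoints that want to show the Unicode form need to know about the encoding.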
I know that the IDNA approach, with adequate definition of the characters
to be used, will permit internationalization of low-level identifiers.
That is excellent, because that is exactly what it is supposed to
do. After all that is what domain names are, albeit identifiers with some
If that is all
we care about, then there is a case to be made that it is
actually more mechanism than is needed: if the same constrained
processes are to be used to access a name that cause it to be
created, then the subtle issues of character matching for
different codepoints may not be relevant.
I really hate to ask this, but I am not even sure what you mean "character
matching for different codepoints", unless this is the Unicode equivalent to ASCII
If my guess happens to be correct, I admit to continued ignorance about the
reasonableness of having strings be case insensitive for non-Latin
characters. And this is one of those issues that, frankly, simply requires
a decision. While case insensitivity can be a Very Good Thing, we have
plenty of examples of its absence being acceptable, even when it is quite
By contrast, it is clear that the WG has not solved (Dave would, I
think, say that it has no scope or charter to even examine) the set of
questions associated with accurate transcription of DNS names from other
environments and media.
You are prescient. Cut and paste is a *user interface* issue that already
exists, far beyond domain names. And the world already has mechanisms for
dealing with it.
However well or poorly those mechanisms work, it is not within IETF scope
to try to alter them.
It is equally clear that many people are focused on that problem and
won't consider any "DNS internationalization" problem to be solved unless
it has some adequate resolution.
This is a good example of the reason "DNS Internationalization" is not a
useful term. It is also a good example of the reason the target usage
scenarios need to be extremely explicit, as I have tried to make them, above.
The IETF has a long history of participants wanting to pursue topics beyond
what is practical. The solution is to NOT pursue those topics.
At 08:53 AM 9/1/2002 -0400, vinton g. cerf wrote:
One working definition of internationalization is that the
encoding/expression is "understood" by speakers of all languages.
This highlights why "localization" is probably a much more useful term.
While retaining global interoperability, this domain name enhancement needs
to permit use of characters that are tailored to smaller communities --
where one such "smaller" community is more than a billion people...
Consequently, someone sending a letter from the US to a recipient in
Vietnam can write the destination address in Vietnamese and the US postal
service need only understand the characters "VIETNAM" at the bottom of the
IDNA accomplishes this combination of global "interpretation routing". In
fact, it is inherent in domain names, and IDNA strings are valid domain
names within the current DNS.
So what does IDNA do that might be viewed as a problem? The answer is that
an *encoded* IDNA string has no mnemonic value for anyone. It looks like a
IDNA strings that are in Unicode are as mnemonic as ASCII strings, for
those users who support the relevant Unicode set of characters. For those
who do not, they will not see those Unicode characters. They will see the
"random" string, which is a valid domain name, but lacks mnemonic benefit.
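The two views of the same name can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in "idna" codec (IDNA 2003 rules), and the can_render callback is a hypothetical stand-in for "this user's environment supports the characters"; a real user interface would decide that from fonts, locale, and input methods. The domain name is made up.

```python
# Sketch: choose what to show a user for an encoded IDNA name.
# Assumes Python's built-in "idna" codec (IDNA 2003 rules).
# `can_render` is a hypothetical stand-in for the user's environment.

def display_form(wire: bytes, can_render) -> str:
    """Return the Unicode form if every character is renderable,
    otherwise fall back to the raw ASCII form."""
    text = wire.decode("idna")
    if all(can_render(ch) for ch in text):
        return text
    return wire.decode("ascii")


def ascii_only(ch: str) -> bool:
    """Models an environment that can only show ASCII."""
    return ord(ch) < 128


def anything(ch: str) -> bool:
    """Models an environment that can show every character."""
    return True


wire = "bücher.example".encode("idna")

print(display_form(wire, anything))    # mnemonic Unicode form
print(display_form(wire, ascii_only))  # opaque, but still a valid domain name
```

Either way the string that travels through the DNS is identical; only its mnemonic value to the person looking at it differs.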
multilingual domain names may not necessarily contribute to universal
ability to use the resulting strings because it may be difficult to
impossible to render or enter arbitrary character sets at the user
interface to a local service.
Has MIME's ability to support non-ASCII characters been helpful to the
overall utility of the Internet? I claim it has, even though I cannot read
any of those other characters. (I am, after all, a typical American....)
The real issue, here, is whether the Internet infrastructure properly
labels and carries information that can be understood by some users, but
not others. Frankly, I see the question of non-Latin characters as being
the same as using obscure vocabulary. It is fine to use that vocabulary,
as long as it works with the intended recipients.
Presumably, no one has a problem with a domain name like
tak-apa.com. However, only speakers of Bahasa are likely to know that it
means "no problem".
We need to make exactly the same distinction between semantic/mnemonic
utility, versus mechanical utility. IDNA permits a much broader -- and
more localized -- range of semantic/mnemonic domain names, while retaining
all of the necessary global, mechanical interoperability.
We have collectively probably created some confusion for ourselves by
using the term "internationalized domain names" to cover both concepts.
You are exactly correct. The problematic nature of the term
"internationalization" has been discussed before. I would have wished for
different terminology, too.
It strikes me that the IDNA documents are more aimed at
localization/multilingualization than internationalization, using the
"definition" in the first paragraph above.
Concerns about how cut/paste will work are germane to the discussion about
the utility of IDNs because such actions may be the ONLY way in which
someone may be able to enter special character strings into text intended
to represent an IDN.
The technical issues about multi-data-type cut-and-paste are beyond the
competence of this community.
All we know -- and all we need to know -- is that modern user interfaces
are quite good at supporting cut and paste of pictures, voice, labeled
strings, and lots more. There is no reason to believe that an IDN is even
slightly difficult for such an environment.
I usually end up cutting and pasting the characters. This works because
the text of email is permitted to be pretty general in its encoding. I
don't know how that would work out if I were dealing with non-Latin
And you do not know that it will NOT work.
We DO know that it needs to work, and we DO know that it is a matter
entirely within the purview of user interface designers, not protocol geeks.
One of the important questions that I sense is being asked in the
discussion of IDNA is just how applications that encounter these encoded
objects/strings should handle them,
Is there some reason that the IETF should pursue this matter any more
deeply than it has done for MIME?
Dave Crocker <mailto:firstname.lastname@example.org>
TribalWise, Inc. <http://www.tribalwise.com>
tel +1.408.246.8253; fax +1.408.850.1850