
[idn] The purpose of IDN



At 12:28 AM 9/1/2002 -0400, John C Klensin wrote:
I believe that IDNA (and the supporting documents) are... a reasonably well-understood solution to _some_ problem. I'm not sure I know what that problem is, who cares about it, and whether it is important enough to justify changes to the way the DNS works and is interpreted.

We should all be *very* worried that this major IETF effort has gone on for nearly 3 years, yet such a question can be seriously raised. (Or rather, that such a question legitimately reflects a concern among serious participants, as it clearly does.)

This suggests strongly that the charter and the working group process adequately characterized *neither* the problem to solve nor the beneficiaries of the solution.

Here is my own effort at doing both:


IDN Scope and Goals
-------------------

1. The set of characters available for use in domain names is problematic for large portions of the global Internet user population. Many users do not use Latin characters *ever*. Current technology standards permit them to use their local set of non-Latin characters for all their Internet activities, EXCEPT domain names! Hence, the set of characters that can be used for domain names needs to be increased.

2. The most immediate need is for support of this increased set of characters in email and web domain names.

3. Domain names are used both by humans and by computers. The human uses occur within free text and even non-computer contexts, such as business cards and advertising. The human benefit of domain names, over such alternatives as numeric strings, is mnemonic salience. The darn things are simply easier to remember, though not necessarily easy to guess.

4. A long history of making changes to Internet infrastructure highly recommends finding a way to do this enhancement so that it a) does not disturb the installed base, b) interworks with that installed base, and c) permits incremental benefit when there is incremental adoption.

Hence, the immediate functional description of IDN is that users will be able to register and use non-Latin characters in domain names that are part of email and web addresses. Further, the method of transmitting those characters, in the DNS protocol environment, needs to use a layered encoding scheme, along the lines of content-transfer-encoding for binary data in MIME.
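As a minimal sketch of that layering, the following Python uses the standard library's "idna" codec, which implements the form of IDNA later published as RFC 3490. Treating that codec as equivalent to the drafts under discussion here is an assumption, and the names to_wire, from_wire and bücher.example are purely illustrative. The only point is that what travels through the DNS is plain ASCII, while the Unicode form is recovered at the edges, much as a MIME content-transfer-encoding is undone by the receiving mail client.

```python
def to_wire(name: str) -> bytes:
    """Encode a Unicode domain name into its ASCII-compatible wire form."""
    return name.encode("idna")


def from_wire(wire: bytes) -> str:
    """Decode the ASCII-compatible wire form back to Unicode for display."""
    return wire.decode("idna")


if __name__ == "__main__":
    wire = to_wire("bücher.example")
    print(wire)             # b'xn--bcher-kva.example'
    print(from_wire(wire))  # bücher.example
```

A name that is already plain ASCII passes through unchanged, which is what lets the installed base keep working.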


I know that the IDNA approach, with adequate definition of the characters to be used, will permit internationalization of low-level identifiers.
That is excellent, because that is exactly what it is supposed to do. After all, that is what domain names are, albeit identifiers with some mnemonic qualities.


If that is all we care about, then there is a case to be made that it is actually more mechanism than is needed: if the same constrained processes are to be used to access a name that cause it to be created, then the subtle issues of character matching for different codepoints may not be relevant.

I really hate to ask this, but I am not even sure what you mean by "character matching for different codepoints", unless this is the Unicode equivalent to ASCII case-INsensitivity.

If my guess happens to be correct, I admit to continued ignorance about the reasonableness of having strings be case insensitive for non-Latin characters. And this is one of those issues that, frankly, simply requires a decision. While case insensitivity can be a Very Good Thing, we have plenty of examples of its absence being acceptable, even when it is quite inconvenient.
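One possible reading of "character matching for different codepoints", offered only as an illustration and not as what was necessarily meant, is that the same visible text can be spelled with different code point sequences, so matching needs a canonical form in addition to case folding. The sketch below uses Python's unicodedata module; match_key is a hypothetical helper, and the choice of case folding followed by NFKC normalization is an assumption made for the example.

```python
import unicodedata


def match_key(label: str) -> str:
    """Hypothetical matching key: case-fold, then normalize to NFKC."""
    return unicodedata.normalize("NFKC", label.casefold())


precomposed = "caf\u00e9"   # e-acute as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent
upper = "CAF\u00c9"         # upper-case spelling

if __name__ == "__main__":
    # The raw strings differ even though they look alike on screen.
    print(precomposed == decomposed)
    # After folding and normalization, all three compare equal.
    print(match_key(precomposed) == match_key(decomposed) == match_key(upper))
```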


By contrast, it is clear that the WG has not solved (Dave would, I think, say that it has no scope or charter to even examine) the set of questions associated with accurate transcription of DNS names from other environments and media.

You are prescient. Cut and paste is a *user interface* issue that already exists, far beyond domain names. And the world already has mechanisms for dealing with it.

However well or poorly those mechanisms work, it is not within IETF scope to try to alter them.


It is equally clear that many people are focused on that problem and won't consider any "DNS internationalization" problem to be solved unless it has some adequate resolution.

This is a good example of the reason "DNS Internationalization" is not a useful term. It is also a good example of the reason the target usage scenarios need to be extremely explicit, as I have tried to make them, above.

The IETF has a long history of participants wanting to pursue topics beyond what is practical. The solution is to NOT pursue those topics.



At 08:53 AM 9/1/2002 -0400, vinton g. cerf wrote:
One working definition of internationalization is that the encoding/expression is "understood" by speakers of all languages.

This highlights why "localization" is probably a much more useful term.

While retaining global interoperability, this domain name enhancement needs to permit use of characters that are tailored to smaller communities -- where one such "smaller" community is more than a billion people...


Consequently, someone sending a letter from the US to a recipient in Vietnam can write the destination address in Vietnamese and the US postal service need only understand the characters "VIETNAM" at the bottom of the destination address.

IDNA accomplishes this combination of global "interpretation routing". In fact, it is inherent in domain names, and IDNA strings are valid domain names within the current DNS.

So what does IDNA do that might be viewed as a problem? The answer is that an *encoded* IDNA string has no mnemonic value for anyone. It looks like a random string.

IDNA strings that are in Unicode are as mnemonic as ASCII strings, for those users who support the relevant Unicode set of characters. For those who do not, they will not see those Unicode characters. They will see the "random" string, which is a valid domain name, but lacks mnemonic benefit.
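The fallback described above can be sketched as follows. This is an illustration under stated assumptions, not a description of any real user interface: display_form is a hypothetical helper, can_render stands in for whatever the local system knows about the characters it can show, and Python's built-in "idna" codec (the later RFC 3490 form of IDNA) is assumed for decoding.

```python
from typing import Callable


def display_form(wire_name: str, can_render: Callable[[str], bool]) -> str:
    """Return the Unicode form if every character can be shown locally,
    otherwise fall back to the encoded name, which is still a valid
    domain name but has no mnemonic value."""
    unicode_name = wire_name.encode("ascii").decode("idna")
    if all(can_render(ch) for ch in unicode_name):
        return unicode_name
    return wire_name


if __name__ == "__main__":
    # A display limited to ASCII shows the encoded string.
    print(display_form("xn--bcher-kva.example", lambda ch: ord(ch) < 128))
    # A display that handles Latin-1 shows the mnemonic form.
    print(display_form("xn--bcher-kva.example", lambda ch: ord(ch) < 256))
```

Either way the name resolves identically; only the mnemonic benefit differs between the two users.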


multilingual domain names may not necessarily contribute to universal ability to use the resulting strings because it may be difficult to impossible to render or enter arbitrary character sets at the user interface to a local service.

Has MIME's ability to support non-ASCII characters been helpful to the overall utility of the Internet? I claim it has, even though I cannot read any of those other characters. (I am, after all, a typical American....)

The real issue, here, is whether the Internet infrastructure properly labels and carries information that can be understood by some users, but not others. Frankly, I see the question of non-Latin characters as being the same as using obscure vocabulary. It is fine to use that vocabulary, as long as it works with the intended recipients.

Presumably, no one has a problem with a domain name like tak-apa.com. However, only speakers of Bahasa are likely to know that it means "no problem".

We need to make exactly the same distinction between semantic/mnemonic utility and mechanical utility. IDNA permits a much broader -- and more localized -- range of semantic/mnemonic domain names, while retaining all of the necessary global, mechanical interoperability.


We have collectively probably created some confusion for ourselves by using the term "internationalized domain names" to cover both concepts.
It strikes me that the IDNA documents are more aimed at localization/multilingualization than internationalization, using the "definition" in the first paragraph above.

You are exactly correct. The problematic nature of the term "internationalization" has been discussed before. I would have wished for different terminology, too.


Concerns about how cut/paste will work are germane to the discussion about the utility of IDNs because such actions may be the ONLY way in which someone may be able to enter special character strings into text intended to represent an IDN.

The technical issues about multi-data-type cut-and-paste are beyond the competence of this community.

All we know -- and all we need to know -- is that modern user interfaces are quite good at supporting cut and paste of pictures, voice, labeled strings, and lots more. There is no reason to believe that an IDN is even slightly difficult for such an environment.


I usually end up cutting and pasting the characters. This works because the text of email is permitted to be pretty general in its encoding. I don't know how that would work out if I were dealing with non-Latin character sets.

And you do not know that it will NOT work.

We DO know that it needs to work, and we DO know that it is a matter entirely within the purview of user interface designers, not protocol geeks.


One of the important questions that I sense is being asked in the discussion of IDNA is just how applications that encounter these encoded objects/strings should handle them,

Is there some reason that the IETF should pursue this matter any more deeply than it has done for MIME?

d/


----------
Dave Crocker <mailto:dave@tribalwise.com>
TribalWise, Inc. <http://www.tribalwise.com>
tel +1.408.246.8253; fax +1.408.850.1850