Re: [idn] I-D ACTION:draft-ietf-idn-vidn-00.txt

To: "DualName - ShimSungJae" <shimsungjae@dualname.com>,<idn@ops.ietf.org>,"Paul Hoffman / IMC" <phoffman@imc.org>
Subject: Re: [idn] I-D ACTION:draft-ietf-idn-vidn-00.txt
From: "Mark Davis" <mark@macchiato.com>
Date: Sun, 19 Nov 2000 18:14:22 -0800
Delivery-date: Sun, 19 Nov 2000 18:17:30 -0800
Envelope-to: idn-data@psg.com
However it is done, what you are talking about is mapping the letters of
each and every language on Earth into [a-z, 0-9, and hyphen]. Korean Hangul
has the marked advantage of being relatively simple to map to and from
Latin letters without ambiguity. There are a huge number of problems with
this approach in general:

a. It is unacceptable for the world's cultures to not be able to represent
the native characters for their languages. But this requires an unambiguous
round-trip mapping.

b. There are no transliteration standards for many, many languages.

c. Where standards exist, there are often many conflicting choices.

d. Where standards for transliteration exist, such as the ISO standards,
they are often defective -- they do not provide for lossless round-tripping
back from the romanized text.

e. Where standards exist, they often map from original characters to
accented roman characters. There are generally no conventions for
representing those characters strictly with [a-z, 0-9, and hyphen].

f. If the standards are phonetic, it is often impossible to round-trip back
to the original characters. Consider Japanese, for example. There is no
accepted way to convert *unambiguously* from roman letters back to a mixture
of Kanji, Hiragana, and Katakana. Companies spend hundreds of millions of
dollars to try to get the best input methods, which are doing essentially
this -- yet all of them require human intervention.

g. If you are talking about a phonemic representation, that is typically
done with IPA. There are hundreds of characters and accents in IPA. There is
no mechanism for straightforward, unambiguous representation of those
characters with the limited set [a-z, 0-9, and hyphen]. Even if there were,
it would probably not be particularly readable.

h. This would also require all languages written currently with accented
characters (French, German, Swedish, Slovak, Polish, Lithuanian, etc.) to
have conventions for expressing those accents strictly with [a-z, 0-9, and
hyphen]. Otherwise information is lost. These standards don't exist, and
again, probably would not be particularly readable, except for a few cases
(such as German, where there is an established tradition of using "e" for
representing umlauts).

i. Even if this were all possible in any reasonable amount of time, it would
be quite expensive, and extremely difficult to administer and validate. As
you say "The knowledge base includes not only the general principles of
transliteration but also common usages, idiomatic expressions, and possible
variations that may occur in transliteration." So every web browser and
mailer would require such a knowledge base for converting the romanizations
back to native characters for every language that they need to handle.

j. The process also lends itself to becoming completely and utterly
politicized.

k. Before even thinking about this, one would need to see a proof of
concept -- a round-trip mapping for a large percentage of Unicode letters to
and from [a-z, 0-9, and hyphen]; and for a large number of languages, not
simply Korean.
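
Points d, f, and h above all come down to whether a mapping into
[a-z, 0-9, and hyphen] is one-to-one. A minimal Python sketch of that check
follows. It is illustrative only and not part of the VIDN draft: the helper
names (strip_accents, romanize_kana, find_collisions) and the four-entry
kana table are hypothetical, and accent stripping merely stands in for a
romanization that has no convention for accents.

```python
import unicodedata
from collections import defaultdict


def strip_accents(text):
    """Lossy fold: decompose (NFD), then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical toy table: Hiragana and Katakana spell the same sounds,
# so a purely phonetic romanization maps both scripts to the same letters.
KANA_TO_ROMAN = {"か": "ka", "な": "na", "カ": "ka", "ナ": "na"}


def romanize_kana(text):
    """Romanize a string made only of the kana in the toy table."""
    return "".join(KANA_TO_ROMAN[ch] for ch in text)


def find_collisions(words, mapping):
    """Group words by their mapped form; keep groups with more than one word.

    Any group returned here cannot be round-tripped: the mapped form
    alone does not say which original was meant.
    """
    groups = defaultdict(list)
    for word in words:
        groups[mapping(word)].append(word)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    print(find_collisions(["resume", "résumé", "rose", "rosé"], strip_accents))
    print(find_collisions(["かな", "カナ"], romanize_kana))
```

Every collision reported is a case where the [a-z] form cannot be converted
back without outside knowledge, which is the human intervention mentioned in
point f.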

This is an interesting idea, but unfortunately it simply won't work. If the
infrastructure is put into place to allow Unicode/ISO 10646 characters in
IDNs, then there is room for tools that transliterate arbitrary characters
into some readable representation in the user's native characters. But such
tools are optional, and do not need to have anything like the degree of
precision (and the language coverage) required by your proposal.

Mark
----- Original Message -----
From: "DualName - ShimSungJae" <shimsungjae@dualname.com>
To: "Mark Davis" <mark@macchiato.com>; <idn@ops.ietf.org>; "Paul Hoffman /
IMC" <phoffman@imc.org>
Sent: Sunday, November 19, 2000 15:49
Subject: Re: [idn] I-D ACTION:draft-ietf-idn-vidn-00.txt


> Mark,
>
> Thank you for your comments. Please see below for my responses to your
> comments.
>
> Sung
>
> P.S. I have already responded to the comments provided by Mr. Paul
> Hoffman.
>
> ----- Original Message -----
> From: Mark Davis <mark@macchiato.com>
> To: <idn@ops.ietf.org>; Paul Hoffman / IMC <phoffman@imc.org>
> Sent: Friday, November 17, 2000 12:37 AM
> Subject: Re: [idn] I-D ACTION:draft-ietf-idn-vidn-00.txt
>
>
> > I agree completely.
> >
> > a. There is no accepted set of rules for romanizations of all languages.
>
> Sung: That is one of the reasons why VIDN uses the phonemes as a medium of
> the transliteration. Phonemes are very universal, being applicable to any
> language. In fact, most transliteration schemes are based upon the
> systems of sounds of the respective two languages and the units of such
> systems are phonemes.


>
> > b. Moreover, to be useful according to the proposal, the romanization
> > would have to provide a "round-trip" mapping.
>
> Sung: Again, since VIDN uses the phonemes as a medium of the
> transliteration, a "two-way" mapping is possible. That is, VIDN
> transliterates between two languages using the phonemes that have the same
> or very proximate sounds.
>
> > c. Furthermore, the romanizations will be subject to accidental
> > collisions between different scripts.
>
> Sung: Such collisions between different scripts may occur when the
> different scripts are actually used and registered as internationalized
> domain names. Please note that VIDN does NOT create and register any
> internationalized domain name, BUT it allows using internationalized
> domain names virtually. Thus, as long as the characters in different
> scripts represent the phonemes that have the same or very proximate
> sounds, VIDN returns the same characters in English.
>
> Sung: Also, since domain names in English already exist, conversion from
> one local language into another local language can be done via the English
> language. For example, a virtual domain name entered in Korean can be
> converted into the corresponding domain name in English, which can also be
> converted from another virtual domain name entered in Japanese. Using
> domain names in English as liaison between virtual domain names in two
> local languages can minimize the possibilities for such collisions between
> the two local languages.
>
> > d. And finally, the mechanisms for doing romanization need to be fairly
> > sophisticated. Look at ICU's, for example:
> >
> > http://oss.software.ibm.com/icu/userguide/Transliteration.html
> >
>
> Sung: "Sophisticated" does not necessarily mean "impossible." In fact,
> VIDN uses the knowledge base of transliteration, which is very
> comprehensive, if not complete. For example, several experts in Korean and
> English phonemics and linguistics have consulted in constructing the
> knowledge base of VIDN for Korean-English conversion. The knowledge base
> includes not only the general principles of transliteration but also
> common usages, idiomatic expressions, and possible variations that may
> occur in transliteration.
>
> > Mark
> >
> > (I'm replying to a slightly broader list on this message).
>
>