
Re: [idn] I-D ACTION:draft-ietf-idn-vidn-00.txt



Harald,

Thank you for your second set of comments. Please see below for my responses
to your comments.

Sung

----- Original Message -----
From: Harald Alvestrand <Harald@Alvestrand.no>
To: FDU - Sung Jae Shim <sshim@mailbox.fdu.edu>
Cc: <idn@ops.ietf.org>
Sent: Monday, November 27, 2000 12:02 AM
Subject: Re: [idn] I-D ACTION:draft-ietf-idn-vidn-00.txt


> I have tried to read your draft (again). It is problematic because you
> chose to represent non-ASCII characters as "??" rather than as character
> names or Unicode codepoints.
>

Sung: As you may know, non-ASCII characters are not allowed in
Internet-Drafts. I think we have to resolve this issue inside the IETF
soon. I will forward you an electronic copy of the draft with the encoded
text.

> Your specification does not say where the "pre-assigned codes" are stored,
> transferred and checked. This is critical to evaluating your draft.
>
> Please revise.
>

Sung: I also feel that the draft does not describe the one-to-one mapping
scheme in enough detail. I am in the process of revising the draft to
include more details about the scheme. For now, please see below.

Sung: The code can be as simple as the one in the following example. One
simple code is the Unicode of the virtual domain name as written in the
local language. A server with the virtual domain name "x.y.z" stores the
corresponding Unicode of "x.y.z". When a user types "x.y.z" on the client
side, the client can verify whether it has reached the right server by
examining the code it retrieves from that server. Since the client can
easily generate the Unicode of the "x.y.z" the user typed, it can compare
that with the Unicode retrieved from the server, and VIDN can immediately
determine whether it has hit the correct server.
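A minimal sketch of this check, assuming the "code" is simply the sequence
of Unicode code points of the name written as U+XXXX tokens (the draft does
not fix a format, and the function names here are hypothetical):

```python
def unicode_code(name: str) -> str:
    """Encode a name as its Unicode code points, e.g. "x" -> "U+0078"."""
    return " ".join(f"U+{ord(ch):04X}" for ch in name)


def is_right_server(typed_name: str, code_from_server: str) -> bool:
    """Client-side check: regenerate the code for what the user typed and
    compare it with the code the server returned."""
    return unicode_code(typed_name) == code_from_server
```

Under this assumption the server for "x.y.z" would store
unicode_code("x.y.z"), i.e. "U+0078 U+002E U+0079 U+002E U+007A", and
is_right_server("x.y.z", that_code) returns True on the client.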

Sung: The code does not have to be stored in any specific format; any
document format that both client and server support and understand will do.
This means that the code can be embedded in XML, HTML, WML, etc., as long as
the client can interpret the retrieved code correctly. Likewise, VIDN does
not require any specific intermediate transport protocol such as TCP/IP. The
only requirement is that the protocol be understood by all participating
clients and servers.
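As one illustration of embedding the code in an existing document format,
here is a sketch that reads it from an HTML page. The convention of a
<meta name="vidn-code" content="..."> tag is purely an assumption for this
example; VIDN defines no such tag.

```python
from html.parser import HTMLParser
from typing import Optional


# Hypothetical convention: the server embeds its code as
# <meta name="vidn-code" content="...">. VIDN itself fixes no tag name.
class VidnCodeExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.code: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs with lowercased names.
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "vidn-code":
            self.code = attributes.get("content")


def extract_code(html_text: str) -> Optional[str]:
    """Return the embedded code, or None if the page carries none."""
    parser = VidnCodeExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.code
```

An XML or WML variant would differ only in the parser used; the comparison
on the client side stays the same.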

Sung: The codes may be administered by a standards body such as IANA. Or the
codes for each local language may be administered by a local standards body
in the regions where that language is widely spoken, for example, KrNIC for
Korean, JpNIC for Japanese, and so on.

> One point:
>
>     First, each entity-defined portion of a virtual domain name in the
>     local language is decomposed into individual characters or sets of
>     characters so that each individual character or set of characters can
>     represent an individual phoneme of the local language, which is the
>     inverse of transcription of phonemes into characters. Second, each
>     individual phoneme of the local language is matched with an
>     equivalent phoneme of English that has the same or most proximate
>     sound. Third, each phoneme of English is transcribed into the
>     corresponding character or set of characters in English. Finally, all
>     the characters or sets of characters converted into English are
>     united to compose the corresponding entity-defined portion of an
>     actual domain name in English.
>
> This process is severely underdefined; English is not a good language for
> finding systematic phoneme representations.
>

Sung: The English language is not that bad. The test results of VIDN for
Korean-English conversion are excellent, and the responses from those who
tried the program and from the Korean press have been very positive. I have
already done feasibility studies of VIDN with several experts in
Japanese-English phonemics and linguistics, and the results show that
Japanese is much more straightforward to convert to and from English
following the principles of VIDN.
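The four-step conversion quoted above can be sketched for Korean as
follows. This is not VIDN's own table: the draft does not publish its
phoneme mappings, so the sketch assumes the Revised Romanization of Korean
as a stand-in and collapses steps two and three (match to an English
phoneme, then transcribe it) into a single lookup. Step one uses the
standard Unicode arithmetic for decomposing a precomposed Hangul syllable.

```python
# Stand-in tables (ASSUMPTION): Revised Romanization, not VIDN's mapping.
SBASE, VCOUNT, TCOUNT = 0xAC00, 21, 28
NCOUNT = VCOUNT * TCOUNT   # 588 syllables per initial consonant
SCOUNT = 19 * NCOUNT       # 11172 precomposed syllables

INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
            "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
MEDIALS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa",
           "wae", "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
FINALS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k",
          "m", "l", "l", "l", "p", "l", "m", "p", "p", "t",
          "t", "ng", "t", "t", "k", "t", "p", "t"]


def syllable_phonemes(ch: str):
    """Step 1: split one Hangul syllable into (initial, medial, final) indices."""
    s_index = ord(ch) - SBASE
    if not 0 <= s_index < SCOUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
    return s_index // NCOUNT, (s_index % NCOUNT) // TCOUNT, s_index % TCOUNT


def to_ascii_label(label: str) -> str:
    """Steps 2-4: map each phoneme to Latin letters and join them into one label."""
    parts = []
    for ch in label:
        if ch.isascii() and (ch.isalnum() or ch == "-"):
            parts.append(ch.lower())  # already usable in a host name
            continue
        initial, medial, final = syllable_phonemes(ch)
        parts.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
    return "".join(parts)
```

With these stand-in tables, to_ascii_label("한국") gives "hanguk" and
to_ascii_label("서울") gives "seoul".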

> If you start off with an English word and do this, you will usually end up
> with a different word, and sometimes this word will be English.
>

Sung: VIDN can handle that problem using the one-to-one mapping scheme
described above.

> Consider that the pronunciation of "bridge" is (roughly) "bri-tsch". And
> that "cite" and "sight" have the same pronunciation in English (homonyms).
>

Sung: In the example of "cite" and "sight", which have the same
pronunciation, let's assume that "sai-t" is the script in non-ASCII format
that corresponds to both "cite" and "sight" in ASCII format, and that "cite"
in ASCII format has the code matching the code of "sai-t" in non-ASCII
format. When a user enters "sai-t" in non-ASCII format, VIDN connects to
"cite" because it has the matching code, and lists "sight" as an alternative
because it does not. If the holder of "sight" wants his or her name to be
the first choice, he or she may register "sight2" in ASCII format with the
code that matches the code of "sai-t2" in non-ASCII format (because "sai-t"
in non-ASCII format already matches "cite" in ASCII format). When a user
enters "sai-t2" in non-ASCII format, VIDN connects to "sight2" because it
has the matching code, and lists "cite2", if any, as an alternative because
it does not.
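A sketch of the selection rule in this example. It assumes the set of
same-sounding ASCII candidates has already been found (for instance by the
phoneme conversion) and that each candidate's registered code is known; the
function name and the placeholder codes such as "code(sai-t)" are
hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def resolve(typed_code: str,
            candidates: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Pick the ASCII domain whose registered code matches the code of what
    the user typed; every other candidate becomes an alternative.

    candidates maps each ASCII domain name to the code registered for it
    (an empty string if it has none). Under a one-to-one mapping at most
    one candidate can match.
    """
    match: Optional[str] = None
    alternatives: List[str] = []
    for domain, code in candidates.items():
        if match is None and code == typed_code:
            match = domain
        else:
            alternatives.append(domain)
    return match, alternatives
```

For "sai-t", resolve("code(sai-t)", {"cite": "code(sai-t)", "sight": ""})
connects to "cite" and offers ["sight"]; for "sai-t2" with "sight2"
registered under "code(sai-t2)", it connects to "sight2" and offers "cite2".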

> If you do this to a language using a Latin-based script, the result is even
> more confusing; "skjerm" in Norwegian is pronounced almost as if it was
> "charm" in English. And there are the homographs - "lever" (liver) and
> "lever" (alive) are pronounced differently (lev-ER and LE-ver respectively).
>

Sung: VIDN is based on pronunciation and phonemics, not on meaning and
semantics. Languages using Latin-based scripts would need the conversion
less than languages using non-Latin scripts, since most of their characters
are already representable in ASCII format.

Sung: Anyway, in the example of "skjerm" in Norwegian and "charm" in
English, which have nearly the same pronunciation, when a user enters
"skjerm" in non-ASCII format, VIDN connects to either "skjerm" in ASCII
format or "charm" in ASCII format, based on the result of the one-to-one
mapping scheme, and lists the other one as an alternative. In either case, a
user can still use both "skjerm" and "charm" in ASCII format. In the example
of "lever" (liver) and "lever" (alive), given only the string "lever"
without context, there is no way to distinguish "lever" (liver) from
"lever" (alive).

> I do not understand how a process of "disambiguation" that cannot be made
> to work with any pair of languages I understand is going to help much with
> the problems I don't.
> But I could be wrong.
>

Sung: I think the conversion method used in VIDN is generally applicable to
any two languages, although for IDN it focuses on conversion between a
local language and English.

>
> --
> Harald Tveit Alvestrand, alvestrand@cisco.com
> +47 41 44 29 94
> Personal email: Harald@Alvestrand.no
>
>