Re: [idn] I-D ACTION:draft-ietf-idn-vidn-00.txt

To: FDU - Sung Jae Shim <sshim@mailbox.fdu.edu>
Subject: Re: [idn] I-D ACTION:draft-ietf-idn-vidn-00.txt
From: "Brian W. Spolarich" <briansp@walid.com>
Date: Wed, 22 Nov 2000 19:11:52 +0000 (GMT)
cc: idn@ops.ietf.org
Delivery-date: Wed, 22 Nov 2000 11:13:11 -0800
Envelope-to: idn-data@psg.com
On Mon, 20 Nov 2000, FDU - Sung Jae Shim wrote:

| Sung: Yes, VIDN maps the letters of non-English languages into
| [a-z,0-9, and hyphen] that already exist in the DNS, in the same way
| that people in regions where English is not widely spoken, currently
| create their domain names in English.

  So is it safe to say that VIDN is, in the context of
draft-ietf-idn-compare-01, an 'arch-3: Just send ACE' proposal?  And, in
the context of 'where to do IDN', VIDN proposes that the ACE encoding
happens in the application (as opposed to the resolver, recursive server,
authoritative server, or root server)?

  As such, I'm not sure I understand how VIDN is better than, say,
draft-ietf-idn-idnra, using any of the proposed ACE formats.  In the other
proposals, the encoding is done on the basis of mapping Unicode codepoints
to the RFC1035-mandated characters.  While the result is perhaps ugly, the
mapping is deterministic, straightforward, and can readily be processed by
applications for re-display in a localized script.  Indeed, that's one of
the assumptions behind the whole prefix issue:  by identifying the encoded
domain name with some (hopefully) unique affix, applications can display
the string in the original set of codepoints.
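
  To make the round-trip point concrete, here is a minimal sketch in
Python.  It is not taken from any of the drafts under discussion:
Punycode (via Python's built-in codec) stands in for "any of the proposed
ACE formats", and the prefix "xx--" and the helper names to_ace/from_ace
are hypothetical, chosen only for illustration.

```python
# Sketch only. Punycode is a stand-in ACE; the prefix below is hypothetical
# and is not taken from any IDN draft.
ACE_PREFIX = "xx--"

def to_ace(label: str) -> str:
    """Encode one label into [a-z0-9-] if it has non-ASCII code points."""
    if label.isascii():
        return label  # already representable, leave untouched
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def from_ace(label: str) -> str:
    """Recover the original code points from a prefixed label."""
    if not label.startswith(ACE_PREFIX):
        return label  # no affix, so treat it as a plain ASCII label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

original = "중앙"
encoded = to_ace(original)
# The mapping is deterministic and reversible: display code can always
# get back to the original script.
assert from_ace(encoded) == original
```

  Note that from_ace would misread an ordinary ASCII label that happened
to begin with the same affix, which is why the affix needs to be
(hopefully) unique.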

  In this context, the only significant difference that I see between
VIDN and other ACE proposals is that the VIDN-encoded strings can be
'read' by humans with some hope of being able to figure out what the
corresponding 'actual' domain name is.

| Sung: VIDN does not need round-trip mapping, although it may be
| possible to convert characters from English back to local languages.
| What is the use of this reversed conversion? Is it for those who speak
| English, so that they can use English to go to domain names registered
| in local languages? In VIDN, there are no domain names created and
| registered in local languages. Those who speak English do not need
| domain names in local languages. Please do not forget that domain
| names in local languages are for those who do not speak English, not
| for those who speak English. Again, VIDN does not create and register
| domain names in local languages, and VIDN needs only domain names in
| English actually exist as in the current DNS.

  I think perhaps you're confusing 'language' with 'script' here.  VIDN
does not propose to represent domain names in English.  The string
'jungang.com' is not an English-language sequence, but an imperfect
representation of a Korean-language set of phonemes in English.

  Paul Hoffman's document draft-hoffman-i18n-terms-00 provides a good
discussion of some of these distinctions.  

  The general focus of other proposals has been on representing
scripts in the DNS, typically using some variant or encoding of the
Unicode-3 or ISO/IEC-10646 character set.  To be honest, it never occurred
to me to even think about higher-level 'language'-oriented approaches,
because this domain is much less well-defined.

  Your arguments for this proposal include a statement that VIDN is the
way to go because other methods involve a 'lengthy and costly process of
implementation'.  I don't believe this is an accurate statement
(depending on what you mean by 'lengthy' or 'costly'), or that VIDN
proposes a better alternative.  In the case of a script-based approach to
IDN, there are extremely well-defined standards (Unicode and
ISO/IEC-10646) for encoding an extremely comprehensive set of characters
in a standard way.  Indeed the ISO and Unicode work in this space has been
in process for nearly 20 years.  In contrast, your proposal suggests that
the basis for encoding should be a 'language X' to English
transliteration, for which there are no common standards, much less mature
ones like Unicode.  

  In terms of cost, I'm not sure I understand how any ACE-based approach
is going to cost more or less than any other.  Once the algorithm is
defined and agreed-upon, writing implementation code is pretty
straightforward.

  Also, some phonemes aren't even directly representable in the RFC1035
character set (the click sounds in some African languages, for example,
are usually represented with an exclamation point '!').

| > b. There are no transliteration standards for many, many languages.
| >
| 
| Sung: Without such standards, people speaking local languages have
| been creating and registering domain names in English. The most common
| way to create domain names in English is to transliterate the
| characters in local languages into the characters in English that have
| the same or proximate sounds. VIDN uses the knowledge of this
| transliteration based upon the sound or phonemic systems of the
| respective local language and English. Please take a look at how those
| domain names in English have been created in regions where English is
| not widely spoken, without such standards.

  If I correctly interpret your argument, you're saying "People are
already doing something like VIDN today, so we should just formalize
it."  I'm not sure how compelling an argument that is.

  Given that the focus of IDN should be to seamlessly enable resolution of
multilingual names for end-users, I'm not sure I understand what
particular value the VIDN-proposed encoding scheme has over any other.  If
the end-user is presented with a VIDN-encoded name, but isn't able to read
the name as represented in ASCII characters and English syllables, what
is the value of it? 

| Sung: Because of its small size (e.g., the testing version of VIDN for
| Korean-English conversion is about 800KB and the actual DLL file used
| for the conversion is about 250KB), VIDN can be easily embedded into
| user programs that use domain names, such as web browser and client
| email software. Alternatively, the knowledge base of conversion and
| the logic to process it can be embedded into operating systems as a
| library, so that client software such as web browser and email
| software can share them. The user will need only the module for
| conversion of his or her preferred local language into English. Again,
| there is no need to convert the romanizations back to native
| characters for every language.

  Why do this in the application?  That will require each and every
application to be modified for IDN.  Why not do this in the resolver and
worry about fixing particular applications that get in the way?
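
  As a rough illustration of the resolver-side placement, the sketch
below puts the encoding step in one shared entry point so that calling
applications pass native names unchanged.  Everything here is an
assumption made for illustration: the names resolve and system_lookup are
hypothetical, and Python's built-in "idna" codec stands in for whichever
ACE would be chosen.

```python
import socket
from typing import Callable, List, Optional

def system_lookup(ascii_name: str) -> List[str]:
    """Default back end: ask the OS resolver for addresses."""
    infos = socket.getaddrinfo(ascii_name, None)
    return sorted({info[4][0] for info in infos})

def resolve(name: str,
            lookup: Optional[Callable[[str], List[str]]] = None) -> List[str]:
    """Hypothetical shared entry point: the ACE step happens once, here."""
    # Stand-in ACE step; ASCII-only names pass through unchanged.
    ascii_name = name.encode("idna").decode("ascii")
    return (lookup or system_lookup)(ascii_name)

# Usage with a stub back end, so no network access is needed:
seen = []
def stub(ascii_name: str) -> List[str]:
    seen.append(ascii_name)
    return ["192.0.2.1"]

resolve("bücher.example", stub)
assert seen == ["xn--bcher-kva.example"]
```

  An application that calls resolve() needs no IDN-specific code of its
own; only applications that bypass the shared layer would need fixing.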

  -brian