Re: [idn] Re: Chinese Domain Name Consortium (CDNC)Declaration

To: Erin Chen <erin@twnic.net.tw>, Scott Bradner <sob@harvard.edu>
Subject: Re: [idn] Re: Chinese Domain Name Consortium (CDNC)Declaration
From: John C Klensin <klensin@jck.com>
Date: Mon, 04 Feb 2002 08:35:28 -0500
Cc: deng@cnnic.net.cn, Elisabeth.Porteneuve@cetp.ipsl.fr, mclaughlin@pobox.com, ajm@icann.org, alanysho@hkdnr.net.hk, christine.tsang@hkdnr.net.hk, fred@cisco.com, harald@Alvestrand.no, hlqian@cnnic.net.cn, hoho@iis.sinica.edu.tw, htk@eecs.harvard.edu, huangk@alum.sinica.edu, iab@ISI.EDU, idn@ops.ietf.org, iesg@ietf.org, jasonho@umac.mo, jet-member@nic.ad.jp, jseng@pobox.org.sg, lee@whale.cnnic.net.cn, lynn@icann.org, mao@cnnic.net.cn, Marc.Blanchet@viagenie.qc.ca, mkatoh@mkatoh.net, mouhamet@next.sn, narten@us.ibm.com, nordmark@eng.sun.com, paf@cisco.com, phoffman@imc.org, qhhu@public.bta.net.cn, sharil@cmc.gov.my, shkyong@kgsm.kaist.ac.kr, snw@twnic.net.tw
In-reply-to: <3C5E6549.50309@twnic.net.tw>
References: <3C5E6549.50309@twnic.net.tw>

Scott (and others),

Let me see if I can make an analogy that explains the way I
understand this problem.

Erin (and others), 

If I have this wrong, please correct me --I still have a great
deal to learn.  If I do not, please think about the examples and
cases given, as they may help to explain why many of the users
of alphabetic languages have been reluctant to put TC<->SC
mapping into nameprep or some other IDNA step.

---------

As you know, there have been several attempts to "reform"
English spelling in the last couple of centuries.  The intent
has been to remove the silent characters, reduce idiosyncrasies
due to the different language of origin of different words, to
get the spellings into a purely phonetic form or some
combination of them.   Few of these have gone anywhere, but
suppose that a system that rationalized only the thousand most
frequently used words had been adopted in the US and was in
moderately wide use, and that the change had been made within
the lifetime of many, or most, of English-users in the US.

Now, what we would end up with is an odd set of combinations.
Many words within that thousand would not be changed (how could
you simplify "a" or "is" ?).  Others would exist in post-change
form in ordinary writing, but in pre-change form in, e.g., names
and trademarks of companies that had existed and been in wide
use pre-change (e.g., I can't imagine a well-known breakfast food
firm being happy trying to change all of its corporate and
advertising materials to "Kelogz") and in older texts.   The
change would be slower, or would not be adopted at all, in other
English-speaking countries, with enclaves of English speakers in
non-English-speaking countries making the change most slowly of
all.  But all users of English would need to get used to seeing,
or puzzling out, the variants or they would discover that their
ability to read the language --as written-- would deteriorate
over time.  Would these things get mixed?  Of course they would:
To use an example I have used before in different form, consider
the well-known firm
   toz-(Cyrillic Ya)-us
and its likely insistence that, somehow, 
   toys-r-us
   toys-(Cyrillic Ya)-us
   toz-r-us
and maybe even
   toysrus
and that set of four additional variations all match its name.

Now, that is a case in which there are _only_ two words that
have variations and only two cases for each word.   But suppose,
as would be reasonable once one started down this slippery
slope, one also expected British and American spellings to
match: after all, the American spellings are just
simplifications from an earlier time.  And assume that many
names were more complex compounds than just having three
components.  So now there are three variations for many words
(raising the multiplier for multiple-word phrases) and phrases
with more words (creating a larger multiplier).   
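
The multiplier effect described above can be sketched in a few lines of
Python. This is only an illustration: the helper names `spell_out` and
`count_variants` and the five-component placeholder name are assumptions
made for the example, not anything defined by the DNS or by IDNA.

```python
from itertools import product
from math import prod


def spell_out(components, separator="-"):
    """Return every name formed by picking one spelling per component."""
    return [separator.join(choice) for choice in product(*components)]


def count_variants(components):
    """Number of distinct names: the product of the spellings per component."""
    return prod(len(spellings) for spellings in components)


# Two spellings for each of two components, one for the third: 2 * 2 * 1 = 4.
toys = [["toys", "toz"], ["r", "\u042f"], ["us"]]  # U+042F is Cyrillic Ya
hyphenated = spell_out(toys)
unhyphenated = spell_out(toys, separator="")
print(hyphenated)
print(len(hyphenated) + len(unhyphenated))  # 8 names for one firm

# Hypothetical longer name: five components with three spellings each.
five_words = [["a", "b", "c"]] * 5
print(count_variants(five_words))  # 3 ** 5 = 243
```

Each added spelling per word and each added word multiplies the count,
so the set of names a registrant might expect to "match" grows
exponentially with the length of the name.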

And all of that assumes that the company doesn't get it into its
head to want toys.r.us and its variants, and that a "toz" TLD
isn't created in which they would, presumably, want to be
"us.r.toz".  This is a different case, in that no one would
expect (I hope) "us.r.toz" to match "toys-(Cyrillic Ya)-us", but
consider the implications to the administrative hierarchy.

And, of course, the new spelling system, unless it is much more
radical than the examples I have given above, introduces some
new homographs and other ambiguities.  E.g., does the simplified
"toz" match the traditional "toys", or "toes", or maybe even
"togs", or the made-up traditional word "toz"?  Or perhaps
several of them?  Does the present tense of "to read" get mapped
into "reed" while the past tense gets mapped into "red"?  And,
if so, what happens to the plant and the color (colour? colur?
:-( ).
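
The ambiguity can be made concrete with a small sketch. The table
below is entirely hypothetical: it assumes one possible assignment of
"simplified" spellings to the words mentioned above, purely to show
what happens when the mapping is many-to-one in one direction and
one-to-many in the other.

```python
from collections import defaultdict

# Hypothetical table: traditional spelling -> possible simplified spellings.
# "read" has two entries because present and past tense are pronounced
# differently; "reed" and "red" are already words in their own right.
TRADITIONAL_TO_SIMPLIFIED = {
    "toys": {"toz"},
    "toes": {"toz"},
    "togs": {"toz"},
    "toz": {"toz"},
    "read": {"reed", "red"},
    "reed": {"reed"},
    "red": {"red"},
}


def invert(table):
    """Build simplified spelling -> set of traditional spellings."""
    reverse = defaultdict(set)
    for traditional, simplified_forms in table.items():
        for simplified in simplified_forms:
            reverse[simplified].add(traditional)
    return dict(reverse)


SIMPLIFIED_TO_TRADITIONAL = invert(TRADITIONAL_TO_SIMPLIFIED)

for simplified in ("toz", "reed", "red"):
    print(simplified, "<-", sorted(SIMPLIFIED_TO_TRADITIONAL[simplified]))
```

Under these assumptions "toz" has four traditional candidates, and
"read" would have to match both "reed" and "red" even though those two
do not match each other, so "matches" is no longer a clean equivalence
between names.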

Now, if you treat each "word" or "spelling variation" as if it
were a single character, you probably have a first-order
approximation to the Chinese situation.  And, to a greater or
lesser extent, the following statements are all true:

	(i) Modulo that ASCII case-matching rule, the DNS
	matches on code points, not on some abstraction of what
	a "character" is about, so this discussion is irrelevant.

	(ii) Things that people (end-users, not just computer
	geeks) see as the same ought to be treated as the same,
	or we will have vast confusion.

	(iii) Since, in Chinese, the things we see as "words" in
	English are really single characters, one ought to be
	able to do the extended mapping and matching things we
	now do for "case" in ASCII, and for case and other
	simplifications that are proposed for Nameprep, for
	these simplifications/variations also... even if it
	clearly is not compatible with the DNS to do them for
	English word-spelling variations.

	(iv) If we need to do this for Chinese, then
	"simplifications" that have been introduced through the
	years in "Latin" alphabets ought to match as well.  E.g.,
	    "IVLIVS" ought to match "julius"
	right?

Obviously (I hope) these principles are inconsistent.  And many
people are going to be upset with any decision that is made.  In
particular, if simplification-matching is not going to be
permitted for Chinese, then it is possible that some of the
mappings that are now made in Nameprep should be eliminated
because they are a bit too close to that (or to (iv) above) to
be comfortable.
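
Statement (i) can be sketched as follows. This is a minimal
illustration of comparing labels code point by code point after
folding only the ASCII letters A-Z; the function names `ascii_fold` and
`labels_match` are assumptions for the example and it is not a model
of any particular DNS implementation.

```python
def ascii_fold(label: str) -> str:
    """Lower-case only the ASCII letters A-Z; leave every other code point alone."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in label
    )


def labels_match(a: str, b: str) -> bool:
    """Compare two labels code point by code point after ASCII case folding."""
    return ascii_fold(a) == ascii_fold(b)


print(labels_match("Toys-R-Us", "toys-r-us"))       # ASCII case is ignored
print(labels_match("toys-r-us", "toys-\u042f-us"))  # Latin r vs. Cyrillic Ya
print(labels_match("IVLIVS", "julius"))             # statement (iv)
```

Only the first comparison succeeds. Making the other two succeed
requires an explicit mapping step applied before comparison, which is
exactly the kind of rule whose scope statements (ii) through (iv)
disagree about.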

    john

--On Monday, 04 February, 2002 18:41 +0800 Erin Chen
<erin@twnic.net.tw> wrote:

> Dear Scott,
> 
> Before I describe how often Chinese names are written using a
> mixture of TC & SC, I will explain the user behavior and the
> current Input Method Environment of the applications that users
> would face.
> 
> 1. In the user behavior of written Chinese in one's daily
> life, a mixture of TC & SC is used VERY often. I say
>...
