Re: [idn] Re: Chinese Domain Name Consortium (CDNC) Declaration



I have tried to stay silent all this while, but I feel it is time for me
to speak up. Note that what I say here is in my own capacity and does not
represent the IDN WG, Marc (my co-chair), my company or anyone else. It is
solely and primarily my own opinion.

The problem of Traditional Chinese & Simplified Chinese cannot be
expressed as a bicameral (upper/lower case) problem. To say it is
similar to "A" and "a" is at best misleading. TC/SC is not a simple 1 to
1 mapping.

Neither can it be compared to problems in other languages. Every
language is unique, and each faces problems that are unique to itself,
as demonstrated in some of the mails in this thread. So let's please do
away with any analogy.

The complexity of TC/SC is not something that can be explained in a
single word, but it is described quite well in the following articles:
    http://playground.i-dns.net/one/onec_sum.htm
    http://www.cjk.org/cjk/c2c/c2cbasis.htm.

In particular, pay attention to the section on "code-point substitution"
in the second article. Quote: "The easiest, but most unreliable, way
to convert SC to TC, or vice versa, is to do so on a
codepoint-to-codepoint basis".

Some subset of the TC/SC problem can be solved using this method,
however unreliable it is. But such a solution has to take conflicting
languages (such as Japanese & Korean) into consideration, which strips
it down to an even smaller subset. In a JET meeting where we did this
investigation, this brought the simple 2400+ TC/SC pairs down to around
300+ TC/SC pairs. (And note that the pair (U+81FA, U+53F0), commonly
used characters since they represent "tai" in "taiwan", is not part of
the sub-subset.) One would therefore question the usefulness of such a
partial solution.
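
To make the point concrete, here is a minimal sketch of
codepoint-to-codepoint substitution, not anything the WG or JET has
specified. The table TC_TO_SC and the helper names are hypothetical.
Only the (U+81FA, U+53F0) pair comes from this mail; the two entries
for U+53D1 are added purely as an illustration of a case where two TC
code points share one SC code point.

```python
# Hypothetical TC -> SC folding table, for illustration only.
TC_TO_SC = {
    "\u81fa": "\u53f0",  # the pair cited in this mail ("tai" in "taiwan")
    "\u767c": "\u53d1",  # illustrative entry: one TC form of U+53D1
    "\u9aee": "\u53d1",  # illustrative entry: another TC form of U+53D1
}


def fold_to_sc(label: str) -> str:
    """Replace each code point by its SC counterpart if the table has one."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in label)


def same_after_folding(a: str, b: str) -> bool:
    """Compare two labels after folding both to SC."""
    return fold_to_sc(a) == fold_to_sc(b)


def tc_candidates(sc: str) -> list[str]:
    """List every TC code point in the table that folds to the given SC one."""
    return sorted(tc for tc, mapped in TC_TO_SC.items() if mapped == sc)
```

Folding in the TC-to-SC direction is at least well defined for this
table, so same_after_folding("\u81fa\u5317", "\u53f0\u5317") is true.
Going the other way is where it breaks: tc_candidates("\u53d1") returns
two code points, and nothing in a table like this says which one a
given name intended.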

There have been many attempts to solve the "TC/SC problem", and in
fact, some drafts have been submitted to the group. Unfortunately, none
of them has managed to stand up to the scrutiny of the working group.

In addition, the Unicode Technical Consortium has sent the following
recommendation on TC/SC to the working group:
    http://www.imc.org/idn/mail-archive/msg04005.html

Nevertheless, the "Chinese language" problem is one which we cannot
deny, although it may not be as serious as others are painting it. But
all problems are serious; the degree of seriousness depends on the
individual's perspective.

But the fact that we talk about "language" is a wrong start. Human
language is complex. There are very few language rules that can be
encapsulated accurately within computer software. And we have to ask
ourselves whether the expectation that DNS be able to do so is fair.

This is why the group has steered away from the "How do I put Chinese
words into the domain name" problem and works on "How do I put a Han
ideograph into the domain name". In short, we deal with scripts, not
languages.

This is not to say we close our eyes and ears to the "language"
problem. Once again, let me remind everyone that every language has
its own problems, so it is not just Chinese. Instead, we need a more
complex naming system, one that John has already explained in his mail,
for *all* language problems:
    http://www.imc.org/idn/mail-archive/msg05615.html

It is always easier to point out problems than to provide solutions.
This is why the world has more movie critics than movie directors :-)
However, the world values solutions more than it values problems.

And when there is only one obvious solution, it does not mean the
obvious solution is the right one. As Dave points out, "When all you
have is a hammer, everything looks like a nail".

The IDN WG is not absolute. It is the beginning of a journey to do I18N
in the IETF and the Internet, one that can bring the Internet closer to
the world; OR it can be an end to any further attempts to do I18N in the
IETF. Neither is it the only forum where all solutions to I18N are to be
worked out. Not all solutions are technical in nature, and even if they
are, not all problems can be solved within the IETF.

ps: It is not my intention to be argumentative, so this will be my first
and only mail on this thread.

-James Seng

----- Original Message -----
From: Erin Chen
To: Scott Bradner
Cc: deng@cnnic.net.cn ; Elisabeth.Porteneuve@cetp.ipsl.fr ;
mclaughlin@pobox.com ; ajm@icann.org ; alanysho@hkdnr.net.hk ;
christine.tsang@hkdnr.net.hk ; fred@cisco.com ; harald@Alvestrand.no ;
hlqian@cnnic.net.cn ; hoho@iis.sinica.edu.tw ; htk@eecs.harvard.edu ;
huangk@alum.sinica.edu ; iab@isi.edu ; idn@ops.ietf.org ; iesg@ietf.org
; jasonho@umac.mo ; jet-member@nic.ad.jp ; jseng@pobox.org.sg ;
klensin@jck.com ; lee@whale.cnnic.net.cn ; lynn@icann.org ;
mao@cnnic.net.cn ; Marc.Blanchet@viagenie.qc.ca ; mkatoh@mkatoh.net ;
mouhamet@next.sn ; narten@us.ibm.com ; nordmark@eng.sun.com ;
paf@cisco.com ; phoffman@imc.org ; qhhu@public.bta.net.cn ;
sharil@cmc.gov.my ; shkyong@kgsm.kaist.ac.kr ; snw@twnic.net.tw
Sent: Monday, February 04, 2002 6:41 PM
Subject: Re: [idn] Re: Chinese Domain Name Consortium (CDNC) Declaration


Dear Scott,
Before I describe how often Chinese names are written using a mixture of
TC & SC, let me explain user behavior and the current Input Method
Environment (IME) that users face in applications.

1. In daily written Chinese, users VERY often mix TC & SC. I say VERY
   often here because it is the customary way users write. It is
   difficult to produce statistics that describe how often.

2. In current applications, users face an IME that can type a mixture
   of TC & SC very easily. And because of the custom of mixing TC & SC,
   the boundary between TC & SC is becoming indistinct. It is also
   difficult for users to distinguish them.

Although it is difficult to give general statistics, I can offer some
statistics from the TWNIC idn.tw testbed. There are now 27,665 idn.tw
Chinese domain names registered in our testbed. In the testbed, we ran
an experiment on one TC & SC pair (U+81FA, U+53F0), and the results may
help to show how often. In the experiment we allowed registrants to
choose either of (U+81FA, U+53F0) if they needed it in their Chinese
domain name. In our statistics, 2,311 Chinese domain names chose TC
U+81FA and 3,938 chose SC U+53F0.

In A Complete Set of Simplified Chinese Characters, there are more than
2,000 well-defined TC & SC pairs that are frequently used in daily
Chinese life.

Erin Chen

Scott Bradner wrote:

Assume a Chinese name of 10 glyphs. Each one may take 2 versions, TC or
SC. How often are Chinese names written using a mixture of TC & SC?

Scott
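
The arithmetic behind the question above can be sketched as follows:
if each of 10 positions can independently be written in a TC or an SC
form, a single name has 2^10 possible spellings. This is an
illustrative sketch only. The function name spellings is hypothetical,
and the 10-glyph name is built by repeating the one pair cited in this
thread, which no real name would do.

```python
from itertools import product


def spellings(pairs):
    """Yield every spelling of a name given one (TC, SC) choice per glyph."""
    for choice in product(*pairs):
        yield "".join(choice)


# Hypothetical 10-glyph name: the (U+81FA, U+53F0) pair repeated 10 times.
pairs = [("\u81fa", "\u53f0")] * 10
count = sum(1 for _ in spellings(pairs))  # 2 ** 10 spellings
```

Every one of those spellings would have to resolve to the same name if
mixed TC & SC usage is as common as the testbed figures suggest.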