
Re: [idn] Re: Chinese Domain Name Consortium (CDNC) Declaration




The test data, as you describe it, is completely insufficient to draw any conclusions. The purpose of a test would be to determine whether the combinatorics of mixing TC and SC characters is a real issue, not simply whether people choose a particular SC vs. TC character.
 
To do an accurate survey, you would have to be much more rigorous, doing at least the following:
 
- Divide all CJK characters into three classes: TC-only, SC-only, TC-or-SC.
Post this list so that it can be scrutinized by others.
 
- Test all names to see which contain at least one character from TC-only and one from SC-only.
Post that list so that it can also be scrutinized by others.
 
- Of those, see how many would match under a TC-SC mapping.
Publicize both the mapping and the list so that they can also be scrutinized by others.
 
- See what the percentages are: mixed / total and matching / mixed.
 
Note that the accuracy of the test also depends heavily upon the accuracy of the division of characters into the original three classes, and the accuracy of the TC-SC mapping.
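 
As a rough illustration only, the steps above could be sketched in Python as follows. The character classes, the TC-SC mapping, and the name list here are tiny hypothetical stand-ins, not real survey data, and the function names are made up for this sketch. It also assumes one particular reading of "match": a mixed name matches if, after folding every TC character to its SC counterpart, it equals the folded form of some other name in the list.

```python
from collections import Counter

# Hypothetical, illustrative classes; a real survey would publish full lists.
TC_ONLY = {"國", "學"}            # U+570B, U+5B78
SC_ONLY = {"国", "学"}            # U+56FD, U+5B66
# Characters in neither set (e.g. 中 U+4E2D, 大 U+5927) count as TC-or-SC.
TC_TO_SC = {"國": "国", "學": "学"}  # hypothetical TC-SC mapping


def fold(name, tc_to_sc):
    """Replace each TC character with its SC counterpart under the mapping."""
    return "".join(tc_to_sc.get(ch, ch) for ch in name)


def is_mixed(name, tc_only, sc_only):
    """True if the name has at least one TC-only and one SC-only character."""
    chars = set(name)
    return bool(chars & tc_only) and bool(chars & sc_only)


def survey(names, tc_only, sc_only, tc_to_sc):
    """Compute the mixed and matching lists and the two ratios."""
    names = list(dict.fromkeys(names))  # drop exact duplicates, keep order
    mixed = [n for n in names if is_mixed(n, tc_only, sc_only)]
    folded_counts = Counter(fold(n, tc_to_sc) for n in names)
    # Assumed meaning of "match": another distinct name folds to the same form.
    matching = [n for n in mixed if folded_counts[fold(n, tc_to_sc)] > 1]
    return {
        "total": len(names),
        "mixed": mixed,
        "matching": matching,
        "mixed_over_total": len(mixed) / len(names) if names else 0.0,
        "matching_over_mixed": len(matching) / len(mixed) if mixed else 0.0,
    }


# Hypothetical registered names, for illustration only.
NAMES = ["中國大學", "中国大学", "中國大学", "大学", "國学"]

result = survey(NAMES, TC_ONLY, SC_ONLY, TC_TO_SC)
print(result)
```

With real data, the three inputs (classes, mapping, names) would each be posted for review, since every figure the sketch produces is only as good as those lists.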
 
Mark
—————
 
Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
 
http://www.macchiato.com
----- Original Message -----
From: Erin Chen
Sent: Monday, February 04, 2002 02:41
Subject: Re: [idn] Re: Chinese Domain Name Consortium (CDNC) Declaration

Dear Scott,
Before I describe how often Chinese names are written using a mixture of
TC & SC, let me explain user behavior and the current Input Method
Environment (IME) that users face in applications.

1. In daily written Chinese, users VERY often mix TC & SC. I say VERY
often here because it is the customary way people write. It is
difficult to produce statistics describing how often.
2. In current applications, the user faces an IME that can type a
mixture of TC & SC very easily. And because of the custom of mixing
TC & SC, the boundary between TC & SC is becoming indistinct. It is
also difficult for users to tell them apart.

Although it is difficult to give general statistics, I can offer some
statistics from the TWNIC idn.tw testbed. There are currently 27,665
idn.tw Chinese domain names registered in our testbed.

In the TWNIC idn.tw testbed, we ran an experiment on one TC & SC pair
(U+81FA, U+53F0). Perhaps our experimental statistics can help explain
how often. In the experiment we allowed registrants to choose either of
(U+81FA, U+53F0) if they needed it in their Chinese domain name. In our
statistics, 2,311 Chinese domain names chose TC U+81FA and
3,938 chose SC U+53F0.

A Complete Set of Simplified Chinese Characters defines more than
2,000 sets of TC & SC characters that are frequently used in
everyday Chinese.

Erin Chen
Scott Bradner wrote:
Assume Chinese name of 10 glyphs. Each one may take 2 versions,
TC or SC.

how often are Chinese names written using a mixture of TC & SC?


Scott