Re: [idn] Re: Chinese Domain Name Consortium (CDNC) Declaration

To: hoho <hoho@iis.sinica.edu.tw>
Subject: Re: [idn] Re: Chinese Domain Name Consortium (CDNC) Declaration
From: "John H. Jenkins" <jenkins@apple.com>
Date: Thu, 7 Feb 2002 10:28:18 -0700
Cc: Kenneth Whistler <kenw@sybase.com>, idn@ops.ietf.org
In-reply-to: <3C62195C.A6144C8@iis.sinica.edu.tw>

On Wednesday, February 6, 2002, at 11:06 PM, hoho wrote:

>
> This is true for TC/SC problem. But, the variant problem roots in the
> long history of Han characters. You might be interested in taking a
> peek at the problem at one of the Chinese or Japanese dictionaries on
> Han variants.
>

As Ken well knows.  It's a nasty, nasty problem in general, and one which
Unicode is trying to address.  At its last meeting the UTC agreed to
request that the IRG start to work on definitive variant data for the CJK
repertoire of Unicode/10646.

If there is any body in the IT standardization community which should own
the problem, it's the IRG.  It's international, multilingual, and has the
longest and best experience with the characters involved.

>>
>>   B. The move to Unicode implementations means that mingling
>>      of traditional and simplified orthographies is easier.
>>      In effect, users now have the rope to hang themselves,
>>      if they so desire. Whereas, before, the constraints of
>>      the deployment of IME's and fonts generally meant that
>>      you couldn't easily mix SC/TC, even when the code page
>>      nominally supported it.
>>
>
> I would accept A with minor modification as follows. B is not true.
>

I would disagree here.  As you point out, Windows 2000 has the ability to
mingle traditional and simplified orthographies because of Unicode 2.0.
Mac OS X now has the problem, too, because of Unicode 3.1.  In the past on
the Mac, if you wanted to do Japanese you used the Japanese "code page,"
if you wanted to do traditional Chinese, you did the traditional Chinese
"code page," and if you wanted to do simplified Chinese, you used the
simplified Chinese "code page."  This inherent link between code page and
nuanced version of a script was something Unicode intended to break.  In
the past, one had to deliberately work to mix the two, and the mixture was
generally obvious.  This isn't true now.
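
To make the point concrete, here is a minimal Python sketch of how easily
the two orthographies mix in a single Unicode string.  The labels and the
helper name are hypothetical and chosen only for illustration; the sketch
assumes nothing beyond the standard unicodedata module.  It shows that the
traditional and simplified forms of one character are distinct code points
and that standard normalization does not fold them together.

```python
import unicodedata

# Traditional and simplified forms of "country" are separate code points.
TRADITIONAL_GUO = "\u570b"  # traditional form
SIMPLIFIED_GUO = "\u56fd"   # simplified form
ZHONG = "\u4e2d"            # "middle", same in both orthographies


def same_after_normalization(a: str, b: str, form: str = "NFKC") -> bool:
    """Return True if two strings are equal after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Hypothetical labels: nothing stops a user from typing either spelling,
# or from placing both orthographies side by side in one string.
traditional_label = ZHONG + TRADITIONAL_GUO
simplified_label = ZHONG + SIMPLIFIED_GUO
mixed_label = SIMPLIFIED_GUO + TRADITIONAL_GUO

# Normalization handles compatibility forms such as fullwidth Latin, but it
# carries no traditional/simplified variant data.
print(same_after_normalization(traditional_label, simplified_label))
```

Any folding of such variants would need a separate mapping table, which is
exactly the data the rest of this thread is about.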

>
> My personal opinion after consulting several experts in Han characters
> is to find an international organization, e.g., Unicode Consortium, to
> host the standardization of variants. They are also willing to
> collaborate with CJK experts from other countries and regions. Some of
> their suggestions are described in the "phased implementation" draft.

Again, this is really the IRG's job.  The main problem with handing it off
to the IRG is that they are not the fastest-moving standards body in
history.

Another part of the problem is that the data *is* out there.  Ken mentions
the Sanseido dictionary.  I've got a CD of data from Taiwan on Chinese
variants, and the HKSAR government has a Web page.  I know that MS has
data they're sitting on which is used in Office.  Unicode's data has been
derived in the past by character set analysis, but we are currently
working on incorporating data from a commercial product, Wenlin, which has
been donated to us.

Unfortunately, most of the people who have spent the time and energy to
develop this data don't want to contribute it to the public gratis, which
is what would happen if it went into Unicode.  Even if we simply said that
we wanted to take over the data from an authoritative source (Sanseido,
the Hanyu Da Zidian), we'd probably have to get legal permission to do so.

And there's the additional problem that nobody has a clear model, at least
that I've seen, that captures the full plethora of variant "types" with a
reasonable taxonomy and sufficient clarity that it can be used in simple,
algorithm-generated, lexical-analysis-free situations.

> Without a internationalized dictionary for Han variants, the current
> IDN proposals are bringing side-effects to users and holders of
> domain names of Han characters.
>

I agree with Ken that it's not clear how *anything* could both include
hanzi in IDN *and* be free from unpleasant side effects.

>
> I appreciate your loving for Japanese. But, if you do care about
> Japanese, why did you ignore Chinese and Taiwanese, and perhaps
> even more silent Japanese and Korean out there?
>

I'm sure it wasn't intentional.  I believe that Ken's point is that you
can't do a Chinese-only solution here.  (And I'll naturally bring up the
Cantonese speaking population of the world, as just adding to the mess.)

==========
John H. Jenkins
jenkins@apple.com
jenkins@mac.com
http://homepage.mac.com/jenkins/
