
Re: [idn] Re: Chinese Domain Name Consortium (CDNC) Declaration



Soobok Lee stated:

> Unicode is not designed for identifiers use but rather for
> display or printing devices,  from the beginning. 

This is manifestly untrue. Unicode 1.0, published in 1991,
specifically talks about "comparing text in operations such
as determining sort order of two strings, or filtering or
matching strings." And the earliest implementations of
Unicode, which were in development as early as 1990, were
already making use of Unicode strings as identifiers and
object labels. I know, because I was directly involved in
one such implementation.

It is true, however, that the Unicode Standard itself didn't
get around to making recommendations about how to deal
with Unicode identifiers until Unicode 2.0 in 1996. So
I can see how people might be confused about the design
intent.

> But,
> Unicode is ever evolving to expand its application areas.

The way I would put that is that more and more application
areas are coming to grips with the implications of working
with the Universal Character Set.

> It's astonishing Unicode has not yet any concrete lists of 
> TC/SC 1:1 and 1:n equivalences.

Partial lists have been available since Unicode 2.0, with the
first publication of Unihan.txt. And with each major or minor
version of Unicode, tremendous effort has been expended for
further refining and adding to the immense amount of information
provided in Unihan.txt about all the Han characters, their
sources, and variants. Unicode 3.2 will see another significant
step forward in the refinement of that information.
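
To make the 1:n point concrete, here is a minimal sketch of pulling
variant mappings out of Unihan-style data. It assumes the tab-separated
layout of the Unihan files (code point, field name, space-separated
values) and the kTraditionalVariant and kSimplifiedVariant fields. The
three sample lines are illustrative and should be checked against the
actual data file; the names SAMPLE, parse_variants, and one_to_many are
made up for this example.

```python
from collections import defaultdict

# Illustrative sample in the assumed Unihan layout:
# code point <TAB> field name <TAB> space-separated values.
SAMPLE = """\
# comment lines start with '#'
U+53D1\tkTraditionalVariant\tU+767C U+9AEE
U+767C\tkSimplifiedVariant\tU+53D1
U+9AEE\tkSimplifiedVariant\tU+53D1
"""


def parse_variants(text, field):
    """Map each character to the list of variants given for `field`."""
    table = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, name, values = line.split("\t", 2)
        if name != field:
            continue
        source = chr(int(code[2:], 16))  # "U+53D1" -> the character itself
        for value in values.split():
            # Drop any "<source" annotation that may follow a code point.
            target = value.split("<")[0]
            table[source].append(chr(int(target[2:], 16)))
    return dict(table)


def one_to_many(table):
    """Keep only entries that map to more than one variant."""
    return {char: variants for char, variants in table.items() if len(variants) > 1}


traditional = parse_variants(SAMPLE, "kTraditionalVariant")
simplified = parse_variants(SAMPLE, "kSimplifiedVariant")
print(one_to_many(traditional))
```

Even in this tiny sample, one simplified form points at two traditional
forms, so folding SC to TC needs context that a flat list cannot carry.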

But the only thing "astonishing" here is that you find it
astonishing that no simple, complete TC/SC listing has yet been
compiled. The Unicoders in this discussion have been asserting
that the issue of "simplified" and "traditional" variants in
Han is enormously complex, and is not amenable to simple,
uncontextualized lists. Why then are you astonished that we
have not produced a simple, uncontextualized list?

You want a published list? Go buy Sanseido's Unicode Kanji
Information Dictionary. (ISBN 4-385-13690-4) Then look at
all the cross-references (of distinct types) and the annotations
of simplified forms (kantaiji) and variant forms (itaiji).
Then tell me how long you think it would take to digest that
down to a consensus list that would be acceptable for IDN
use and which would provide user-tested "acceptability" for
both Chinese and Japanese users.

> I admit Unicode is the best solution as for now, but not the sufficiently
> mature solution enough to serve the global language communities,
> especially chinese.

This is not an opinion apparently shared by the Chinese
government, which has recently mandated the use and implementation
of their latest national standard, GB 18030-2000, which contains
the exact same repertoire of Han characters as in Unicode 3.0
(and ISO/IEC 10646-1:2000), with all the same warts and variant
idiosyncrasies that you claim render Unicode an insufficiently
mature solution to serve the Chinese language community.
Who's right here?

> IDN deployment is not a reversible process.

That much I agree with. ;-)

--Ken