[idn] Re: Unicode is not usable in international context.

To: idn@ops.ietf.org
Subject: [idn] Re: Unicode is not usable in international context.
From: Alan Barrett <apb@cequrux.com>
Date: Thu, 21 Mar 2002 14:33:37 +0200
In-reply-to: <200203201557.AAA02490@necom830.hpcl.titech.ac.jp>
References: <2067.1016608872@brandenburg.cs.mu.OZ.AU> <200203201557.AAA02490@necom830.hpcl.titech.ac.jp>

On Thu, 21 Mar 2002, Masataka Ohta wrote:
> Unicode is not usable in international context. [...] Unicode is
> usable in some local context. [...] However, the context information
> must be supplied out of band.

Let me see if I can understand this argument about Unicode and local
context.  I am an English speaker who can't tell the difference between
the Chinese character that appears as the second character of the
Chinese word for the city that I call "Beijing", and the Japanese
character that appears as the second character of the Japanese word
for the city that I call "Tokyo".  I believe that (as used in the city
names) both characters mean something like the English word "capital".

Say there's a Chinese character that looks (to uneducated western eyes)
like a box with three legs and a hat, and a Japanese character that
looks (to uneducated western eyes) like a box with three legs and a hat.
Say the Chinese character looks slightly different from the Japanese
character, but a Chinese person can easily recognise the Japanese
character and understand its meaning in context, and a Japanese person
can easily recognise the Chinese character and understand its meaning in
context.

As far as I understand, Unicode would say that these are not two
different characters, but just different display forms of the same
unified character (or whatever the correct technical terms are).
Display software would need out-of-band knowledge to help it
choose between the Chinese and Japanese display forms.
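
As a rough illustration of the unification described above, the sketch
below assumes the two city names are written as U+5317 U+4EAC (Beijing)
and U+6771 U+4EAC (Tokyo), and checks that the second character of each
is the same Unicode code point. The helper name second_char_info is
made up for this example.

```python
import unicodedata

# Assumed spellings of the two city names (not taken from the message itself).
beijing = "\u5317\u4eac"  # Beijing as written in Chinese
tokyo = "\u6771\u4eac"    # Tokyo as written in Japanese


def second_char_info(word):
    """Return (code point label, Unicode character name) for the second character."""
    ch = word[1]
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


if __name__ == "__main__":
    print(second_char_info(beijing))
    print(second_char_info(tokyo))
    # Both words share one unified code point for their second character.
    print(beijing[1] == tokyo[1])
```

Nothing in the code point itself says whether a Chinese or a Japanese
glyph is wanted; that choice is left to fonts and whatever language
information the display software has.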

As far as I understand, the absence of out-of-band knowledge could lead to
the hypothetical Unicode character <CJK character that looks a bit like
a box with three legs and a hat> being displayed as if it were <Chinese
character that looks a bit like a box with three legs and a hat>, even
if the author's intent was to display <Japanese character that looks a
bit like a box with three legs and a hat>.
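
One way to picture "out-of-band knowledge" is a language tag carried
next to the text rather than inside it. The sketch below is only an
illustration: the TaggedText class and its fields are hypothetical, and
it assumes the text ends up in HTML, where the lang attribute is a
common place for such a tag. Whether a renderer actually picks a
different glyph depends on the fonts it has available.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class TaggedText:
    """Text plus a language tag that travels alongside it (hypothetical type)."""
    text: str
    lang: str  # language tag such as "ja" or "zh-Hans"

    def to_html(self):
        # The tag lives in markup, outside the character data itself.
        return f'<span lang="{escape(self.lang, quote=True)}">{escape(self.text)}</span>'


if __name__ == "__main__":
    japanese = TaggedText("\u4eac", "ja")
    chinese = TaggedText("\u4eac", "zh-Hans")
    print(japanese.text == chinese.text)  # identical character data
    print(japanese.to_html())
    print(chinese.to_html())
```

If the tag is dropped, the two values carry identical character data,
which is the situation described above: the display software has
nothing left to tell the author's intended form from the other one.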

As far as I understand, Masataka Ohta considers this to be a fatal flaw
in Unicode.  I hope he will correct me if I have misunderstood his
objection.

I don't know enough to tell whether the difference between corresponding
Chinese and Japanese characters is analogous to a font difference or
a spelling difference, but the "ignorant westerner can't tell the
difference" test biases me towards the "font difference" side.  If they
are analogous to spelling differences, then I would say that unifying
the different characters was probably an error in Unicode, but that IDN
should not try to undo that unification.  Either way, I think that IDN
should document the potential problem but not try to fix it.

(In contrast, I think I have learned enough by following the past <n>
months of discussion to tell that the differences between Traditional
and Simplified Chinese are analogous to spelling differences, and so IDN
should not try to unify them.)
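
For contrast, here is a small sketch of the Traditional/Simplified
case. It assumes the pair U+6771 (Traditional) and U+4E1C (Simplified)
as an example of corresponding characters; the helper name
normalized_forms is made up. Unicode encodes the two as separate code
points, and the standard normalization forms leave each one unchanged.

```python
import unicodedata

# Assumed example pair: the Traditional and Simplified forms of "east".
traditional = "\u6771"
simplified = "\u4e1c"


def normalized_forms(ch):
    """Return the character under each of the four standard normalization forms."""
    return {
        form: unicodedata.normalize(form, ch)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


if __name__ == "__main__":
    print(traditional == simplified)
    print(normalized_forms(traditional))
    print(normalized_forms(simplified))
```

Since normalization does not map one form to the other, treating them
as equivalent would need a separate mapping layered on top of Unicode.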

--apb (Alan Barrett)
