
Re: [idn] Comments on protocol drafts



> RJ Atkinson wrote:
> > All dialects of Chinese use the same written form,
> > mentioned primarily for those without familiarity
> > with Chinese.

> Almost the same.

> Hong Kong Cantonese has some special ideograms which exist in BIG5-HK but not
> in Unicode. Similarly, Taiwanese has some phonetic glyphs which exist in
> BIG5-TW but, again, not in Unicode. All of them are localised but very
> important, at least to those who use them in Taiwan and Hong Kong.

> Standard Mandarin has a few hundred thousand ideograms which do not
> exist in any commonly used character set, including Unicode. Hopefully these
> can be taken care of in UCS-4.

Um, let's be careful here. The presence or absence of a given group of
characters is a coded character set (CCS) issue. My understanding is that work
is underway to define codepoints for these and many other missing characters in
the Unicode CCS, and hopefully ISO 10646 will adopt what the UTC does in this
regard.

Once the codepoints exist they are applicable to any character encoding scheme
(CES) that applies to Unicode/10646 and isn't limited to plane 0. This includes
UTF-8, UTF-16, UCS-4, UTF-7, CIDNUC, and probably UTF-5. The only CES I know of
that is limited to plane 0 is UCS-2, and nobody is talking about using UCS-2. 
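
To make the CCS/CES distinction concrete, here is a minimal sketch in Python.
It takes one codepoint outside plane 0 and runs it through three encoding
schemes. U+20000 is used purely as an illustration of a plane 2 codepoint, and
the helper names (plane, encode_all, fits_in_ucs2) are made up for this
example. Python's "utf-32-be" codec stands in for UCS-4 here.

```python
def plane(code_point: int) -> int:
    """Return the plane number (0 is the BMP) that a codepoint belongs to."""
    return code_point >> 16


def encode_all(code_point: int) -> dict:
    """Encode one codepoint with several encoding schemes (big-endian, no BOM)."""
    ch = chr(code_point)
    return {
        "utf-8": ch.encode("utf-8"),
        "utf-16": ch.encode("utf-16-be"),   # uses a surrogate pair outside plane 0
        "utf-32": ch.encode("utf-32-be"),   # same 4-byte form as UCS-4 for this value
    }


def fits_in_ucs2(code_point: int) -> bool:
    """UCS-2 is a single 16-bit unit per character, so it stops at plane 0."""
    return code_point <= 0xFFFF


if __name__ == "__main__":
    cp = 0x20000  # illustrative codepoint in plane 2
    print("plane:", plane(cp))
    for name, data in encode_all(cp).items():
        print(name, data.hex())
    print("representable in UCS-2:", fits_in_ucs2(cp))
```

The codepoint is the same in every case; only the byte sequence differs, which
is the sense in which the missing-character problem belongs to the CCS and not
to any particular CES.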

In particular, you don't have to use UCS-4 to get the benefit of these upcoming
definitions. And while the number of planes you can get at does vary somewhat
between UTF-8, UTF-16, and UCS-4, this is an entirely academic point -- they
are all roomy enough to handle all future definitions.
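
For anyone curious how much the plane counts actually differ, the arithmetic
can be sketched as follows. The upper limits are the ones I'm assuming for
each scheme: U+10FFFF for UTF-16 (the surrogate mechanism), and 31 bits for
both UCS-4 and UTF-8 in its up-to-six-byte form as defined in RFC 2279. The
names in the code are made up for this example.

```python
PLANE_SIZE = 0x10000  # 65,536 codepoints per plane

# Highest codepoint each scheme can express (assumptions stated above).
MAX_CODE_POINT = {
    "UTF-16": 0x10FFFF,
    "UTF-8 (RFC 2279)": 0x7FFFFFFF,
    "UCS-4": 0x7FFFFFFF,
}


def planes_reachable(max_code_point: int) -> int:
    """Number of whole 64K planes covered by the range 0..max_code_point."""
    return (max_code_point + 1) // PLANE_SIZE


if __name__ == "__main__":
    for scheme, limit in MAX_CODE_POINT.items():
        print(f"{scheme}: {planes_reachable(limit)} planes")
```

Even the smallest of these, 17 planes, is over a million codepoints.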

> > I'll also note that ISO-8859 and UTF-8 do not support all European languages
> > equally well, nor does either support other Romanised non-European languages
> > (e.g. Vietnamese) equally well.

> Similarly, there are other localized encodings used for European languages other
> than the standard locale ISO8859 and CP125X. It appears I18N remains an
> unfulfillable ideal whereby nothing right now can satisfy everyone. :-)

If you're talking about what's already standardized, then yes, I agree
with you. However, one of the main attractions of the Unicode/10646
CCS is that work is underway to make it a true I18N solution.

				Ned