
[idn] utf8/legacy versioning




Thanks.
IANA maintains this charset list: http://www.iana.org/assignments/character-sets

But it does not have any entry for a revised legacy charset
such as "ks_c_5601-1992" or "ksx1001:1992".
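
As a rough illustration of the same gap on the application side, the sketch below asks Python's codec alias table how it resolves these labels. This is only an assumption-laden stand-in: Python's alias table is not the IANA registry, and resolve_charset is a hypothetical helper name.

```python
import codecs

def resolve_charset(label):
    """Return the codec name Python maps the label to, or None if unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        # No codec or alias is registered under this label.
        return None

# The unversioned 1987 label is compared with the revised labels
# mentioned in this message.
for label in ("ks_c_5601-1987", "ks_c_5601-1992", "ksx1001:1992"):
    print(label, "->", resolve_charset(label))
```

A label that resolves to None here has no agreed mapping table behind it, so two applications have no shared tag for naming the revised charset.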

Moreover, it does not have a "utf8" charset entry, because "utf8" is just one of the encodings of the Universal
Character Set, not an independent charset plus encoding like "ks_c_5601-1987".
Everyone knows that UCS (ISO 10646) and Unicode (UTC) change and expand over time.
Does UCS (ISO 10646) versioning strictly follow Unicode versioning (UTC)?
If so, why can't we see "utf8-3.1" or "utf8-3.2" for Unicode 3.1 and 3.2 respectively?

Many applications perform CaseFold3.x(IDN), NFKC3.x(CaseFold3.x(IDN)) or legacy2Unicode(IDN)
on input text or parameters and tag their output with "encoding='utf8'". But this loose tagging,
without the precise version of the Unicode standard that was applied, will cause interoperability
problems between sending and receiving applications that use different versions of the
Unicode standard. They will have different criteria and assumptions about what counts as normalized or casefolded.
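
A minimal sketch of what version-tagged output could look like, using Python's unicodedata module. The "utf8-<version>" tag format is the hypothetical one discussed in this message, not a registered charset name, and normalize_tagged and same_rules are illustrative names. Only the NFKC step is versioned here; case folding is left out because Python's str.casefold always uses the runtime's own Unicode data.

```python
import unicodedata

def normalize_tagged(text, db=unicodedata):
    """NFKC-normalize text with the given Unicode database.

    Returns (normalized_text, tag), where tag is the hypothetical
    "utf8-<version>" label naming the Unicode version that was applied.
    """
    return db.normalize("NFKC", text), f"utf8-{db.unidata_version}"

def same_rules(tag_a, tag_b):
    """Two outputs are only comparable if they were produced under the same tag."""
    return tag_a == tag_b

# Sender uses the runtime's current Unicode data; receiver uses the
# Unicode 3.2.0 database that Python ships as unicodedata.ucd_3_2_0.
sample = "\uff25xample"  # begins with FULLWIDTH LATIN CAPITAL LETTER E
sent_text, sent_tag = normalize_tagged(sample)
recv_text, recv_tag = normalize_tagged(sample, unicodedata.ucd_3_2_0)

print(sent_tag, recv_tag, same_rules(sent_tag, recv_tag))
```

With a bare "utf8" tag, the receiver cannot tell which of the two databases produced the text it was handed.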

The tradition of loose versioning for encodings of both Unicode and local character sets is so deep-rooted
and prevalent that we can't cure this situation in the foreseeable future.
I can't imagine all XML applications switching from "utf8" to "utf8-3.2".
Unicode and legacy charsets were not designed for use in rigorous identifier contexts, but
primarily for textual applications and the printer/display industries. That explains the origin of the loose
versioning practice in UCS and local character sets.

Some applications may adhere to this proposed precise versioning convention and reflect changes to the
legacy and UCS mapping tables as frequently as possible. But a significant majority of
other applications would be unwilling or unable to do that, or would do it too late. This situation causes
further interoperability problems among applications. The currently proposed IDN standard is
at best an experimental one and not adequate for any mission-critical use.

Approximation and exception handling are inevitable in UCS/legacy handling, but they are not
allowed in a universal identifier system like DNS. Rather, a directory/search approach
would do a better job of providing internationalized *access* to domain names.

Soobok Lee


----- Original Message -----
From: "Keld Jørn Simonsen" <keld@dkuug.dk>
To: "Soobok Lee" <lsb@postel.co.kr>
Cc: <idn@ops.ietf.org>
Sent: Friday, May 31, 2002 2:29 AM
Subject: Re: [idn] Re: Legacy charset conversion in draft-ietf-idn-idna-08.txt (in ksc5601-1987)


> On Fri, May 31, 2002 at 12:56:05AM +0900, Soobok Lee wrote:
> > By "additions", I mean the required new tag for a new version of a legacy encoding, like "ks_c_5601-1992",
> > which should have been used, but never has been, as far as I know. Is there any central
> > registry that maintains the correct tag values for the various versions of numerous legacy encodings?
> > If not, how do we ensure stable and interoperable legacy-to-Unicode conversion among myriads of applications?
>
> IANA has a registry of charsets, and many of them have mappings defined
> for UCS. There is also an ISO register that has mappings between
> legacy charsets and UCS, available at
> http://www.dkuug.dk/cultreg/registrations/charmap
>
> Kind regards
> Keld