RE: [idn] Canonicalization: [28] through [31]



> At 17.27 +0200 00-06-27, Karlsson Kent - keka wrote:
> >For character strings
> >where all the characters are in a 3.0 or post-3.0 version
> >of Unicode, the normal forms D, C, KD, and KC of the string
> >will not change.
> 
> That is a VERY strong statement, and I hope you are true. I really do...
> 
> For the very reasons on comparisons I listed in previous email.
> 
>    paf
> 

The gatekeepers are well aware of the issue and are committed
to preserving the stability of data that has been normalized
according to the specification of UAX #15 Unicode Normalization Forms.

The Unicode Technical Committee has formally made UAX #15 a part
of the Unicode Standard, which means, among other things, that it
must be taken into account when considering all future extensions
of the standard.

Granted, there is a delicate balancing act going on here for
encoding new characters in the standard, with JTC1/SC2/WG2 responsible
for ISO/IEC 10646 and with the UTC responsible for the Unicode
Standard. But neither committee now takes any step without careful
heed of what the other committee is doing -- in great detail. This
is one of the best examples I know of in all of IT standardization work
of two independent standardization committees cooperating and
coordinating their work. And the recent, near-simultaneous publication
of ISO/IEC 10646-1:2000 and of the Unicode Standard, Version 3.0
with identical repertoires, encodings, and glyph representations
should give everyone else in the industry some significant "warm fuzzies"
about the seriousness with which these committees consider their
synchronization.

WG2 is on notice that normalization considerations must now be
taken into account when evaluating new character encoding proposals.
Members of the UTC have an action item to write up a formal statement
of the impact of normalization on character encoding to be
incorporated into the Principles and Procedures document which guides
WG2's encoding decisions.

Furthermore, while encoding decisions can impact normalization, the
most significant impacting factor is the list of canonical and
compatibility mappings contained in the Unicode Character Database,
and used by the Normalization algorithm specified in UAX #15.
Those mappings are under the sole control of the Unicode Technical
Committee, and everyone on that committee understands that willy-nilly
changes in the canonical and/or compatibility mappings that would
retroactively invalidate existing normalized Unicode data are
henceforth strictly prohibited.
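
As a rough illustration of how the canonical and compatibility
mappings feed the four normal forms, here is a minimal sketch. It
assumes Python and its standard unicodedata module, neither of which
is part of this thread; the helper names normal_forms and is_stable
are hypothetical and exist only for this example.

```python
import unicodedata

# The four normalization forms defined by UAX #15.
FORMS = ("NFD", "NFC", "NFKD", "NFKC")


def normal_forms(text):
    """Return a dict mapping each form name to the normalized string."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


def is_stable(text, form):
    """Check that normalizing an already-normalized string changes nothing.

    This only tests idempotence within one version of the character
    database; it is not a test of stability across Unicode versions.
    """
    once = unicodedata.normalize(form, text)
    return unicodedata.normalize(form, once) == once


# U+00C5 has a canonical mapping (A + combining ring above);
# U+FB01 (the "fi" ligature) has only a compatibility mapping.
sample = "\u00C5\uFB01"
forms = normal_forms(sample)

for name in FORMS:
    # ascii() shows the code points unambiguously on any terminal.
    print(name, ascii(forms[name]), is_stable(sample, name))
```

The canonical mapping affects all four forms, while the compatibility
mapping shows up only in KD and KC. The results depend on the version
of the Unicode Character Database bundled with the interpreter
(reported by unicodedata.unidata_version), so this shows what the
mappings do, not that they will never change.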

Of course nothing in life or technology is certain -- but at the
moment this issue is about as strongly wired and under control as
we can conceivably make it.

--Ken Whistler, Technical Director, Unicode, Inc.