
Re: [idn] Comments on IDNA/stringprep/nameprep



The Unicode Consortium debated making the canonical decomposition
from <gg> to <g><g> for a long time. The deciding feedback came from
the Korean national body at the Seoul SC2/WG2 meeting, where they said
it should not be done; that it was akin to canonically decomposing "w"
to "vv". They also objected to combinations like <gs> being
canonically decomposed, principally so that modern syllables could
always be decomposed into 3 pieces. The (weaker) compatibility
decompositions remained in Unicode until the time that NFC was formed;
those were removed because they would have prevented the formation of
Hangul Syllables in NFKC.
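The decomposition behaviour described above can be observed with Python's standard `unicodedata` module. This is a minimal sketch: the helper `jamo` is purely illustrative, and Python ships a recent Unicode version rather than 3.0, although the results for these particular code points should be the same. A modern syllable with the <gs> final decomposes into three jamo with <gs> kept as one unit, and <gg> (U+1101) has no decomposition mapping at all.

```python
import unicodedata

def jamo(s: str, form: str = "NFD") -> list[str]:
    """Normalize s and list its code points as 'U+XXXX' strings (illustrative helper)."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize(form, s)]

# GA + <gs> final (U+AC03): three jamo, with <gs> (U+11AA) kept as a single unit.
print(jamo("\uAC03"))  # ['U+1100', 'U+1161', 'U+11AA']

# KKA (U+AE4C): the double initial <gg> (U+1101) survives decomposition intact.
print(jamo("\uAE4C"))  # ['U+1101', 'U+1161']

# U+1101 carries no decomposition mapping of any kind (empty string).
print(repr(unicodedata.decomposition("\u1101")))  # ''
```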

Mark
—————

Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Soobok Lee" <lsb@postel.co.kr>
To: "Kent Karlsson" <kentk@md.chalmers.se>; "'Erik Nordmark'" <Erik.Nordmark@eng.sun.com>
Cc: <idn@ops.ietf.org>
Sent: Tuesday, February 12, 2002 18:06
Subject: Re: [idn] Comments on IDNA/stringprep/nameprep


> Thanks, Kent.
>
> ----- Original Message -----
> From: "Kent Karlsson" <kentk@md.chalmers.se>
>  >
> > > > Even though e.g. [gg] and [g][g] (there are a few hundred other examples)
> > > > are not canonically or compatibility equivalent, they still represent
> > > > the same sequence of Hangul letters, and thus "mean" the same.
> > >
> > > Yes, the same argument is used for SC/TC (Simplified/Traditional Chinese)
> > > needing to be addressed in IDN.
> >
> > No, no, no!! This issue is comparable to the *canonical* equivalences
> > that already exist for Hangul syllable characters, and for other
> > characters that have a canonical decomposition (some "double Latin
> > letters" have compatibility decompositions, but the relationship here
> > is much stronger; and it is much, much stronger than case insensitivity).
> > Unfortunately, due to historic events, that equivalence is no longer
> > recorded in Unicode 3.0 and later property data.
> >
> > This is in no way comparable to the SC/TC issue, which is a spelling
> > preference issue, where the "spellings" are actually different.
> > Here it is just about the underlying representation for the **same**
> > spelling (in terms of sequence of letters; there is not even any
> > case difference or font variant difference [for correctly constructed
> > fonts that cover Hangul]).
> >
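The point that [gg] and [g][g] spell the same letters yet are not equivalent can be checked with a short sketch using Python's standard `unicodedata` module (used only as an illustration; any conforming normalizer should agree). The precomposed syllable KKA and the sequence <g><g><a> stay distinct under all four normalization forms.

```python
import unicodedata

precomposed = "\uAE4C"              # KKA: double initial <gg> (U+1101) + A
spelled_out = "\u1100\u1100\u1161"  # <g><g> + A: the same letters, spelled separately

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    a = unicodedata.normalize(form, precomposed)
    b = unicodedata.normalize(form, spelled_out)
    print(form, a == b)  # False for every form
```

Under NFC the second sequence becomes U+1100 followed by the syllable GA (U+AC00): only the last <g> combines with the vowel, so the two strings never meet.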
>
>
> True. The canonical equivalence between [gg] and [g][g] is defined in
> Unicode 3.0. They should have been unified by NFC, but are not.
>
> It is too late to change that, and it should be solved in new normalization forms.
> But if applications use the new normalization before nameprep,
> as I warned in the last call comments, the following condition will be
> triggered silently:
>
>   stringprep(newnormalization(Hangul)) != stringprep(Hangul)
>
> For stringprep to be neutral to any new normalization adopted by
> applications, stringprep would have to be perfect and inclusive of all
> kinds of mature normalizations, that is, the universal set of all kinds
> of normalizations built upon Unicode.
> Impossible?
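The inequality stringprep(newnormalization(Hangul)) != stringprep(Hangul) can be reproduced with a small sketch. It assumes CPython's `encodings.idna.nameprep`, an undocumented standard-library helper implementing the nameprep profile (RFC 3491), as the stringprep step. The function `new_normalization` and the table `DOUBLE_INITIALS` are hypothetical: they stand in for the kind of new normalization form discussed here, not for any real one.

```python
import unicodedata
from encodings.idna import nameprep  # CPython's nameprep helper (undocumented API)

# Hypothetical "new normalization": split double initial consonants into two
# single ones after canonical decomposition. Only two pairs are listed here.
DOUBLE_INITIALS = {
    "\u1101": "\u1100\u1100",  # <gg> -> <g><g>
    "\u1104": "\u1103\u1103",  # <dd> -> <d><d>
}

def new_normalization(s: str) -> str:
    """Hypothetical normalization: NFD, split double initials, then recompose."""
    decomposed = unicodedata.normalize("NFD", s)
    split = "".join(DOUBLE_INITIALS.get(c, c) for c in decomposed)
    return unicodedata.normalize("NFC", split)

label = "\uAE4C"  # KKA
print(nameprep(new_normalization(label)) == nameprep(label))  # False
```

Because nameprep's NFKC step has no mapping between U+1101 and U+1100 U+1100, it cannot undo the split, so an application that normalizes first silently produces a different nameprep result.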
>
> Application implementors should be cautious when applying normalization forms
> to data/text portions that contain IDNs. If some applications have already adopted
> normalization forms that are not compatible with stringprep as above,
> the backward compatibility requirements are not met in that case.
> IDNA's backward compatibility claim doesn't come without costs.
>
> Don't build our grand castle on a moving sand dune, where a tiny tent is the more
> adequate and wise choice. :-)
>
> Soobok Lee