
Re: [idn] Comments on IDNA/stringprep/nameprep

To: "Kent Karlsson" <kentk@md.chalmers.se>, "'Erik Nordmark'" <Erik.Nordmark@eng.sun.com>
Subject: Re: [idn] Comments on IDNA/stringprep/nameprep
From: "Soobok Lee" <lsb@postel.co.kr>
Date: Wed, 13 Feb 2002 11:06:32 +0900
Cc: <idn@ops.ietf.org>
References: <000301c1b40d$b8bfda00$0100007f@chalmers95a69n>

Thanks, Kent.

----- Original Message ----- 
From: "Kent Karlsson" <kentk@md.chalmers.se>
 > 
> > > Even though e.g. [gg] and [g][g] (there are a few hundred other examples)
> > > are not canonically or compatibility equivalent, they still represent
> > > the same sequence of Hangul letters, and thus "mean" the same.
> > 
> > Yes, same argument is used for SC/TC needing to be addressed in IDN.
> 
> No, no, no!!  This issue is comparable to the *canonical* equivalences
> that already exist for Hangul syllable characters, and for other 
> characters that have a canonical decomposition (some "double latin
> letters" have compatibility decompositions, but the relationship here
> is much stronger; and it is much much stronger than case insensitivity).
> Unfortunately, due to historic events, that equivalence is no longer
> recorded in Unicode 3.0 and later property data.
> 
> This is in no way comparable to the SC/TC issue which is a spelling
> preference issue, where the "spellings" are actually different.
> Here it is just about the underlying representation for the **same**
> spelling (in terms of sequence of letters; there is not even any
> case difference or font variant difference [for correctly constructed
> fonts that cover Hangul]).
> 


True. The canonical equivalence between [gg] and [g][g] is defined in
Unicode 3.0. The two should have been unified by NFC, but they are not.
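A quick way to see that NFC (or any of the standard forms) leaves [gg] and [g][g] distinct is a minimal sketch using Python's standard `unicodedata` module. The choice of U+1101 versus U+1100 U+1100 as the concrete [gg]/[g][g] pair, and U+1161 as the following vowel, is my illustration, not something stated in the thread:

```python
import unicodedata

# Two spellings of the same Hangul letter sequence (conjoining jamo):
GG = "\u1101"         # U+1101 HANGUL CHOSEONG SSANGKIYEOK, i.e. [gg]
G_G = "\u1100\u1100"  # U+1100 HANGUL CHOSEONG KIYEOK twice, i.e. [g][g]
A = "\u1161"          # U+1161 HANGUL JUNGSEONG A, i.e. [a]

# None of the four standard forms maps one spelling onto the other.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, unicodedata.normalize(form, GG) == unicodedata.normalize(form, G_G))

# Followed by a vowel, NFC composes only the final L+V pair of [g][g][a].
print([hex(ord(c)) for c in unicodedata.normalize("NFC", GG + A)])   # ['0xae4c']
print([hex(ord(c)) for c in unicodedata.normalize("NFC", G_G + A)])  # ['0x1100', '0xac00']
```

Each loop line prints `False`. U+AE4C is the precomposed syllable [gga], while the second result is U+1100 followed by the syllable [ga] (U+AC00): the same letters, but a different and unequal code point sequence.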

It is too late for that to be changed, so it would have to be solved with new
normalization forms. But if applications apply such a new normalization before
nameprep, then, as I warned in my last-call comments, the following condition
will silently hold:

  stringprep(newnormalization(Hangul)) != stringprep(Hangul) 
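The condition above can be reproduced with a small sketch. It uses CPython's `encodings.idna.nameprep` (an RFC 3491 nameprep implementation) as a stand-in for stringprep/nameprep, and a hypothetical `new_normalization` that folds [g][g] into [gg]; that function is an assumption for illustration only, not any published Unicode normalization form:

```python
from encodings.idna import nameprep  # CPython's RFC 3491 nameprep


def new_normalization(s: str) -> str:
    """Hypothetical 'new normalization form' (an assumption, not a
    published Unicode form): fold the conjoining-jamo sequence
    [g][g] (U+1100 U+1100) into the single jamo [gg] (U+1101)."""
    return s.replace("\u1100\u1100", "\u1101")


label = "\u1100\u1100\u1161"  # [g][g][a], the same letters as syllable U+AE4C

before = nameprep(label)                     # nameprep(Hangul)
after = nameprep(new_normalization(label))   # nameprep(newnormalization(Hangul))

print(before == after)                # False
print([hex(ord(c)) for c in before])  # ['0x1100', '0xac00']
print([hex(ord(c)) for c in after])   # ['0xae4c']
```

Because nameprep's internal NFKC step does not unify the two spellings, an earlier application-level fold changes the nameprep output, and with it the resulting ACE label, without any error being raised.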

For stringprep to be neutral to any new normalization that applications adopt,
it would have to be perfect and subsume every mature normalization, that is,
the universal set of all normalizations built on Unicode.
Is that even possible?

Application implementors should be cautious when applying normalization forms
to data or text that contains IDNs. If some applications have already adopted
normalization forms that are incompatible with stringprep, as shown above, the
backward-compatibility requirements are not met in that case.
IDNA's backward-compatibility claim does not come without costs.
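One cautious check an implementor might run is sketched below: test, over a sample of labels, whether an application-level normalization is neutral with respect to nameprep. It again uses CPython's `encodings.idna.nameprep` as a stand-in for stringprep; `is_neutral`, `new_normalization`, and `SAMPLES` are hypothetical names of my own, and passing on a finite sample is evidence, not proof:

```python
import unicodedata
from encodings.idna import nameprep  # CPython's RFC 3491 nameprep


def new_normalization(s: str) -> str:
    # Hypothetical application-level normalization (an assumption):
    # fold [g][g] (U+1100 U+1100) into [gg] (U+1101).
    return s.replace("\u1100\u1100", "\u1101")


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def is_neutral(normalize, samples) -> bool:
    """True if nameprep(normalize(s)) == nameprep(s) for every sample."""
    return all(nameprep(normalize(s)) == nameprep(s) for s in samples)


SAMPLES = [
    "\u1100\u1100\u1161",  # [g][g][a] as conjoining jamo
    "\u1101\u1161",        # [gg][a] as conjoining jamo
    "\uae4c",              # precomposed syllable [gga]
    "\uac00",              # precomposed syllable [ga]
]

print(is_neutral(nfc, SAMPLES))                # True
print(is_neutral(new_normalization, SAMPLES))  # False
```

NFC passes because nameprep already applies NFKC, which subsumes it; the jamo fold fails because nameprep has no equivalent step.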

Let's not build our grand castle on a shifting sand dune, where a tiny tent is
the more adequate and wiser choice. :-)

Soobok Lee

Follow-Ups:
- Re: [idn] Comments on IDNA/stringprep/nameprep
  - From: "Mark Davis" <mark@macchiato.com>

References:
- RE: [idn] Comments on IDNA/stringprep/nameprep
  - From: "Kent Karlsson" <kentk@md.chalmers.se>
