
Re: [idn] Re: Fwd: Unicode letter ballot



On Wed, Nov 27, 2002 at 05:52:38PM -0800, Kenneth Whistler wrote:
> Soobok Lee said:
> 
> > If option A wins, I see a chaos:
> >  1) NFKC_3.2_nameprep(NFKC_4.0_calling_application(IDN))  
> >        !=  NFKC_3.2_nameprep(IDN)
> 
> It isn't as if you don't already have this problem in spades:
> 
>    NFKC_3.2_nameprep(AnyRandomNormalization_calling_application(IDN))
>          !=  NFKC_3.2_nameprep(IDN)
>          
> You can't force the entire rest of the world to normalize their
> data the same way you do for some set of protocols, simply because
> other applications have other requirements for normalization
> of Unicode data.

Yes. Those conflicts between different normalizations have already been
discussed here, and they will remain inherent and unsolvable problems in
i18n identifier work. So the above problem is nothing new in this WG.
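
A minimal sketch of the inequality quoted above, using Python's unicodedata
module. It models only the NFKC step of nameprep, not the full profile.
unicodedata.ucd_3_2_0 is the Unicode 3.2 database that Python ships; the
interpreter's current tables are used here as a stand-in for "NFKC_4.0", which
is an assumption, and the helper names are hypothetical.

```python
import unicodedata

# Unicode 3.2 tables, the version nameprep is pinned to.
nfkc_32 = unicodedata.ucd_3_2_0.normalize


def nfkc_app(s):
    # Stand-in for a calling application that normalizes with newer tables
    # (assumption: whatever Unicode version this interpreter ships).
    return unicodedata.normalize("NFKC", s)


def survives_prenormalization(s):
    # True when NFKC_3.2(NFKC_app(s)) == NFKC_3.2(s), i.e. the calling
    # application's normalization did not change the nameprep result.
    return nfkc_32("NFKC", nfkc_app(s)) == nfkc_32("NFKC", s)


if __name__ == "__main__":
    for label in ["example", "\uF900"]:
        print(ascii(label), survives_prenormalization(label))
```

A label for which survives_prenormalization() returns False is one where the
two normalization versions disagree.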

> 
> One way to control the *particular* problem of the CJK compatibility
> character mappings pose in normalization is for China's and
> Taiwan's domain registries to simply disallow use of these
> characters in domain names. (Either just for the problematical
> few where there are mapping errors, or for all of them -- which
> would be my preference.) After all, these compatibility characters --
> particularly the CNS, GB and KSX ones, are duplicates for round-trip
> mapping purposes, but aren't really required for the level
> of distinctions useful for defining domain names (or host names,
> or any similar named entities that IETF is concerned about).

Right.  That means any CJK IDN admin guideline efforts should deal with
un-nameprepped CJK inputs in addition to nameprepped ones, in order to
fine-tune their CJK equivalence tables based on regional languages.
That is, they should embrace those 5 wrong canonical CJK equivalences
and all their transitional ones as if they were legitimate.
But even in those cases, local CJK identifier comparisons made without DNS
queries will fail to match, regardless of whether those CJK domains are
owned by a single registrant. Of course, this issue is also nothing new in
this WG.

Also, Chinese and Korean IMEs (input method editors) inject those
compatibility characters into protocols, and end users often do not
know whether their input characters are compatibility CJK characters or not.
They look exactly identical to their canonical-equivalent CJK characters.
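
A small sketch of why this matters for local comparison. U+F900 is used only
as an illustrative compatibility ideograph (it is not claimed to be one of the
erroneous mappings); it maps canonically to U+8C48. The helper name is
hypothetical.

```python
import unicodedata

COMPAT = "\uF900"   # CJK COMPATIBILITY IDEOGRAPH-F900, as an IME might emit
UNIFIED = "\u8C48"  # the unified ideograph it canonically maps to


def same_after(form, a, b):
    # Compare two strings after applying the given normalization form.
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A raw code point comparison sees two different identifiers.
raw_equal = COMPAT == UNIFIED
# After NFC or NFKC the compatibility character is replaced by the
# unified one, so the two compare equal.
nfc_equal = same_after("NFC", COMPAT, UNIFIED)
nfkc_equal = same_after("NFKC", COMPAT, UNIFIED)

print(raw_equal, nfc_equal, nfkc_equal)  # False True True
```

A comparison that skips normalization therefore treats the two forms as
different identifiers even though they render identically.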

> 
> >     What if future HTML/canonical XML standards adopt NFKC_4.0 ?
> 
> It won't be NFKC in any case. It will be NFC, which is different,

Right. It is NFC that is adopted, not NFKC. Thanks.

> anyway. (But which still has the problem of the erroneous
> mappings for CJK compatibility characters.)
> 
> --Ken