
Re: [idn] Re: Fwd: Unicode letter ballot



Soobok Lee said:

> If option A wins, I see a chaos:
>  1) NFKC_3.2_nameprep(NFKC_4.0_calling_application(IDN))  
>        !=  NFKC_3.2_nameprep(IDN)

It isn't as if you don't already have this problem in spades:

   NFKC_3.2_nameprep(AnyRandomNormalization_calling_application(IDN))
         !=  NFKC_3.2_nameprep(IDN)
         
You can't force the entire rest of the world to normalize their
data the same way you do for some set of protocols, simply because
other applications have other requirements for normalization
of Unicode data.
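A small sketch of the inequality above, assuming a CPython build. `encodings.idna.nameprep` is the standard library's RFC 3491 nameprep, which runs on Unicode 3.2 data. `strip_marks` and `current_nfkc` are hypothetical stand-ins for "some calling application's normalization". U+2F868 is used because it is one of the CJK compatibility ideographs whose Unicode 3.2 canonical mapping was later corrected (Unicode Corrigendum #4).

```python
import unicodedata
from encodings.idna import nameprep  # stdlib nameprep, built on Unicode 3.2 tables


def strip_marks(s: str) -> str:
    """Hypothetical application normalization: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def current_nfkc(s: str) -> str:
    """NFKC using whatever (post-3.2) Unicode version this Python ships."""
    return unicodedata.normalize("NFKC", s)


samples = {
    "accented Latin": "caf\u00e9",
    "CJK compatibility ideograph": "\U0002F868",
}

for label, idn in samples.items():
    direct = nameprep(idn)
    for app_norm in (strip_marks, current_nfkc):
        # Does pre-normalizing in the application change what nameprep yields?
        via_app = nameprep(app_norm(idn))
        print(f"{label:28} {app_norm.__name__:12} same={direct == via_app}")
```

The accented label survives a newer NFKC unchanged but not the mark-stripping step; the ideograph comes out of nameprep differently under both.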

One way to control the *particular* problem that the CJK compatibility
character mappings pose in normalization is for China's and
Taiwan's domain registries simply to disallow these
characters in domain names, either just the problematical
few that have mapping errors or all of them (which
would be my preference). After all, these compatibility characters,
particularly the CNS, GB and KSX ones, are duplicates kept for round-trip
mapping purposes; they aren't needed for the level
of distinction useful in defining domain names (or host names,
or any similar named entities that the IETF is concerned about).

>     What if future HTML/canonical XML standards adopt NFKC_4.0 ?

It won't be NFKC in any case; it will be NFC, which is a different
normalization form. (NFC still has the problem of the erroneous
mappings for CJK compatibility characters, though.)
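A quick way to see why NFC does not escape the CJK problem, using Python's `unicodedata` as a convenient probe. Compatibility-only mappings (a ligature, a fullwidth letter) are applied by NFKC but left alone by NFC, while a CJK compatibility ideograph has a canonical singleton decomposition, so both forms replace it.

```python
import unicodedata

for ch in ("\uFB01", "\uFF21", "\uF900"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    # A "<compat>"/"<wide>" tag marks a compatibility mapping; no tag means canonical.
    print(f"  decomposition: {unicodedata.decomposition(ch)}")
    print(f"  NFC:  {unicodedata.normalize('NFC', ch)!r}")
    print(f"  NFKC: {unicodedata.normalize('NFKC', ch)!r}")
```

Because the ideograph's mapping is canonical, any error in it propagates to NFC-based formats just as it does to NFKC-based nameprep.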

--Ken