[idn] Normalisation and case folding (was: IDNA comment)
-----BEGIN PGP SIGNED MESSAGE-----
[Cross-posted from the IDN list; reply-to set to firstname.lastname@example.org.
Change it back for replies that relate specifically to IDN.]
Mark Davis wrote:
> >stringprep(NFC(x)) == stringprep(x) [does not always hold]
> This was brought up early in the Unicode 3.2 development. We have
> programmatically checked, and I with dot is the only case that causes
> a problem.
No it isn't:
= NFD("\u1F70\u03B9") = "\u03B1\u0300\u03B9" (alpha-varia, iota)
= NFD("\u03B1\u03B9\u0300") = "\u03B1\u03B9\u0300" (alpha, iota-varia)
(Which NF is used after toCasefold isn't important.)
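The two results above can be reproduced with Python's standard library. This is a sketch under stated assumptions, not the check from the thread: `str.casefold()` (full Unicode case folding, using the interpreter's current Unicode data rather than 3.2) stands in for toCasefold. The input `x` is also an assumption: alpha, combining ypogegrammeni U+0345, combining varia U+0300, chosen because it yields exactly the two strings shown.

```python
import unicodedata


def fold_then_nfd(s: str) -> str:
    """Full Unicode case folding (str.casefold), followed by NFD."""
    return unicodedata.normalize("NFD", s.casefold())


def code_points(s: str) -> list[str]:
    return [f"U+{ord(c):04X}" for c in s]


# Assumed input: alpha + combining ypogegrammeni + combining varia.
x = "\u03B1\u0345\u0300"

# NFC reorders U+0300 (ccc 230) before U+0345 (ccc 240) and composes to U+1FB2.
nfc_x = unicodedata.normalize("NFC", x)

# U+1FB2 folds to U+1F70 U+03B9, so the varia stays on the alpha.
folded_nfc = fold_then_nfd(nfc_x)

# U+0345 folds to U+03B9, a base letter, so the varia now follows the iota.
folded_raw = fold_then_nfd(x)

print(code_points(nfc_x))        # ['U+1FB2']
print(code_points(folded_nfc))   # ['U+03B1', 'U+0300', 'U+03B9']
print(code_points(folded_raw))   # ['U+03B1', 'U+03B9', 'U+0300']
print(folded_nfc == folded_raw)  # False
```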
What algorithm did you use to check this? In any case, I'll discuss case
folding in part 3 of my Unicode 3.2 comments, later today.
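One way such a check could be done is sketched below; this is an assumption about method, not a description of what was actually run. It tests stringprep(NFC(x)) == stringprep(x) over short sequences of a base letter plus combining marks rather than only single code points, since the Greek failure involves a sequence. `simple_nameprep` is a hypothetical helper that applies only the nameprep mapping (tables B.1 and B.2) and NFKC steps, using Python's `stringprep` module and the Unicode 3.2 data in `unicodedata.ucd_3_2_0`; the prohibition and bidi checks are omitted, and the candidate lists are arbitrary.

```python
import itertools
import stringprep
from unicodedata import ucd_3_2_0


def simple_nameprep(s: str) -> str:
    """Mapping (B.1, B.2) and NFKC steps of nameprep; no prohibition/bidi checks."""
    mapped = []
    for ch in s:
        if stringprep.in_table_b1(ch):  # B.1: commonly mapped to nothing
            continue
        mapped.append(stringprep.map_table_b2(ch))  # B.2: case folding for NFKC
    return ucd_3_2_0.normalize("NFKC", "".join(mapped))


# Arbitrary candidate set: a few base letters and combining marks.
BASES = ["I", "i", "\u03B1", "\u03B7", "\u03C9", "\u1FB3"]
MARKS = ["\u0300", "\u0301", "\u0307", "\u0345"]


def find_counterexamples(max_marks: int = 2) -> list[str]:
    """Return candidates x where simple_nameprep(NFC(x)) != simple_nameprep(x)."""
    found = []
    for base in BASES:
        for n in range(1, max_marks + 1):
            for marks in itertools.product(MARKS, repeat=n):
                x = base + "".join(marks)
                nfc_x = ucd_3_2_0.normalize("NFC", x)
                if simple_nameprep(nfc_x) != simple_nameprep(x):
                    found.append(x)
    return found


for x in find_counterexamples():
    print(" ".join(f"U+{ord(c):04X}" for c in x))
```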
> I have no doubt that it will be resolved for U3.2, and even if StringPrep
> doesn't pick up U3.2, it could add a mapping to that one case.
How would you add a mapping from a sequence of two characters (e.g.
"I\u0307") by changing only the tables, without changing the
stringprep/nameprep algorithm itself?
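The difficulty shows up in the shape of the mapping step. In the sketch below, `map_per_code_point` follows the stringprep mapping step using Python's `stringprep` tables: each code point is looked up on its own, so no table entry can be keyed on "I\u0307". `map_with_lookahead` shows the kind of algorithm change a sequence mapping would need. `SEQUENCE_MAP` and its target value are hypothetical placeholders, not part of RFC 3454 or any profile.

```python
import stringprep

# Hypothetical sequence table; the target "i" is a placeholder for illustration.
SEQUENCE_MAP = {"I\u0307": "i"}


def map_per_code_point(label: str) -> str:
    """Stringprep-style mapping: each code point is looked up on its own."""
    out = []
    for ch in label:
        if stringprep.in_table_b1(ch):  # B.1: mapped to nothing
            continue
        out.append(stringprep.map_table_b2(ch))  # B.2: case folding for NFKC
    return "".join(out)


def map_with_lookahead(label: str) -> str:
    """Hypothetical algorithm change: try two-code-point keys first."""
    out = []
    i = 0
    while i < len(label):
        pair = label[i:i + 2]
        if pair in SEQUENCE_MAP:
            out.append(SEQUENCE_MAP[pair])
            i += 2
        else:
            out.append(map_per_code_point(label[i]))
            i += 1
    return "".join(out)


print(map_per_code_point("I\u0307") == "i\u0307")  # True
print(map_with_lookahead("I\u0307"))  # i
```

Even this lookahead would not be enough on its own: "I\u0323\u0307" is canonically equivalent to "I\u0307\u0323", and a fixed two-code-point match only catches the second spelling.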
(Actually there is an indirect way to do it: map U+0307 to nothing, and map
all precomposed characters with dot above to the corresponding forms without
the dot. However, that won't work for the standard case folding algorithm,
because it would break consistency with earlier versions, and it doesn't fix
the problem with Greek ypogegrammeni/prosgegrammeni anyway.)
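The indirect approach can be sketched as a per-code-point pre-mapping applied before case folding. `dot_table_map` and `prep` are hypothetical helpers: U+0307 maps to nothing, and any character whose canonical decomposition contains U+0307 maps to its recomposed form without it. Case folding is approximated with `str.casefold()` followed by NFD. With this mapping, "\u0130" and "I\u0307" agree, but U+017C (z with dot above) loses its dot, and the Greek case (alpha + U+0345 + U+0300, an assumed example input) still differs.

```python
import unicodedata

DOT_ABOVE = "\u0307"


def dot_table_map(ch: str) -> str:
    """Hypothetical table: drop U+0307, and strip it from precomposed characters."""
    if ch == DOT_ABOVE:
        return ""
    decomposed = unicodedata.normalize("NFD", ch)
    if DOT_ABOVE in decomposed:
        return unicodedata.normalize("NFC", decomposed.replace(DOT_ABOVE, ""))
    return ch


def prep(s: str) -> str:
    """Hypothetical pre-mapping, then full case folding, then NFD."""
    mapped = "".join(dot_table_map(ch) for ch in s)
    return unicodedata.normalize("NFD", mapped.casefold())


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


# I with dot: both spellings now give plain "i".
print(prep(nfc("I\u0307")) == prep("I\u0307"))  # True

# Side effect: z with dot above loses its dot.
print(prep("\u017C"))  # z

# The ypogegrammeni case is untouched by the dot mapping, so it still differs.
x = "\u03B1\u0345\u0300"
print(prep(nfc(x)) == prep(x))  # False
```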
David Hopwood <email@example.com>
Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
-----BEGIN PGP SIGNATURE-----
-----END PGP SIGNATURE-----