[idn] Normalisation and case folding (was: IDNA comment)

To: unicode@unicode.org, idn@ops.ietf.org
Subject: [idn] Normalisation and case folding (was: IDNA comment)
From: David Hopwood <david.hopwood@zetnet.co.uk>
Date: Tue, 12 Feb 2002 08:52:19 +0000
References: <0c1f01c1b31a$b8b9caf0$2b19fea9@temp> <004101c1b31d$7ecd45d0$08d8ea0c@c1340594a>
Reply-to: unicode@unicode.org

-----BEGIN PGP SIGNED MESSAGE-----

[Cross-posted from the IDN list; reply-to set to unicode@unicode.org.
Change it back for replies that relate specifically to IDN.]

Mark Davis wrote:
> >stringprep(NFC(x)) == stringprep(x) [does not always hold]
> 
> This was brought up early in the Unicode 3.2 development. We have
> programmatically checked, and I with dot is the only case that causes
> a problem.

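No, it isn't. "\u1FB2" and "\u1FB3\u0300" are canonically equivalent
(NFC of the latter is the former), yet they case-fold differently: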

    NFD(toCasefold("\u1FB2"))
        = NFD("\u1F70\u03B9")       = "\u03B1\u0300\u03B9" (alpha-varia, iota)
    NFD(toCasefold("\u1FB3\u0300"))
        = NFD("\u03B1\u03B9\u0300") = "\u03B1\u03B9\u0300" (alpha, iota-varia)

(Which normalisation form is applied after toCasefold isn't important.)
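
The counterexample above can be reproduced with a minimal Python sketch.
It assumes that Python's str.casefold() (full case folding, from whatever
Unicode version the interpreter ships with) is an acceptable stand-in for
toCasefold; it is not stringprep or nameprep. The names fold_then_nfd and
codepoints are made up for this illustration.

```python
import unicodedata


def fold_then_nfd(s: str) -> str:
    """NFD(toCasefold(s)); str.casefold() is an assumed stand-in for toCasefold."""
    return unicodedata.normalize("NFD", s.casefold())


def codepoints(s: str) -> str:
    """Render a string as a list of U+XXXX code points for display."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


composed = "\u1FB2"        # alpha with varia and ypogegrammeni
sequence = "\u1FB3\u0300"  # alpha with ypogegrammeni, then combining grave

# The two inputs are canonically equivalent: NFC maps one onto the other.
same_under_nfc = unicodedata.normalize("NFC", sequence) == composed

# Folding each input directly, without normalising first, gives
# results that are no longer canonically equivalent.
folded_composed = fold_then_nfd(composed)
folded_sequence = fold_then_nfd(sequence)

print("canonically equivalent inputs:", same_under_nfc)
print("composed ->", codepoints(folded_composed))
print("sequence ->", codepoints(folded_sequence))
print("folded forms match:", folded_composed == folded_sequence)
```

Swapping "NFD" for "NFC" in fold_then_nfd changes the code points printed
but not the mismatch.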

What algorithm did you use to check this? In any case, I'll discuss case
folding in part 3 of my Unicode 3.2 comments, later today.

> I have no doubt that it will be resolved for U3.2, and even if StringPrep
> doesn't pick up U3.2, it could add a mapping to that one case.

How would you add a mapping from a sequence of two characters (e.g.
"I\u0307") by changing only the tables, without also changing the
stringprep/nameprep algorithm itself?

(Actually there is an indirect way to do it: map out U+0307, and map all
composed characters with dot-above to the forms without dot-above. However,
that won't work for the standard case folding algorithm, because it would
break consistency with earlier versions, and it doesn't fix the problem
with Greek ypogegrammeni/prosgegrammeni, anyway.)
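
The indirect approach in the parenthesis above can be sketched in Python
as a table keyed on single code points only. Building the table from
canonical decompositions in unicodedata is an assumption made for
illustration, not something stringprep specifies, and the names
build_dot_stripping_table and map_per_code_point are hypothetical. The
sketch only shows that such a table makes "I\u0307" and "\u0130" come out
the same; it does nothing for the Greek case.

```python
import sys
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE


def build_dot_stripping_table() -> dict:
    """Map U+0307 to nothing, and every character whose canonical
    decomposition contains U+0307 to that decomposition minus the dot."""
    table = {DOT_ABOVE: ""}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        # Skip characters with no decomposition or only a compatibility one.
        if not decomp or decomp.startswith("<"):
            continue
        nfd = unicodedata.normalize("NFD", ch)
        if DOT_ABOVE in nfd:
            table[ch] = nfd.replace(DOT_ABOVE, "")
    return table


def map_per_code_point(s: str, table: dict) -> str:
    """Apply the table one code point at a time; no sequence lookups."""
    return "".join(table.get(ch, ch) for ch in s)


table = build_dot_stripping_table()
print(map_per_code_point("I\u0307", table) == map_per_code_point("\u0130", table))
```

The cost is visible in the table itself: every dot-above distinction is
discarded, which is why this cannot be retrofitted onto the standard case
folding without breaking consistency with earlier versions.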

- -- 
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBPGjXmzkCAxeYt5gVAQFj1wgAiC4pPi3sJqK6uhOdXrigiUi85QMds7+o
LRgvpGldJ1l+LmuTh7PHqlq/rW5A+mvq/Usm0Gj9rZK0ALyc1i6nKvCN1hUPTGEA
cnCGR24aCqXa1aBNVEDT2FfY4QlJqiRBNjPxncMm3Od6SA3EN0cI76jUTXgk3YxV
S/Ffd2eszm3jy4qeBIkgkXhul7mKxonwdzmGggGLxAj25RNbzzoBAiGmtH2NQn/C
IZExFrQjFXGNwLQ7wjhbRSs1nWwRYP0OcJIHEACSWcf/tYu+opLB6Dcq3ZXAk20y
P4a/c8KvryTd6ZF7d+8sV3x4yCzEh5PzPjgeYRqv0lbAMCsO3bTasw==
=pR5H
-----END PGP SIGNATURE-----

References:
- [idn] IDNA comment 1 : applications' own normalization vs stringprep
  - From: "Soobok Lee" <lsb@postel.co.kr>
- Re: [idn] IDNA comment 1 : applications' own normalization vs stringprep
  - From: "Mark Davis" <mark@macchiato.com>
