
RE: [idn] case folding



At 00:50 12/06/00, Jonathan Rosenne wrote:
>In addition to case folding, two issues have to be considered: normalization and excluded characters.

Yes.  For normalisation, I will again use Vietnamese as an example.

In Unicode, many letters exist both as a single pre-composed
character and as a base letter followed by combining marks.
For many Vietnamese letters, a pre-composed version
exists (e.g. "o^`"), but one could also compose the same letter
("o" + "^" + "`").  I would propose that normalisation put
any letter that has a pre-composed form into that
pre-composed form for comparison purposes.  If a letter has no
pre-composed form but could (hypothetically) be composed in
more than one way, then it too must be normalised to a single
form before comparison for DNS purposes.
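A minimal sketch of that comparison rule, assuming Unicode Normalization Form C (NFC) as the normalisation step; the post does not name a specific form, but NFC composes a base letter plus combining marks into a pre-composed character where one exists and otherwise puts the marks into a canonical order. The helper names `normalize_label` and `labels_match` are hypothetical.

```python
import unicodedata


def normalize_label(label: str) -> str:
    # Assumption: NFC is the chosen normalisation form.
    # It yields the pre-composed character where Unicode defines one,
    # and a canonically ordered sequence of combining marks otherwise.
    return unicodedata.normalize("NFC", label)


def labels_match(a: str, b: str) -> bool:
    # Hypothetical helper: compare two labels after normalisation.
    return normalize_label(a) == normalize_label(b)


# Vietnamese "o with circumflex and grave": pre-composed U+1ED3 versus
# "o" + COMBINING CIRCUMFLEX ACCENT (U+0302) + COMBINING GRAVE ACCENT (U+0300).
precomposed = "\u1ed3"
composed = "o\u0302\u0300"
print(labels_match(precomposed, composed))

# A letter with no pre-composed form: "q" with dot below (U+0323) and
# circumflex (U+0302), entered in either order. Canonical ordering puts
# the dot below (combining class 220) before the circumflex (class 230).
print(labels_match("q\u0323\u0302", "q\u0302\u0323"))
```

One caveat of this choice: two marks with the same combining class (such as the circumflex and grave above) are not reordered, so "o" + grave + circumflex stays distinct from U+1ED3 under NFC.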

>For Hebrew, we definitely should exclude the bidi formatting codes,
>cantillation marks (0591-05AF) and some points (05BD, 05BE, 05C0, 05C3,
05C4). Regarding the points (05B0-05C2): since case folding seems to be
>agreed for Latin scripts, it seems logical to me to exclude points; if
>not, some normalization would be needed to make it work.

Interesting.  I am afraid that I'm not a user of Hebrew.

It would be very helpful if you could check how the Unicode TR
handles those aspects of Hebrew and report back to the list.
If there is an open issue, then I think there is still time
to feed that concern to the Unicode folks, so it can be resolved
in their document (which we would then simply reference).
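For discussion, the Hebrew exclusions Jonathan proposes could be checked with a simple code-point table. This is a sketch only: the table contents come from the quoted message, except that the specific bidi formatting codes listed (U+200E, U+200F, U+202A..U+202E) are my assumption about what "bidi formatting codes" covers, and the names `EXCLUDED` and `find_excluded` are hypothetical. The open question about points U+05B0..U+05C2 is deliberately left out.

```python
# Assumed set of bidi formatting codes: LRM, RLM, and LRE/RLE/PDF/LRO/RLO.
BIDI_FORMAT_CODES = {0x200E, 0x200F} | set(range(0x202A, 0x202F))
# Cantillation marks U+0591..U+05AF, as proposed in the quoted message.
CANTILLATION_MARKS = set(range(0x0591, 0x05B0))
# Individual points proposed for exclusion in the quoted message.
EXCLUDED_POINTS = {0x05BD, 0x05BE, 0x05C0, 0x05C3, 0x05C4}

# Hypothetical combined exclusion table.
EXCLUDED = BIDI_FORMAT_CODES | CANTILLATION_MARKS | EXCLUDED_POINTS


def find_excluded(label: str) -> list[str]:
    """Return each excluded code point in label, formatted as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in label if ord(ch) in EXCLUDED]


def is_acceptable(label: str) -> bool:
    # A label is acceptable only if it contains no excluded code point.
    return not find_excluded(label)


print(is_acceptable("\u05e9\u05dc\u05d5\u05dd"))        # unpointed "shalom"
print(find_excluded("\u05e9\u05dc\u05d5\u05dd\u200f"))  # trailing RLM
```

Whether points should instead be folded away (as case is for Latin) is exactly the issue that would need input from the Unicode document.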

Ran
rja@inet.org