
Re: [idn] case folding



Dan Oscarsson wrote:
> I think we must have case insensitivity for all characters where case exists.
> There are not very many in UCS. And the major difficulties I have seen
> are the Turkish I and German double S. Getting Turkish I to map to dotless
> lower case i is not possible today, as the current ASCII I is defined to
> lower case to i. If UCS had a separate code point for Turkish I it would
> be possible. I think for case insensitivity we can fairly easily define
> how that is done for UCS, and it will be OK for most people.
> I think we should be able to define this quickly.

I suppose squeezing Turkish I into Unicode is not possible :-)
(Anyone arguing that it looks like Latin I should take a look at U+0410,
Cyrillic capital A, which looks just like Latin A yet has its own code point.)
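
To make the Turkish I problem concrete, here is a rough Python sketch. It is
only an illustration, not a proposal from this thread. The helper name fold()
and its turkish flag are hypothetical; the only real calls are Python's
built-in str.casefold() and the standard unicodedata module. It assumes the
default Unicode folding for the untailored case and a hand-written
language-specific step for the Turkish one.

```python
import unicodedata

def fold(label, turkish=False):
    """Hypothetical helper: fold a label for case-insensitive comparison."""
    if turkish:
        # Assumed language-specific tailoring, applied before the default fold:
        # capital I -> dotless i (U+0131), dotted capital I (U+0130) -> i.
        label = label.replace("I", "\u0131").replace("\u0130", "i")
    # Default, language-independent Unicode case folding.
    return label.casefold()

# Without tailoring, ASCII I always folds to dotted i.
print(fold("I") == "i")
# With the tailoring, the same code point folds to dotless i instead.
print(fold("I", turkish=True) == "\u0131")
# German double S: sharp s folds to "ss", so both spellings compare equal.
print(fold("STRASSE") == fold("stra\u00dfe"))
# U+0410 looks like Latin A but is a distinct code point.
print(unicodedata.name("\u0410"), "\u0410" == "A")
```

The point of the flag is that a single code point for I cannot fold two ways
at once: either the comparison knows the language, or Turkish gets the same
folding as everyone else.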
 
> But there is another area we are forgetting. For many non-Latin alphabets
> there is no case on a letter, but they have different forms (like
> half width, double width, final,...) that for them should compare as
> equivalent, just like case should compare insensitively for us Latin
> alphabet users. As I do not use these languages I do not know if
> it is difficult to define equivalence matching rules for them, but
> this may be a difficult area to define (and it must be done before, for
> example, Arabic or Chinese names can be used).

Correct.
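
As a hedged illustration of what matching such forms could look like, the
Python sketch below combines Unicode compatibility normalization (NFKC) with
case folding. The function name canonical() is hypothetical, and choosing
NFKC plus case folding is an assumption made for the example, not a rule
agreed on this list. It covers only a few sample forms and says nothing
about the harder language-specific cases.

```python
import unicodedata

def canonical(label):
    """Hypothetical canonical form: NFKC normalization, then case folding."""
    # NFKC replaces compatibility variants (full width, half width,
    # presentation forms) with their ordinary counterparts.
    normalized = unicodedata.normalize("NFKC", label)
    # Case folding then removes case and maps Greek final sigma to sigma.
    return normalized.casefold()

# Full-width Latin A (U+FF21) compares equal to plain "a".
print(canonical("\uff21") == "a")
# Half-width katakana KA (U+FF76) compares equal to full-size KA (U+30AB).
print(canonical("\uff76") == "\u30ab")
# Arabic alef isolated presentation form (U+FE8D) maps to alef (U+0627).
print(canonical("\ufe8d") == "\u0627")
# Greek final sigma (U+03C2) folds to ordinary sigma (U+03C3).
print(canonical("\u03c2") == "\u03c3")
```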

I suppose we already have consensus that *some* canonicalization has to be
done. Case-sensitivity does not work in domain names or any naming system.

-James Seng