
Re: [idn] My draft for internationalisation of DNS



At 16:13 00/02/06 -0800, Dan wrote:

> Hi
> 
> As I was asked in one of the replies to my comments on this list
> to write an Internet-Draft, I have tried to do that.
> It is my first try at writing an Internet-Draft,
> so I am sure there is more work to be done before it is ready.

Lots of interesting ideas.

> 2.2 Character data

>    Therefore character data MUST:
>    - Be ISO 10646 (UCS) [UCS].
>    - Be normalised using form KC as defined in Unicode technical
>      report #15 [UTR15].
>      If the character data is in a text string that is not used in
>      character matching, normalisation form C of [UTR15] may be used.
>    - Be encoded using UTF-8 [RFC2279].
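A minimal sketch of the three requirements above. It assumes Python's standard `unicodedata` module, which implements the current Unicode version rather than the one UTR #15 described at the time. The helper name `prepare_label` and its `for_matching` flag are hypothetical and do not come from the draft:

```python
import unicodedata


def prepare_label(text: str, for_matching: bool = True) -> bytes:
    """Hypothetical helper: normalise a string and encode it as UTF-8.

    Strings used in matching get form KC (NFKC); other strings may use
    form C (NFC), as the draft allows.
    """
    form = "NFKC" if for_matching else "NFC"
    normalised = unicodedata.normalize(form, text)
    return normalised.encode("utf-8")


print(prepare_label("\ufb01le"))                      # b'file'
print(prepare_label("\ufb01le", for_matching=False))  # b'\xef\xac\x81le'
```

With form KC the 'fi' ligature (U+FB01) becomes plain "fi"; with form C it survives and is encoded as three UTF-8 bytes.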

>    Note: Normalisation form KC results in compatible characters
>    merged into one (for example Greek A to Latin A). This results
>    in less user confusion (as the Greek A looks like Latin A and
>    many will assume it is a Latin A).

Please be careful here. Form KC does not merge Greek Alpha and Latin A,
because they are not compatibility equivalents. Form KC gets rid of
things such as the 'fi' ligature, but these may be eliminated more
easily by excluding them from the repertoire. The same applies to
many other compatibility characters.
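A quick check of this point, again using Python's `unicodedata` as a stand-in for the UTR #15 tables. The `in_repertoire` helper is a hypothetical illustration of the alternative suggested here: rejecting any character that has a compatibility decomposition instead of normalising it away.

```python
import unicodedata

GREEK_CAPITAL_ALPHA = "\u0391"
LATIN_CAPITAL_A = "A"
FI_LIGATURE = "\ufb01"

# Greek Alpha has no compatibility decomposition, so NFKC keeps it distinct.
print(unicodedata.normalize("NFKC", GREEK_CAPITAL_ALPHA) == LATIN_CAPITAL_A)  # False

# The 'fi' ligature decomposes to "f" + "i" under NFKC.
print(unicodedata.normalize("NFKC", FI_LIGATURE))  # fi


def in_repertoire(label: str) -> bool:
    """Hypothetical check: reject characters with a compatibility decomposition.

    unicodedata.decomposition() returns strings like '<compat> 0066 0069'
    for compatibility mappings; canonical decompositions carry no <tag>.
    """
    return not any(unicodedata.decomposition(ch).startswith("<") for ch in label)


print(in_repertoire("file"))      # True
print(in_repertoire("\ufb01le"))  # False
print(in_repertoire("\uff21"))    # False (FULLWIDTH LATIN CAPITAL LETTER A)
```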


>    Note: Case folding to lower case using UTR#21 is not perfect. For
>    example, in Turkish, I is lowercased to dotless i, but UTR#21
>    does it in the old ASCII way (I -> i). This way we get a well
>    defined lowercasing that can be used in matching, but it will
>    not be correct for every language's local rules.
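To illustrate the mismatch: here Python's locale-independent `str.lower()` stands in for the UTR #21 default mapping (an assumption; the two are close but not guaranteed identical), and `turkish_lower` is a hypothetical helper applying the Turkish rule for I and dotted capital I (U+0130):

```python
def simple_lower(s: str) -> str:
    """Locale-independent lowercasing, standing in for the UTR #21 default."""
    return s.lower()


def turkish_lower(s: str) -> str:
    """Hypothetical Turkish rule: I -> dotless i (U+0131), U+0130 -> i."""
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


print(simple_lower("ISTANBUL"))   # istanbul
print(turkish_lower("ISTANBUL"))  # ıstanbul

# Two names that match under one rule fail to match under the other.
print(simple_lower("ISTANBUL") == simple_lower("istanbul"))    # True
print(turkish_lower("ISTANBUL") == turkish_lower("istanbul"))  # False
```

Input that is already lowercase (i or dotless i) comes out the same under both rules, which is why restricting Turkish users to lowercase input sidesteps the problem.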

With your proposal for case folding, almost everything seems to work,
and the I->i problem could be dealt with by just asking the Turkish
users to use only lowercase i and dotless i. But there is a big
problem: when new characters are added to Unicode, all the
tables on the servers have to be updated, and you have only one
bit to distinguish between old and new, and that's already used.
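A hedged sketch of why the table version matters. A server can only normalise and case-fold characters that its own tables know about. `unicodedata.unidata_version` reports the version of Python's tables, and the hypothetical `unassigned_code_points` helper lists code points that are unassigned (general category `Cn`) in them. A server with outdated tables would see newly added characters exactly this way. U+0378, currently unassigned, serves as the example:

```python
import unicodedata


def unassigned_code_points(label: str) -> list[str]:
    """Hypothetical helper: code points unassigned in this server's tables.

    Such characters cannot be reliably normalised or case-folded here, and
    a server with newer tables may treat the same label differently.
    """
    return [hex(ord(ch)) for ch in label if unicodedata.category(ch) == "Cn"]


print("Unicode tables:", unicodedata.unidata_version)
print(unassigned_code_points("example"))   # []
print(unassigned_code_points("ex\u0378"))  # ['0x378']
```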



Regards,    Martin.



#-#-#  Martin J. Du"rst, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org