
Re: [idn] What's wrong with skwan-utf8?



At 16.19 -0800 00-12-25, D. J. Bernstein wrote:
>  > Also, there is a question whether UTF-8 is really what we should use.
>
>Many systems use UTF-8 internally. It takes less work for them to read
>and write UTF-8 than for them to handle text in other character sets.

UTF-8 is not a character set. It is an encoding of the Unicode/10646
character set. I was not questioning Unicode/10646, but whether the
encoding is right. I personally feel an ACE encoding is easier for
the short term, and a 32-bit solution for the long term (but with a
dictionary approach). I don't feel that a system based on a weird
encoding such as UTF-8, where we "penalize" some characters in the
character set, is the right way to go if we are to find _THE_RIGHT_
solution.
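
To make the "penalize" point concrete, here is a minimal sketch comparing how many bytes UTF-8 and a fixed 32-bit encoding spend per character. The sample characters and the helper names (utf8_len, utf32_len) are chosen for illustration only and are not from this thread.

```python
# Illustrative sample characters (an assumption of this sketch):
# Latin "a", "e" with acute accent, a CJK ideograph, and an emoji.
samples = ["a", "\u00e9", "\u65e5", "\U0001F600"]

def utf8_len(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))

def utf32_len(ch: str) -> int:
    """Number of bytes a fixed 32-bit encoding needs (no byte-order mark)."""
    return len(ch.encode("utf-32-be"))

for ch in samples:
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len(ch)} bytes, "
          f"UTF-32 {utf32_len(ch)} bytes")
```

UTF-8 spends between one and four bytes depending on the code point, so characters outside ASCII cost more, while the 32-bit form costs the same four bytes for every character.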

>Quite a few programs will Just Work(tm) if IDNs are defined as UTF-8,
>while they'll have to be upgraded if IDNs are defined any other way.

This is completely false, because the big issue regarding IDN is
definitely not which charset to use, or which encoding of Unicode
(UTF-8 or ACE), but the need for the nameprep algorithm.
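
A rough sketch of why a preparation step matters regardless of the wire encoding: two spellings of the same name can differ at the code point level, so their bytes differ in any encoding until they are normalized. The prepare function below is a hypothetical stand-in that uses NFKC normalization plus case folding; that choice is an assumption of this sketch and not the nameprep algorithm itself.

```python
import unicodedata

def prepare(label: str) -> str:
    """Hypothetical stand-in for a name preparation step.

    Assumption: NFKC normalization followed by case folding.
    This is NOT the nameprep specification.
    """
    return unicodedata.normalize("NFKC", label).casefold()

composed = "caf\u00e9"      # "e with acute" as one code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent
upper = "CAF\u00c9"         # same name in upper case

# The raw strings differ, so their UTF-8 bytes differ as well.
print(composed.encode("utf-8") == decomposed.encode("utf-8"))

# After preparation all three spellings compare equal.
print(prepare(composed) == prepare(decomposed) == prepare(upper))
```

A program that compares raw UTF-8 bytes would treat these three spellings as different names, which is why it would still need to be upgraded to run a preparation step before lookup or comparison.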

    paf