Re: [idn] URL encoding in html page



Bruce Thomson <bthomson@fm-net.ne.jp> wrote:

> But to conserve file space, it would probably be best to allow
> intermixing of 128-bit characters with ASCII text. UTF-8 continues
> to be the way to do this, since it is just a compression scheme that
> does not really depend on the fact that Unicode is currently
> limited to 32 bits. It could just as easily be extended to work
> with much larger character sets.

This is not even close to true.  UTF-8 is very much dependent on the
32-bit architecture of Unicode/10646, and in fact is constrained to
31-bit code points: the longest sequence it defines is six bytes, which
carries only 31 payload bits.  A quick check of the "10xxxxxx
10xxxxxx..." chart in RFC 2279, or in the Unicode Standard or ISO/IEC
10646, will confirm that.
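
For anyone who would rather count the bits than read the chart, here is
a rough sketch of the arithmetic.  It assumes the RFC 2279 layout (a
lead byte followed by up to five 10xxxxxx continuation bytes); the
function names payload_bits and rfc2279_length are made up for this
illustration and do not come from any library.

```python
def payload_bits(n_bytes: int) -> int:
    """Number of code point bits an n-byte RFC 2279 sequence can carry."""
    if not 1 <= n_bytes <= 6:
        raise ValueError("RFC 2279 defines sequences of 1 to 6 bytes only")
    if n_bytes == 1:
        return 7  # 0xxxxxxx
    # The lead byte spends n ones and a zero on the length marker, leaving
    # 7 - n payload bits; each 10xxxxxx continuation byte adds 6 more.
    return (7 - n_bytes) + 6 * (n_bytes - 1)


def rfc2279_length(code_point: int) -> int:
    """Shortest RFC 2279 sequence length that can hold code_point."""
    if code_point < 0:
        raise ValueError("code points are non-negative")
    for n in range(1, 7):
        if code_point < (1 << payload_bits(n)):
            return n
    raise ValueError("code point needs more than 31 bits")


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, "bytes:", payload_bits(n), "bits")
```

The six-byte form is the end of the line: its lead byte 1111110x has
only one payload bit left, so the total is 1 + 5*6 = 31.  Going further
would mean defining new lead bytes, which is a change to the format and
not something it does "just as easily."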

And the word "currently," as used to refer to either the 21-bit or the
32-bit limit of Unicode/10646, is being used way too cavalierly.
Unicode is not going to be expanded beyond U+10FFFF, and nobody can
think of a non-whimsical reason why it should be.

-Doug Ewell
 Fullerton, California