
Re: [idn] Unicode tagging



> As the information density per octet depends on the encoding used, I
> do not think we can have this as a measurement. And you are right
> that the number of characters is not correct either, one character in
> Chinese can be a whole word.

I actually think it would be possible to measure this rigorously,
but I doubt it would be worth the trouble because (a) we'll never
get a representation that is entirely fair, and (b) no matter how
much you "prove" that something is fair, it doesn't satisfy someone
who believes otherwise.  So while we should try to come up with
something which is more-or-less "fair" (probably not UTF-8, BTW -
it discriminates against non-Latin-based alphabetic languages),
we are going to be criticized about it no matter what we do.
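
To illustrate the point about UTF-8, here is a rough sketch in Python
(the sample characters and the helper name utf8_octets are chosen only
for illustration). It counts the octets UTF-8 needs for one letter from
a few scripts. Greek and Cyrillic letters carry about as much
information as a Latin letter but cost twice the octets.

```python
# Count the octets UTF-8 needs for a single character from several scripts.
# The sample characters below are illustrative choices, not from the thread.
samples = {
    "ASCII Latin 'a'": "a",
    "Latin-1 e-acute": "\u00e9",
    "Greek alpha": "\u03b1",
    "Cyrillic de": "\u0434",
    "Han 'middle'": "\u4e2d",
}


def utf8_octets(text):
    """Return the number of octets in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


for name, ch in samples.items():
    print(f"{name}: {utf8_octets(ch)} octet(s)")
```

A Han character takes three octets here, but as noted above it can
stand for a whole word, so octets per character alone does not settle
what is fair.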

> But one example of being fair is that
> if 63 characters are allowed using ASCII, 63 characters must also be
> allowed using ISO 8859-1 (or other Latin-based character sets).
> When we use UTF-8, the protocol will have space for more ASCII characters
> than ISO 8859-1 characters. To be fair, ASCII must still be restricted
> to 63 characters even if the protocol has space for more.
> This restriction is also very important for implementors.

I agree entirely.
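
A minimal sketch of that rule in Python, assuming the limit is counted
in characters (Unicode code points here, which is my assumption) rather
than in octets. The names MAX_LABEL_CHARS and label_within_limit are
hypothetical.

```python
# Enforce the limit on characters, not on encoded octets, so that an
# ASCII name gets no more room than a name in another Latin-based set.
MAX_LABEL_CHARS = 63  # the 63-character limit from the quoted text


def label_within_limit(label, max_chars=MAX_LABEL_CHARS):
    """Return True if label has at most max_chars code points."""
    return len(label) <= max_chars


ascii_label = "a" * 63         # 63 characters, 63 octets in UTF-8
latin1_label = "\u00e9" * 63   # 63 characters, 126 octets in UTF-8
too_long = "a" * 64            # 64 characters, only 64 octets in UTF-8

for label in (ascii_label, latin1_label, too_long):
    octets = len(label.encode("utf-8"))
    print(len(label), octets, label_within_limit(label))
```

The 64-character ASCII name is rejected even though it is shorter in
octets than the accepted 63-character name with accented letters.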

Keith