
Re: [idn] Unicode tagging




>> While I truly
>> believe that for the sake of the DNS, the use of a uniform byte length
>> encoding scheme is best especially considering the fact that there exists a
>> "count" in front of a label and the count could then correspond to the
>> number of characters so that it could be "fair" between languages, 
>
>there are different interpretations of "fair".  (my idea of "fair" 
>is that the information density per octet - *not* character density
>per octet - is about the same in all languages)   but the particular 
>representation chosen matters a lot more for storage formats, or
>the format of an email message which might be megabytes in length, 
>than in a relatively short IDN query or response.

As the information density per octet depends on the encoding used, I
do not think we can use it as a measurement. And you are right
that the number of characters is not correct either: one character in
Chinese can be a whole word. But one example of being fair is that
if 63 characters are allowed using ASCII, 63 characters must also be
allowed using ISO 8859-1 (or other Latin-based character sets).
When we use UTF-8, an ASCII character takes one octet while the other
ISO 8859-1 characters take two, so the protocol will have space for
more ASCII characters than ISO 8859-1 characters. To be fair, ASCII
must still be restricted to 63 characters even if the protocol has
space for more.
This restriction is also very important for implementors.
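
To make the arithmetic concrete, here is a rough Python sketch of the
difference between counting characters and counting octets in UTF-8.
The names MAX_LABEL_CHARS, label_sizes and is_fair_length are made up
for this illustration, and the check is only my reading of the "63
characters for everyone" rule, not something taken from any draft.

```python
from typing import Tuple

# Assumption for illustration: the limit is applied per character,
# not per octet, as argued in the text above.
MAX_LABEL_CHARS = 63


def label_sizes(label: str) -> Tuple[int, int]:
    """Return (number of characters, number of octets in UTF-8)."""
    return len(label), len(label.encode("utf-8"))


def is_fair_length(label: str, max_chars: int = MAX_LABEL_CHARS) -> bool:
    """Accept a label only if its character count is within the limit,
    no matter how many octets the encoding would have room for."""
    return len(label) <= max_chars


ascii_label = "a" * 63        # 63 ASCII characters
latin1_label = "\u00e9" * 63  # 63 times e-acute (U+00E9, in ISO 8859-1)

print(label_sizes(ascii_label))   # (63, 63)
print(label_sizes(latin1_label))  # (63, 126)

# A 64-character ASCII label would fit in the 126 octets needed for
# the ISO 8859-1 case, but it is still rejected by the character rule.
print(is_fair_length("a" * 64))   # False
```

So a protocol that leaves room for 63 ISO 8859-1 characters in UTF-8
needs 126 octets per label, and the 63 character rule has to be
enforced separately for ASCII.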

   Dan