
Re: [idn] Unicode tagging



At 11:21 16/08/00, Edmon wrote:
>>
>> I think the basic idea is that labels should not be significantly
>> more restricted in one language than another just because the on-the-wire
>> representation of certain languages might take up more space than others.
>>
>> But I don't know how to define a length limit that applies across all
>> languages and is fair to all of them.  Maybe we just need to make sure
>> that the on-the-wire representation of labels is large enough to
>> accommodate reasonably-long labels in any language.
>
>Wouldn't using uniform byte-length characters largely solve the problem?...
>i.e. no transformation.

No.  Again, consider Vietnamese Quoc Ngu as an example.  Vietnamese
is a Romanised language and does not ever use ideograms. Quoc Ngu has
existed since Alexandre de Rhodes in the mid-1600s and has long
been the only written form in use.

Depending on canonicalisation rules and other items not clear yet,
a single written Vietnamese "letter" can consist of 3 Unicode
characters: a base letter plus two combining marks.  For example,
consider the vowel consisting of the letter "o" with a Vietnamese
"hook" (named "horn" in Unicode) and a tone mark (e.g. accent
grave "`").
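
To make the counting concrete, here is a minimal sketch in Python
using the standard unicodedata module.  It assumes the "hook" above
is the character Unicode names COMBINING HORN (U+031B) and that the
tone mark is COMBINING GRAVE ACCENT (U+0300); the helper name sizes()
is mine, purely for illustration.  It compares the same written
letter in decomposed and precomposed form:

```python
import unicodedata

# Assumed spelling of the letter: "o" + COMBINING HORN + COMBINING GRAVE ACCENT
decomposed = "o\u031b\u0300"

# NFC canonical composition folds the sequence into one precomposed character
composed = unicodedata.normalize("NFC", decomposed)

def sizes(s):
    """Return (code points, UTF-8 octets, UTF-16 octets) for a string."""
    return len(s), len(s.encode("utf-8")), len(s.encode("utf-16-be"))

print(sizes(decomposed))  # (3, 5, 6)
print(sizes(composed))    # (1, 3, 2)
```

So one written letter costs anywhere from 2 to 6 octets depending on
the normalisation form and the encoding, which is why a fixed-width
encoding alone does not make a length limit uniform.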

I'd bet a Dim Sum lunch that there are other languages with similar 
issues in Unicode/ISO-10646.

The bottom line is that a hard limit does not appear reasonable
to define and implement -- at least not in a manner that is fair
to all language groups, and fairness was the whole objective of
having such a hard limit.

Yours,

Ran
rja@inet.org