Re: [idn] Unicode tagging
- To: "Edmon" <edmon@neteka.com>
- Subject: Re: [idn] Unicode tagging
- From: RJ Atkinson <rja@inet.org>
- Date: Wed, 16 Aug 2000 11:47:14 -0400
- Cc: <idn@ops.ietf.org>
- Delivery-date: Wed, 16 Aug 2000 08:52:32 -0700
- Envelope-to: idn-data@psg.com
At 11:21 16/08/00, Edmon wrote:
>>
>> I think the basic idea is that labels should not be significantly
>> more restricted in one language than another just because the on-the-wire
>> representation of certain languages might take up more space than others.
>>
>> But I don't know how to define a length limit that applies across all
>> languages and is fair to all of them. Maybe we just need to make sure
>> that the on-the-wire representation of labels is large enough to
>> accommodate reasonably long labels in any language.
>
>Wouldn't using uniform byte-length characters largely solve the problem?...
>i.e. no transformation.
No. Again, consider Vietnamese Quoc Ngu as an example. Vietnamese
is written in a Romanised script and does not use ideograms. Quoc Ngu
has existed since Alexandre de Rhodes in the mid-1600s and has long
been the only written form in use.
Depending on the canonicalisation rules and other details that are not
yet settled, a single written Vietnamese "letter" can consist of 3
Unicode characters. For example, consider the vowel consisting of the
letter "o" with a Vietnamese "hook" and a tone mark (e.g. grave
accent "`"). Each character having the same byte length does not help
if one letter needs three of them.
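As a rough illustration (not a proposal for how IDN should do it), the
Python sketch below measures that one vowel in composed (NFC) and
decomposed (NFD) form. It assumes the "hook" above means the combining
horn, U+031B, so the letter is U+1EDD when precomposed; the helper name
sizes() is made up for this example. UTF-32 stands in for a "uniform
byte-length" encoding.

```python
import unicodedata

# Assumption: "o" + horn + grave accent, i.e. precomposed U+1EDD.
LETTER = "\u1edd"

def sizes(text, form):
    """Return (code points, UTF-8 bytes, UTF-32 bytes) after normalizing."""
    s = unicodedata.normalize(form, text)
    # "utf-32-be" is used so that no byte-order mark is counted.
    return len(s), len(s.encode("utf-8")), len(s.encode("utf-32-be"))

for form in ("NFC", "NFD"):
    print(form, sizes(LETTER, form))
```

With the fixed-width encoding the same written letter occupies 4 bytes
composed and 12 bytes decomposed, so uniform character width by itself
does not make the per-letter cost uniform.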
I'd bet a Dim Sum lunch that there are other languages with similar
issues in Unicode/ISO-10646.
The bottom line is that a hard limit does not appear reasonable
to define and implement -- at least not in a manner that is fair
to all language groups, and fairness was the whole objective of
having such a hard limit.
Yours,
Ran
rja@inet.org