Re: [idn] Comments on protocol drafts
- To: "Martin J. Duerst" <duerst@w3.org>
- Subject: Re: [idn] Comments on protocol drafts
- From: RJ Atkinson <rja@inet.org>
- Date: Mon, 14 Feb 2000 17:47:31 +0000
- Cc: idn@ops.ietf.org
- Delivery-date: Mon, 14 Feb 2000 14:46:23 -0800
- Envelope-to: idn-data@psg.com
At 00:32 08-02-00, Martin J. Duerst wrote:
>Please don't use CJK as the main example. They use two bytes
>all the time anyway, so using 3 (UTF-8) or 4 (UTF-5) or so
>isn't that a big hit. And label lengths, in terms of characters,
>are going to be much smaller for CJK than for alphabetic
>scripts. The main problem cases are scripts such as Devanagari,
>Bengali, Tamil, Georgian,... which are alphabetic but require
>3 bytes in UTF-8.
Please consider Vietnamese as another case:
- official form ("Quoc Ngu") is Romanised
- common form (== official form) is Romanised
- the Romanised form has been in use for centuries, while the older form
  has been dead (except for historical uses) for centuries
- it could fit into an 8-bit character set by itself, but in UTF-8 each
  precomposed accented letter takes 2 or 3 bytes
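To put rough numbers on the space argument, here is a minimal Python sketch that counts code points and UTF-8 bytes for a few sample strings. The helper name `utf8_cost` and the sample words are illustrative choices, not something from this thread. Text is NFC-normalized first, so Vietnamese letters are counted in their precomposed form.

```python
import unicodedata

def utf8_cost(text: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for text after NFC normalization."""
    text = unicodedata.normalize("NFC", text)
    return len(text), len(text.encode("utf-8"))

# Illustrative samples; escapes avoid any ambiguity about how the source is saved.
samples = {
    "Vietnamese": "Ti\u1ebfng Vi\u1ec7t",                        # "Tiếng Việt"
    "Devanagari": "\u0928\u092e\u0938\u094d\u0924\u0947",        # "नमस्ते"
    "CJK": "\u4e2d\u6587",                                       # "中文"
}

if __name__ == "__main__":
    for name, text in samples.items():
        chars, nbytes = utf8_cost(text)
        print(f"{name}: {chars} code points, {nbytes} UTF-8 bytes")
```

For the Vietnamese sample, 10 code points become 14 bytes, because "ế" and "ệ" each take 3 bytes; in an 8-bit Vietnamese character set the same text would be 10 bytes.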
For further reading on Vietnamese, see RFC-1456, which defines VIQR,
a widely used quoted-readable encoding for Vietnamese (it is common,
e.g., in the Vietnamese culture group on USENET), as well as VISCII,
an 8-bit character set encoding.
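As a rough illustration of how a quoted-readable scheme like VIQR works, here is a minimal Python sketch that converts Unicode Vietnamese text to VIQR-style ASCII. The mnemonic table below reflects VIQR as it is commonly described (breve `(`, circumflex `^`, horn `+`, then tone marks `'` `` ` `` `?` `~` `.`, and `DD`/`dd` for Đ/đ). The function name `to_viqr` is hypothetical. This is not a complete RFC-1456 implementation: in particular, it does not implement the RFC's escaping of ordinary punctuation that directly follows a vowel.

```python
import unicodedata

# Assumed VIQR mnemonics, keyed by the combining marks that NFD produces.
MODIFIERS = {            # vowel modifiers: emitted first
    "\u0306": "(",       # breve (ă)
    "\u0302": "^",       # circumflex (â, ê, ô)
    "\u031b": "+",       # horn (ơ, ư)
}
TONES = {                # tone marks: emitted after any modifier
    "\u0301": "'",       # acute
    "\u0300": "`",       # grave
    "\u0309": "?",       # hook above
    "\u0303": "~",       # tilde
    "\u0323": ".",       # dot below
}

def to_viqr(text: str) -> str:
    """Hypothetical sketch: Unicode Vietnamese -> VIQR-style ASCII (no escaping)."""
    out: list[str] = []
    mods: list[str] = []
    tones: list[str] = []

    def flush() -> None:
        # NFD may order dot below before circumflex or breve (by combining
        # class), so collect the marks and emit modifier-then-tone explicitly.
        out.extend(mods)
        out.extend(tones)
        mods.clear()
        tones.clear()

    for ch in unicodedata.normalize("NFD", text):
        if ch in MODIFIERS:
            mods.append(MODIFIERS[ch])
        elif ch in TONES:
            tones.append(TONES[ch])
        else:
            flush()
            if ch == "\u0110":       # Đ has no canonical decomposition
                out.append("DD")
            elif ch == "\u0111":     # đ
                out.append("dd")
            else:
                out.append(ch)
    flush()
    return "".join(out)

if __name__ == "__main__":
    print(to_viqr("Vi\u1ec7t Nam"))  # "Việt Nam"
```

The output is plain 7-bit ASCII, which is why such an encoding travels safely through USENET, at the cost of one to two extra characters per accented letter.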
Ran
rja@inet.org