Re: [idn] Comments on protocol drafts

To: "Martin J. Duerst" <duerst@w3.org>
Subject: Re: [idn] Comments on protocol drafts
From: RJ Atkinson <rja@inet.org>
Date: Mon, 14 Feb 2000 17:47:31 +0000
Cc: idn@ops.ietf.org
Delivery-date: Mon, 14 Feb 2000 14:46:23 -0800
Envelope-to: idn-data@psg.com

At 00:32 08-02-00, Martin J. Duerst wrote:

>Please don't use CJK as the main example. They use two bytes
>all the time anyway, so using 3 (UTF-8) or 4 (UTF-5) or so
>isn't that a big hit. And label lengths, in terms of characters,
>are going to be much smaller for CJK than for alphabetic
>scripts. The main problem cases are scripts such as Devanagari,
>Bengali, Tamil, Georgian,... which are alphabetic but require
>3 bytes in UTF-8. 

Please consider Vietnamese as another case:
         - the official form ("Quoc Ngu") is Romanised
         - the common form (== official form) is Romanised
         - the Romanised form has been in use for centuries,
           while the older form has been dead (for non-historical uses) for centuries
         - Vietnamese could fit into 8 bits by itself, but UTF-8 needs
           2 or 3 bytes for each accented letter
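
To put rough numbers on the space argument, here is a minimal sketch that counts characters and UTF-8 bytes for a short string. The function name utf8_profile and the two sample strings are illustrative choices, not taken from any draft. An 8-bit Vietnamese character set would spend one byte per letter, so the character count doubles as the 8-bit size.

```python
import unicodedata

def utf8_profile(text):
    """Return (character count, UTF-8 byte count) for text in NFC form."""
    nfc = unicodedata.normalize("NFC", text)
    return len(nfc), len(nfc.encode("utf-8"))

# "Quoc Ngu" with its diacritics: U+1ED1 and U+1EEF each take 3 bytes in UTF-8.
vietnamese = "Qu\u1ed1c Ng\u1eef"
# "Hindi" in Devanagari: every code point takes 3 bytes in UTF-8.
devanagari = "\u0939\u093f\u0928\u094d\u0926\u0940"

print(utf8_profile(vietnamese))   # (8, 12)
print(utf8_profile(devanagari))   # (6, 18)
```

Plain ASCII letters still cost one byte each, so the overhead for Vietnamese depends on how many accented letters a label contains.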

For more reading on Vietnamese, see RFC 1456, which defines VIQR, a
widely used quoted-readable encoding for Vietnamese (it is common, e.g.,
in the Vietnamese culture group on USENET), as well as VISCII, an 8-bit
character set encoding.
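
As an illustration of what a quoted-readable form looks like, the sketch below converts Unicode Vietnamese text to VIQR-style ASCII using the mnemonics from RFC 1456. The function name to_viqr is hypothetical, and the sketch is deliberately incomplete: it ignores VIQR's backslash escaping and its literal/English states, so input that already has punctuation right after a vowel would come out ambiguous.

```python
import unicodedata

# Combining marks mapped to their VIQR ASCII mnemonics.
MODIFIERS = {"\u0306": "(", "\u0302": "^", "\u031b": "+"}  # breve, circumflex, horn
TONES = {
    "\u0301": "'",  # acute
    "\u0300": "`",  # grave
    "\u0309": "?",  # hook above
    "\u0303": "~",  # tilde
    "\u0323": ".",  # dot below
}
D_STROKE = {"\u0111": "dd", "\u0110": "DD"}  # d with stroke has no decomposition

def to_viqr(text):
    """Sketch only: no escaping and no literal/English state handling."""
    out = []
    tones = []  # tone marks held back so they follow any vowel modifier
    for ch in unicodedata.normalize("NFD", text):
        if ch in TONES:
            tones.append(TONES[ch])
        elif ch in MODIFIERS:
            out.append(MODIFIERS[ch])
        else:
            # A new base character: emit the tones of the previous letter first.
            out.extend(tones)
            tones.clear()
            out.append(D_STROKE.get(ch, ch))
    out.extend(tones)
    return "".join(out)

print(to_viqr("Vi\u1ec7t"))            # Vie^.t
print(to_viqr("Qu\u1ed1c Ng\u1eef"))   # Quo^'c Ngu+~
```

Tone marks are held back because canonical ordering in NFD places the dot below before the circumflex, while VIQR writes the vowel modifier first.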

Ran
rja@inet.org
