Re: [idn] CIDNUC in action



At 10:56 PM 3/23/00 +0800, James Seng wrote:
>1. Implementing CIDNUC, while not very complicated, is *very* time
>    consuming, and so is debugging it! (Argghh!! I have to spend my time
>    going through it bit by bit to get it right.)

Quite true. Bit-twiddling is always error-prone.

>2. Its compression is useful but unfortunately has not much effect on CJK.
>    In fact, it makes CJK even longer *doh*.

Longer than what? It makes it shorter than UTF-5.

>  It is useful for many languages
>    that do not use more than 256 code points.

Which is every script in 10646 other than Han, Yi, and Hangul syllables. 
This WG should decide how important it is to have name parts for these 
scripts be longer than 14 characters (UTF-5), 17 characters (CIDNUC), or 8 
characters (8&down). Non-BMP characters have longer encodings in all three 
proposals.
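
To make the length trade-off concrete, here is a rough sketch of the 
per-character cost under UTF-5. It assumes the UTF-5 scheme as I read the 
draft: each character's code point is written in hex without leading 
zeros, the first hex digit is mapped to a letter from G to V, and the 
remaining digits are kept as 0-9 and A-F. The function names are 
hypothetical, and the 63-octet figure is the ordinary DNS label limit. The 
sketch ignores any tag or other per-proposal overhead, so it does not 
reproduce the exact limits quoted above.

```python
# Sketch only: assumes the UTF-5 mapping described in the lead-in.
FIRST_QUINTET = "GHIJKLMNOPQRSTUV"  # values 0..15 with the "start" bit set
MAX_LABEL_OCTETS = 63               # DNS limit on a single label


def utf5_encode(text):
    """Encode each character as one G-V letter plus its remaining hex digits."""
    out = []
    for ch in text:
        hex_digits = format(ord(ch), "X")      # no leading zeros
        first = int(hex_digits[0], 16)         # leading nibble, 0..15
        out.append(FIRST_QUINTET[first] + hex_digits[1:])
    return "".join(out)


def fits_in_label(encoded):
    """True if the encoded name part fits in one DNS label."""
    return len(encoded) <= MAX_LABEL_OCTETS


han = "\u4e2d"    # a Han character: four hex digits
latin = "a"       # U+0061: two hex digits
print(len(utf5_encode(han)), len(utf5_encode(latin)))
```

A Han character costs four ASCII characters here while a Latin letter 
costs two, which is why the Han, Yi, and Hangul name parts run into the 
label limit first.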

>(Would be nice if we had a
>    generic LZW or Huffman compression or UTR#6?)

If you can design one that does not overly restrict many scripts, that 
would be great!

>3. No explanation of what the encoding or decoding algorithm should do when
>    it encounters an invalid character.

Sure it does; see sections 2.3.2, 2.3.3, and 2.3.4.

>4. www.t[..jp (SJIS www.yahoo.co.jp) in cidnuc will be
>    www.aq8gdsnl7a.aq83bhru6j6.jp, leaving www and jp intact. :)
>
>    Still don't really like the aq8; without it, it would be a bit shorter.

Agreed. However, all proposals need a way to differentiate an 
ASCII-encoded name from a non-IDN name. Otherwise, there will be 
errors when a decoder tries to decode a non-IDN name as if it were encoded. Dan and 
I have been talking about this, and both the methods in cidnuc and 8&down 
have positive and negative attributes. When (if?) the WG is ready to start 
picking a single proposal, we'll need to revisit this. Fortunately, all of 
the proposals can use any of the proposed tagging mechanisms.
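
As an illustration of prefix tagging, here is a minimal sketch that 
classifies the labels of a host name. The tag value "aq8" is taken only 
from the example quoted above and is not a settled choice; the function 
name and the case-insensitive comparison are assumptions made for this 
sketch.

```python
# Sketch only: "aq8" comes from the example in this thread.
IDN_TAG = "aq8"


def tagged_labels(hostname, tag=IDN_TAG):
    """Return (label, is_tagged) pairs for each dot-separated label."""
    return [
        (label, label.lower().startswith(tag))
        for label in hostname.split(".")
    ]


for label, is_tagged in tagged_labels("www.aq8gdsnl7a.aq83bhru6j6.jp"):
    # Only tagged labels would be handed to the decoder.
    print(label, "encoded" if is_tagged else "plain ASCII")
```

The weakness is visible in the sketch: an ordinary ASCII label that 
happens to begin with the tag would be treated as encoded, which is one 
of the trade-offs to weigh when the tagging mechanisms are compared.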


--Paul Hoffman, Director
--Internet Mail Consortium