
Re: [idn] question about cidnuc

To: Paul Hoffman / IMC <phoffman@imc.org>
Subject: Re: [idn] question about cidnuc
From: idn <idn@asadal.cs.pusan.ac.kr>
Date: Sat, 11 Mar 2000 11:08:54 +0900 (KST)
cc: idn@ops.ietf.org
Delivery-date: Fri, 10 Mar 2000 18:14:51 -0800
Envelope-to: idn-data@psg.com

Dear Mr. Hoffman:

Thanks very much for your quick reply.
Now I understand how the compression works.

On Fri, 10 Mar 2000, Paul Hoffman / IMC wrote:

> >i made up two examples for the first two cases.
> >are they correct?
> >
> >   1) no compression: 0x0061 1100 1162
> >   2) compressed/one-octet header : 0x1100 1162 -> 0x 11 00 62
                              ^^^^^^ I meant 'mode'; the same applies below.

> >   3) compressed/two-octet header:  examples???
> 
> In cidnuc, section 2.4.1, Step 1 says 
> that all the upper octets *must* match in order to use the greater 
> compression. In the case above, 0x00 does not match 0x11. Thus, the output 
> of the compression step is 0xD8006111001162.
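
As I now understand the rule quoted above, it could be sketched as follows.
This is only a sketch based on the two examples in this thread, not the
draft's own code: the function name is my own invention, the input is
assumed to be a list of 16-bit code units, and any further rules in the
cidnuc draft are not modeled.

```python
def cidnuc_compress_sketch(code_units):
    """Apply the compression step as described in this thread (hypothetical helper)."""
    if not code_units:
        raise ValueError("empty input")
    uppers = {u >> 8 for u in code_units}
    if len(uppers) == 1:
        # All upper octets match: emit the shared upper octet once,
        # followed by the lower octet of every code unit.
        return bytes([uppers.pop()]) + bytes(u & 0xFF for u in code_units)
    # Upper octets differ: emit 0xD8, then every code unit as two octets.
    out = bytearray([0xD8])
    for u in code_units:
        out += u.to_bytes(2, "big")
    return bytes(out)


print(cidnuc_compress_sketch([0x1100, 0x1162]).hex())          # 110062
print(cidnuc_compress_sketch([0x0061, 0x1100, 0x1162]).hex())  # d8006111001162
```

The two printed values correspond to my example 2 and to the output
Mr. Hoffman gave for example 1.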

  Since your document describes two modes for compressed strings and
one case without compression (actually a one-octet expansion),
I thought there would be three cases.
  I would say that your document could be improved for easier reading.
You could also include simple examples, which would help readers a lot.
  Just my thought.

> The purpose is for long strings that might hit the 63-character 
> limit after encoding with Base64. The script you gave, Hangul Jamo, is a 
> prime example of where cidnuc's compression helps. 

Below, I got somewhat confused ...
Sorry if this was discussed before.
(As I said, I am quite new to this mailing list
and I am trying to catch up ASAP...)

> In downcasing UTF8, 

1. I know UTF-8, but not "downcasing" UTF-8.
Could you please give me a reference for downcasing?

----

> In downcasing UTF8, the limit for Hangul Jamo is 8 characters;

2. For normal UTF-8, one Hangul jamo (two octets) will become three octets.
Therefore, the limit seems to be 63 / 3 = 21.  Am I right?

----

> in UTF-5, it is 15 characters; 

3.  I guess I can figure it out.
In UTF-5, one Hangul jamo (two octets) will become four octets.
Therefore, the limit seems to be floor (63/4) = 15.  Am I right?
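
My arithmetic for items 2 and 3 can be checked with a few lines of Python.
This is only a sketch of my own reasoning: the function name is
hypothetical, the UTF-5 width of four octets per jamo is my assumption
from item 3, and no prefix or other overhead is counted.

```python
LABEL_LIMIT = 63  # the 63-character limit mentioned in this thread


def fixed_width_limit(octets_per_char, limit=LABEL_LIMIT):
    """Most characters that fit when each one takes a fixed number of octets."""
    return limit // octets_per_char


# Width of one Hangul jamo (U+1100) in UTF-8, measured directly.
utf8_width = len("\u1100".encode("utf-8"))
# Width of one Hangul jamo in UTF-5: assumed, not computed.
utf5_width = 4

print(utf8_width)                     # 3
print(fixed_width_limit(utf8_width))  # 21
print(fixed_width_limit(utf5_width))  # 15
```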

> in cidnuc, it is 37 characters.

4.  I am somewhat confused here.
In the case of the one-octet header, the limit seems to be 36 chars, not 37.
Please correct me if I am wrong.  My calculation is shown below:

  1) let's assume we have 37 jamos (=74 octets). 
  2) after compression, we have 38 octets (due to 0x11 header).
  3) after base32 encoding, we have 
    ceiling (38*8/5) = ceiling (60.8) = 61 octets.
  4) after prepending "wg4", we have 64 octets, which exceeds 63 by one.

Therefore, the limit seems to be 36 chars.

----

5. In the document, it is said that
  "the two-octet mode limits the number of chars to 17".

I am somewhat confused here too.  The limit seems to be 18, not 17.
Please correct me if I am wrong.  My calculation is shown below:

  1) let's assume we have 18 chars (=36 octets). 
  2) after compression, we have 37 octets (due to 0xd8 header).
  3) after base32 encoding, we have 
    ceiling (37*8/5) = ceiling (59.2) = 60 octets.
  4) after prepending "wg4", we have 63 octets.

Therefore, the limit seems to be 18 chars.
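
The calculations in items 4 and 5 can be repeated mechanically. The sketch
below only encodes my own assumptions from the steps above: one header
octet, base32 output of ceiling(8n/5) characters for n octets, the "wg4"
prefix, and a 63-character limit. The function names are hypothetical.

```python
LABEL_LIMIT = 63  # the 63-character limit mentioned in this thread
PREFIX = "wg4"    # prefix assumed in steps 4) above


def encoded_length(payload_octets):
    """Prefix length plus ceiling(8n/5) base32 characters for n octets."""
    return len(PREFIX) + (8 * payload_octets + 4) // 5


def max_chars(octets_per_char, header_octets=1):
    """Largest character count whose encoded form still fits the limit."""
    n = 0
    while encoded_length(header_octets + octets_per_char * (n + 1)) <= LABEL_LIMIT:
        n += 1
    return n


print(encoded_length(38))  # 64: 37 jamos in one-octet mode, one too many
print(encoded_length(37))  # 63: 36 jamos in one-octet mode, or 18 chars after 0xd8
print(max_chars(1))        # 36
print(max_chars(2))        # 18
```

If any of these assumptions differs from the draft, the limits would of
course change accordingly.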

> --Paul Hoffman, Director
> --Internet Mail Consortium

Thanks very much.

김 경석, 부산대 정보 컴퓨터 공학부; 
KIM Kyongsok/GIM Gyeongseog, Busan National Univ.
gimgs@hangeul.cs.pusan.ac.kr, http://hangeul.cs.pusan.ac.kr/hangeul/
Ph: +82-(0)51-510-2292, Fax: +82-(0)51-515-2208
