
Re: [idn] question about cidnuc



Dear Mr. Hoffman:

Thanks very much for your quick reply.
Now I understand how the compression works.

On Fri, 10 Mar 2000, Paul Hoffman / IMC wrote:

> >i made up two examples for the first two cases.
> >are they correct?
> >
> >   1) no compression: 0x0061 1100 1162
> >   2) compressed/one-octet header : 0x1100 1162 -> 0x 11 00 62
                              ^^^^^^ I meant 'mode' here; the same applies below.

> >   3) compressed/two-octet header:  examples???
> 
> In cidnuc, section 2.4.1, Step 1 says 
> that all the upper octets *must* match in order to use the greater 
> compression. In the case above, 0x00 does not match 0x11. Thus, the output 
> of the compression step is 0xD8006111001162.

  Since your document describes two modes for compressed strings and
one case without compression (actually a one-octet expansion),
I thought there would be three cases.
  I think your document could be made easier to read.
Including a few simple examples would also help readers a lot.
  Just my thought.
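For instance, the mode choice discussed above could be illustrated with a small sketch like this one. It is only my reading of the examples in this thread, not code from the cidnuc draft: I ASSUME the one-octet-mode header is the shared upper octet (as in 0x1100 1162 -> 0x11 00 62) and the two-octet-mode header is 0xD8 (as in your 0xD8006111001162). The name `compress` is hypothetical, and the real draft may reserve some header values or add steps that this sketch ignores.

```python
# Sketch of cidnuc's compression step as discussed in this thread.
# ASSUMPTIONS: one-octet-mode header = the shared upper octet,
# two-octet-mode header = 0xD8. `compress` is a hypothetical name.
from typing import Sequence

TWO_OCTET_HEADER = 0xD8  # header seen in the reply's example output


def compress(code_units: Sequence[int]) -> bytes:
    """Compress 16-bit code units (cf. cidnuc section 2.4.1, Step 1)."""
    uppers = {cu >> 8 for cu in code_units}
    if len(uppers) == 1:
        # One-octet mode: all upper octets match, so emit that octet once
        # as the header, then only the lower octet of each code unit.
        (upper,) = uppers
        return bytes([upper] + [cu & 0xFF for cu in code_units])
    # Two-octet mode: upper octets differ, so emit the 0xD8 header and
    # then every code unit as two octets, big-endian.
    out = bytearray([TWO_OCTET_HEADER])
    for cu in code_units:
        out += cu.to_bytes(2, "big")
    return bytes(out)


print(compress([0x1100, 0x1162]).hex())          # 110062
print(compress([0x0061, 0x1100, 0x1162]).hex())  # d8006111001162
```

With these two inputs the sketch reproduces both outputs quoted above: my one-octet example and the 0xD8-prefixed output from your reply.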
 
> The purpose is for long strings that might hit the 63-character 
> limit after encoding with Base64. The script you gave, Hangul Jamo, is a 
> prime example of where cidnuc's compression helps. 

I got somewhat confused by the part below.
Sorry if this has been discussed before.
(As I said, I am quite new to this mailing list
and am trying to catch up ASAP.)

> In downcasing UTF8, 

1. I know UTF-8, but not "downcasing" UTF-8.
Could you please point me to a reference for downcasing?

----

> In downcasing UTF8, the limit for Hangul Jamo is 8 characters;

2. In normal UTF-8, one Hangul Jamo (two octets in UTF-16) becomes three octets.
Therefore, the limit seems to be floor(63/3) = 21.  Am I right?
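A quick check of this arithmetic in Python, using U+1100 from the examples above (every Hangul Jamo in U+1100 to U+11FF behaves the same way):

```python
# Octet counts for one Hangul Jamo in UTF-16 and UTF-8, and the
# resulting number of jamos that fit in 63 octets of plain UTF-8.
jamo = "\u1100"  # HANGUL CHOSEONG KIYEOK
utf16_octets = len(jamo.encode("utf-16-be"))
utf8_octets = len(jamo.encode("utf-8"))
limit = 63 // utf8_octets
print(utf16_octets, utf8_octets, limit)  # 2 3 21
```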

----

> in UTF-5, it is 15 characters; 

3.  I think I can figure this one out.
In UTF-5, one Hangul Jamo (two octets) becomes four octets.
Therefore, the limit seems to be floor(63/4) = 15.  Am I right?
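To check the four-octets-per-jamo figure, here is a minimal UTF-5 encoder. It ASSUMES the scheme as I understand it from the UTF-5 draft: each code point is written as its hex digits with leading zeros dropped, the first digit mapped to a letter G-V and the remaining digits written as 0-9, A-F. The name `utf5_encode` is hypothetical.

```python
# Minimal UTF-5 encoder (ASSUMED scheme: leading zero nibbles dropped,
# first nibble as a letter G-V, following nibbles as 0-9, A-F).
LEAD = "GHIJKLMNOPQRSTUV"  # LEAD[v] encodes a first nibble of value v


def utf5_encode(text: str) -> str:
    out = []
    for ch in text:
        digits = format(ord(ch), "X")  # uppercase hex, no leading zeros
        out.append(LEAD[int(digits[0], 16)] + digits[1:])
    return "".join(out)


per_jamo = len(utf5_encode("\u1100"))
print(utf5_encode("\u1100\u1162"), per_jamo, 63 // per_jamo)  # H100H162 4 15
```

Since every Hangul Jamo code point (0x11xx) has four significant hex digits, each one costs four octets regardless of the exact letter mapping.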

> in cidnuc, it is 37 characters.

4.  I am somewhat confused here.
In the one-octet mode, the limit seems to be 36 chars, not 37.
Please correct me if I am wrong.  My calculation is shown below:

  1) Assume we have 37 jamos (= 74 octets).
  2) After compression, we have 37 + 1 = 38 octets
     (one lower octet per jamo, plus the 0x11 header).
  3) After base32 encoding, we have
     ceiling(38*8/5) = ceiling(60.8) = 61 octets.
  4) After prepending "wg4", we have 64 octets, which exceeds 63 by one.

Therefore, the limit seems to be 36 chars.
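The same arithmetic as a small Python check. Like the steps above, it assumes base32 output carries 5 bits per octet with no padding and that the prefix is "wg4"; the function name is hypothetical.

```python
# Encoded length of n jamos in cidnuc's one-octet mode, following the
# steps above: compress, base32-encode (no padding), prepend "wg4".
PREFIX = "wg4"


def one_octet_mode_length(n_jamos: int) -> int:
    compressed = n_jamos + 1               # one lower octet per jamo + 0x11 header
    base32 = (compressed * 8 + 4) // 5     # ceiling(bits / 5)
    return len(PREFIX) + base32


for n in (36, 37):
    print(n, one_octet_mode_length(n))     # 36 63, then 37 64
```

So 36 jamos fit exactly in 63 octets, while 37 jamos need 64.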

----

5. The document says that
  "the two-octet mode limits the number of chars to 17".

I am somewhat confused here too.  The limit seems to be 18, not 17.
Please correct me if I am wrong.  My calculation is shown below:

  1) Assume we have 18 chars (= 36 octets).
  2) After compression, we have 36 + 1 = 37 octets (due to the 0xD8 header).
  3) After base32 encoding, we have
     ceiling(37*8/5) = ceiling(59.2) = 60 octets.
  4) After prepending "wg4", we have 63 octets, which is within the limit.

Therefore, the limit seems to be 18 chars.
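A small Python search confirms that 18 is the largest count that fits. It uses the same assumptions as the steps above (0xD8 header, unpadded base32 at 5 bits per octet, "wg4" prefix); the function name is hypothetical.

```python
# Encoded length in cidnuc's two-octet mode, and the largest number of
# characters whose encoded form still fits in 63 octets.
PREFIX_LEN = 3  # len("wg4")
LIMIT = 63


def two_octet_mode_length(n_chars: int) -> int:
    compressed = 2 * n_chars + 1          # two octets per char + 0xD8 header
    base32 = (compressed * 8 + 4) // 5    # ceiling(bits / 5)
    return PREFIX_LEN + base32


max_chars = max(n for n in range(64) if two_octet_mode_length(n) <= LIMIT)
print(max_chars, two_octet_mode_length(18), two_octet_mode_length(19))  # 18 63 66
```

With 19 chars the result is 66 octets, so 18 is the limit.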
 
> --Paul Hoffman, Director
> --Internet Mail Consortium

Thanks very much.


KIM Kyongsok/GIM Gyeongseog (김경석), Busan National Univ.
School of Information and Computer Engineering
gimgs@hangeul.cs.pusan.ac.kr, http://hangeul.cs.pusan.ac.kr/hangeul/
Ph: +82-(0)51-510-2292, Fax: +82-(0)51-515-2208