Re: [idn] Change request for cidnuc



James, or Marc:

Are we now discussing implementations, or are we still focused on agreeing
on the final presentation of the requirements in Adelaide? This reads like
an implementation discussion, which I thought was not appropriate for this
WG, so I'm trying to get my bearings...

Thanks,

Bill Semich
.NU Domain

At 12:50 PM 3/13/00 +0000, Aaron Irvine wrote:
>Hello,
>
>Please consider the following suggestions for improvement to CIDNUC.
>
>--
>
>Rather than "wg4", I suggest the more distinctive "--" preceded by a single
>letter "a" to "z".  Currently "a" to "c" are used and indicate which form of
>CIDNUC applies.  For future-proofing, letters "d" to "z" are reserved for
>potential later use.
>
>--
>
>Currently CIDNUC has considered compact encoding for Asian and for scripts
>like Cyrillic and Greek.  However, for accented Latin the compression is
>poor.  This change request addresses the problem (and allows Latin labels to
>be up to (63-3)/2 = 30 letters long).
>
>--
>
>A label or username can be encoded in one of four ways.  Considering the two
>octets of each character of the string in UTF-16, and using the notation that
>L10 is the lowest 10 bits, L8 is the lowest 8 bits and H8 is the highest 8
>bits:
>
>1) if string is only a-z 0-9 and hyphen then no encoding applied
>
>2) else if all high octets are 0x00, 0x01, 0x02 or 0x03 (e.g. string is Latin
>supplement/extended-A/etc), then encode as follows:
> "c--" base32(L10 L10 L10 ...)
>
>3) else if all high octets are equal (e.g. string Greek/Cyrillic/etc), then
>encode as follows:
> "b--" base32(H8 L8 L8 L8 ...)
>
>4) else (e.g. Asian/etc), encode as follows:
> "a--" base32(H8 L8 H8 L8 H8 L8 ...)
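
The four-way choice above can be sketched in Python. This is only an
illustration, not part of the proposal: PLAIN, high_octets and choose_form
are made-up names, and the check for form "c--" assumes that high octet 0x00
qualifies alongside 0x01 to 0x03, since the d{"u}rst example further down has
high octet 0x00 yet is shown under encoding 2.

```python
import string

# Characters that need no encoding under rule 1.
PLAIN = set(string.ascii_lowercase + string.digits + "-")

def high_octets(label):
    """Return the set of high octets (H8) of the label's UTF-16 code units."""
    data = label.encode("utf-16-be")   # two octets per code unit, high octet first
    return set(data[0::2])

def choose_form(label):
    """Pick the prefix for a label; "" means rule 1 (no encoding applied)."""
    if all(ch in PLAIN for ch in label):
        return ""
    highs = high_octets(label)
    if highs <= {0x00, 0x01, 0x02, 0x03}:   # assumption: 0x00 is included
        return "c--"
    if len(highs) == 1:                     # one shared high octet
        return "b--"
    return "a--"
```

Under this reading a purely Cyrillic label (high octet 0x04) selects "b--",
while a label mixing high octets above 0x03 selects "a--".
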
>
>--
>
>                    Base32 conversion
>        bits   char  hex         bits   char  hex
>        00000   9    0x39        10000   p    0x70
>        00001   a    0x61        10001   q    0x71
>        00010   b    0x62        10010   r    0x72
>        00011   c    0x63        10011   s    0x73
>        00100   d    0x64        10100   t    0x74
>        00101   e    0x65        10101   u    0x75
>        00110   f    0x66        10110   v    0x76
>        00111   g    0x67        10111   w    0x77
>        01000   h    0x68        11000   x    0x78
>        01001   i    0x69        11001   y    0x79
>        01010   j    0x6a        11010   z    0x7a
>        01011   k    0x6b        11011   2    0x32
>        01100   l    0x6c        11100   3    0x33
>        01101   m    0x6d        11101   4    0x34
>        01110   n    0x6e        11110   5    0x35
>        01111   o    0x6f        11111   6    0x36
>
>(0 and 1 never to be used.  7 and 8 and - reserved for possible future use.)
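
For illustration, the table can be held as a 32-character string indexed by
the 5-bit value. The sketch below follows the char column, which is what the
worked examples in this message use. ALPHABET, to_chars and from_chars are
hypothetical names, and folding upper case to lower case when decoding is an
assumption based on the remark that c--cDg3cRcScT is an equivalent spelling.

```python
# Index = 5-bit value, character = the "char" column of the table.
ALPHABET = "9abcdefghijklmnopqrstuvwxyz23456"

def to_chars(values):
    """Map a sequence of 5-bit values (0..31) to base32 characters."""
    return "".join(ALPHABET[v] for v in values)

def from_chars(text):
    """Map base32 characters back to 5-bit values, ignoring case."""
    values = []
    for ch in text.lower():
        index = ALPHABET.find(ch)
        if index < 0:
            raise ValueError(f"{ch!r} is not in the base32 alphabet")
        values.append(index)
    return values
```

With this layout the characters 0, 1, 7, 8 and "-" never appear in the
alphabet, so from_chars rejects them.
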
>
>--
>
>Example for encoding 2:
>
>d{"u}rst@w3.org
>
>c--cdg3crcsct@w3.org
>
>(or it could equally be written: c--cDg3cRcScT@w3.org)
>
>
>Another example for encoding 2:
>
>www.tre-feli{^c}a.ie
>
>www.c--ctcrceamcfceclcihica.ie
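
Both examples can be reproduced with a short sketch of encoding 2: take the
lowest 10 bits of each character and emit two base32 characters of five bits
each. The names encode_c and decode_c are hypothetical, the alphabet is the
char column of the table above, and the code assumes the caller has already
decided that form "c--" applies.

```python
ALPHABET = "9abcdefghijklmnopqrstuvwxyz23456"   # char column of the table

def encode_c(label):
    """Encode a label whose code points are all below 0x0400 as form "c--"."""
    out = ["c--"]
    for ch in label:
        code = ord(ch)
        if code > 0x03FF:
            raise ValueError("form c only covers high octets 0x00 to 0x03")
        l10 = code & 0x3FF                  # L10: lowest 10 bits
        out.append(ALPHABET[l10 >> 5])      # upper five bits
        out.append(ALPHABET[l10 & 0x1F])    # lower five bits
    return "".join(out)

def decode_c(encoded):
    """Reverse encode_c; upper-case input is accepted."""
    text = encoded.lower()
    if not text.startswith("c--") or len(text) % 2 == 0:
        raise ValueError("not a well-formed form c string")
    body = text[3:]
    chars = []
    for i in range(0, len(body), 2):
        hi = ALPHABET.index(body[i])
        lo = ALPHABET.index(body[i + 1])
        chars.append(chr((hi << 5) | lo))
    return "".join(chars)

print(encode_c("d\u00fcrst"))        # c--cdg3crcsct
print(encode_c("tre-feli\u0109a"))   # c--ctcrceamcfceclcihica
```

Each letter becomes exactly two characters, so a 30-letter label plus the
three-character prefix comes to 63 characters, in line with the
(63-3)/2 = 30 figure. A plain a-z letter comes out as "c" followed by the
letter itself, which is why the encoded forms stay fairly readable.
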
>
>
>
>--
>
>regards,
>Aaron Irvine
>
>--
>
>-----------------------------------------------------
>Aaron Irvine
>  mailto:airvine@corp.phone.com
>-----------------------------------------------------
>
>
>
>
Bill Semich
President and Founder
.NU Domain Ltd
http://whats.nu
bill@mail.nic.nu