
Re: Thread on - Re: [idn] Prohibit CDN code points



Dear Kenneth & all:
         Thanks for your response to these questions.
----- Original Message -----
From: "Kenneth Whistler" <kenw@sybase.com>

> The string "U+006A" is a denotation for the Unicode code point (in
> the overall range of possible values 0..10FFFF), as well as the
> character encoded at that code point, namely LATIN SMALL LETTER J.
>
              You point out another question I want to ask: what are the data type
and range of the Unicode code points that are input to the toASCII function module?
IDNA may be a common module for all applications, and today there is no longer
only one vendor such as NSI/ISC as before, so this should be defined and
described more clearly. By the traditional habit in IETF RFC protocols,
parameters are passed as ASCII strings, but if the input is an ASCII string, the
delimiter between characters and the indicator MUST be very clear, especially
since some pure ASCII names may also be passed to it.

> The case doesn't matter, although the Unicode Standard most
> often uses uppercase. So some people would also use "U+006a" or
> "u+006a" for the same Unicode code point.
>
> >
> > My question are:
> > Q1:   U+hhhh  can be represented as u+hhhh  or not ?
>
> Yes. And you can also just leave off
> the U+ altogether where it is clear you are referring to
> Unicode characters, i.e. "hhhh", so for the LATIN SMALL LETTER J,
> just "006A" or "006a".
>
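A minimal sketch of the notation rule described above, assuming Python. The helper name parse_code_point is hypothetical and is not part of any IDNA or Unicode API. It only shows that "U+006A", "u+006a", "006A" and "006a" all denote the same integer code point.

```python
import re

# Hypothetical helper, for illustration only: accepts "U+hhhh", "u+hhhh",
# or bare "hhhh" (4 to 6 hex digits, any case) and returns the integer.
_NOTATION = re.compile(r"(?:[Uu]\+)?([0-9A-Fa-f]{4,6})")

def parse_code_point(text):
    match = _NOTATION.fullmatch(text)
    if match is None:
        raise ValueError("not a code point notation: %r" % text)
    value = int(match.group(1), 16)
    if value > 0x10FFFF:
        # Outside the overall Unicode range 0..10FFFF.
        raise ValueError("out of Unicode range: %r" % text)
    return value

for notation in ("U+006A", "u+006a", "006A", "006a"):
    code_point = parse_code_point(notation)
    # Every spelling yields the same integer and the same character.
    print(notation, code_point, chr(code_point))
```

The notation is only a way of writing the number down; what the code works with afterwards is the integer itself.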
> > Q2:   Here U+HHHH  is not a hostname , does it MUST be forced to lower
> > u+hhhh or not  in nameprep ?
>
> I think you are mixing things up. If you put a Unicode character
> into a hostname, you don't literally put the string "U+006A" (or
> whatever) into the hostname, you put the Unicode encoded representation,
> in whatever form of Unicode you are using, into the hostname.
>
> Thus, if my hostname was "jam", in Unicode UTF-8, that would be
> just 0x6A 0x61 0x6D, since the Unicode values for ASCII characters
> like "j" are the same as ASCII in UTF-8.
>
> If my hostname was the Chinese word for 'banana', just to pick
> a random example, that consists of two characters (pinyin: xiang1jiao1).
> The Unicode values for those characters are U+9999 U+8549. If you
> have a Unicode string, that would just be two 16-bit numbers,
> 0x9999 followed by 0x8549, if using Unicode UTF-16, or the
> following byte sequence if using Unicode UTF-8: 0xE9 0xA6 0x99
> 0xE8 0x95 0x89.
>
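The byte values quoted above can be checked with a short sketch, assuming Python and its built-in "utf-8" and "utf-16-be" codecs. The helper names hex_bytes and utf16_units are made up for this example.

```python
def hex_bytes(data):
    # Format a bytes object as "0xE9 0xA6 ..." for comparison with the text.
    return " ".join("0x%02X" % b for b in data)

def utf16_units(text):
    # Split the big-endian UTF-16 encoding into 16-bit code units.
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

jam = "jam"
banana = "\u9999\u8549"  # the two characters U+9999 U+8549

print(hex_bytes(jam.encode("utf-8")))            # ASCII letters keep their ASCII values
print(hex_bytes(banana.encode("utf-8")))         # three bytes per character here
print(["0x%04X" % u for u in utf16_units(banana)])  # one 16-bit unit per character
```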
       Sorry, but for a function module as important as "toASCII",
the type, size range, and representation format of the input parameter should be
defined more clearly to support interoperability.
> > Q3:  Puny code  draft  accept  U+hhhh  or  u+hhhh  to let the final encoded
> > ASCII character (last character of corresponding  encoded code point) with
> > case upper or lower.
>
> If I am interpreting things correctly, Punycode is defined on the
> Unicode code points, and certainly not on
> the short identifier strings for the Unicode code points.
> So for the Chinese 'banana' example, you'd be encoding two
> code point integers (39321 = 0x9999, followed by 34121 = 0x8549),
> *not* the string of integers corresponding to the ASCII
> string "U+9999U+8549".
>
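The difference between the two inputs can be illustrated with a rough sketch, assuming Python and its built-in "punycode" codec. This shows only the raw Punycode step, not the full ToASCII operation (no nameprep and no ACE prefix), and the variable names are chosen for this example.

```python
# Two Unicode code points: the integers 39321 (0x9999) and 34121 (0x8549).
banana = "\u9999\u8549"

# A 12-character ASCII string that merely spells out the notation.
literal = "U+9999U+8549"

print([ord(c) for c in banana])   # the two code point integers
print(len(literal))               # 12 ASCII characters

# Punycode operates on code points, so the two inputs encode differently.
encoded_banana = banana.encode("punycode")
encoded_literal = literal.encode("punycode")
print(encoded_banana)
print(encoded_literal)

# Each encoded form decodes back to exactly the string it came from.
print(encoded_banana.decode("punycode") == banana)
print(encoded_literal.decode("punycode") == literal)
```

By the time the encoder is called, the input is already a sequence of code points, so the all-ASCII name and the two-character Chinese name are never confused with each other.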
          You have just pointed out the problem: how do we differentiate them?
Whether the IDNA module is a common module shared by most applications, or a
full copy belonging to each application independently, gives different results
under these descriptions. Can someone clarify this?

> Of course, somebody might want to try having a hostname or
> domain name of "U+9999U+8549", but that is a 12 character ASCII
> string, and is not the same thing at all as the two-character
> Unicode string for the Chinese word for 'banana'.
>
          It is different, but you cannot prevent people from registering such
a name. If it is a module common to all, it should consider the effect
of such input.

Thanks to all for the replies.
L.M.Tseng