
Re: [idn] IRIs ought to use internationalized *host* names




----- Original Message ----- 
From: "Adam M. Costello" <idn.amc+0@nicemice.net.RemoveThisWord>
> The Unicode character database classifies each character as belonging to
> exactly one of the following broad classes:
> 
> L: letter
> M: mark
> N: number
> P: punctuation
> S: symbol
> Z: separator
> C: other

May I add this?

  U: unassigned code points.

> 
> We can start by examining which of these classes of ASCII characters are
> allowed in ASCII host labels.
> 
> L: 52 exist, all are allowed
> M:  0 exist
> N: 10 exist, all are allowed
> P: 23 exist, only hyphen-minus is allowed
> S:  9 exist, none are allowed
> Z:  1 exists, it is not allowed
> C: 33 exist, none are allowed

  U: indefinite, all are allowed.
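The quoted ASCII counts can be checked mechanically. The following is a minimal sketch, assuming Python's standard unicodedata module; the helper name major_class is only illustrative. Note that Python reports unassigned code points as category "Cn", so the U class suggested here would show up under C in this tally.

```python
import unicodedata
from collections import Counter


def major_class(ch):
    """Return the first letter of the general category (L, M, N, P, S, Z, C)."""
    return unicodedata.category(ch)[0]


# Tally the broad class of every ASCII code point (U+0000..U+007F).
ascii_counts = Counter(major_class(chr(cp)) for cp in range(128))

for cls in "LMNPSZC":
    print(cls, ascii_counts[cls])
```

The tally should reproduce the figures quoted above (52 letters, 10 numbers, 23 punctuation, 9 symbols, 1 separator, 33 other).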


> 
> We can trivially extend these results to form a simple rule covering the
> entire Unicode repertoire, except that we have no precedent for class
> M.  Since characters in class M tend to be things like diacritics, they
> should be allowed.  So the proposed rule is:
> 
> All characters in classes L (letter), M (mark), and N (number) are
> allowed, and U+002D (hyphen-minus) is also allowed.  Everything else is
> forbidden.
 
 U should also be allowed, in addition to L, M, and N.
 In later versions of Unicode, however, U may be partitioned into L' ~ C' and a smaller U'.
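
 The quoted rule and the amendment suggested here can be sketched together as follows. This is only an illustration, not a proposed implementation: the function name is_allowed and the allow_unassigned flag are hypothetical, and U is assumed to correspond to what Python's unicodedata module reports as category "Cn". The result for a given code point depends on the Unicode version bundled with the interpreter, which is the partitioning concern raised above.

```python
import unicodedata


def is_allowed(ch, allow_unassigned=False):
    """Apply the quoted rule to one character.

    Classes L, M, and N are allowed, plus U+002D (hyphen-minus).
    With allow_unassigned=True, code points that this Unicode version
    leaves unassigned (category "Cn", assumed to be class U) also pass.
    """
    category = unicodedata.category(ch)
    if category[0] in "LMN" or ch == "\u002d":
        return True
    return allow_unassigned and category == "Cn"


# ASCII letter, digit, hyphen-minus, and a combining mark (U+0301) pass;
# underscore (P) and space (Z) do not.
for sample in ["a", "5", "-", "\u0301", "_", " "]:
    print(repr(sample), is_allowed(sample))
```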

 Soobok Lee