Re: [idn] case folding



If we are going to discuss this, and since we will be basing it on Unicode, it
would be less confusing if people referred to the characters to be allowed or
disallowed by their Unicode General Categories.

Abbr  Description                    Proposal
Lu    Letter, Uppercase              Allow
Ll    Letter, Lowercase              Allow
Lt    Letter, Titlecase              Allow
Mn    Mark, Non-Spacing              Disallow
Mc    Mark, Spacing Combining        Disallow
Me    Mark, Enclosing                Disallow
Nd    Number, Decimal Digit          Allow
Nl    Number, Letter                 Disallow, should we remap to Nd?
No    Number, Other                  Disallow, should we remap to Nd?
Zs    Separator, Space               Disallow
Zl    Separator, Line                Disallow
Zp    Separator, Paragraph           Disallow
Cc    Other, Control                 Disallow
Cf    Other, Format                  Disallow
Cs    Other, Surrogate               Disallow
Co    Other, Private Use             Disallow (sure?)
Cn    Other, Not Assigned            Disallow

Lm    Letter, Modifier               Allow
Lo    Letter, Other                  Allow
Pc    Punctuation, Connector         Disallow
Pd    Punctuation, Dash              Disallow (except '-')
Ps    Punctuation, Open              Disallow
Pe    Punctuation, Close             Disallow
Pi    Punctuation, Initial quote     Disallow
Pf    Punctuation, Final quote       Disallow
Po    Punctuation, Other             Disallow
Sm    Symbol, Math                   Disallow
Sc    Symbol, Currency               Disallow
Sk    Symbol, Modifier               Disallow
So    Symbol, Other                  Disallow
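
To make the table concrete, here is a rough sketch of how it could be checked mechanically. It assumes Python's standard unicodedata module, whose answers depend on the Unicode version bundled with the interpreter. The names ALLOWED_CATEGORIES, is_allowed and disallowed_chars are made up for illustration, and the open questions above (remapping Nl/No, private use) are simply treated as "Disallow".

```python
import unicodedata

# Categories marked "Allow" in the table above; all others are disallowed.
ALLOWED_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nd"}


def is_allowed(ch: str) -> bool:
    """Return True if a single character passes the proposed table."""
    if ch == "-":
        # The one exception the table makes within Pd.
        return True
    return unicodedata.category(ch) in ALLOWED_CATEGORIES


def disallowed_chars(label: str) -> list:
    """List (character, category) pairs in label that the table rejects."""
    return [(ch, unicodedata.category(ch)) for ch in label if not is_allowed(ch)]


if __name__ == "__main__":
    for label in ["example", "my-host", "my_host", "a\u0301"]:
        print(repr(label), disallowed_chars(label))
```

An empty list means every character in the label falls in an allowed category; anything else shows which characters were rejected and why.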

Flame away :-)

-James Seng

"Eugene M. Kim" wrote:
> 
> On Mon, 12 Jun 2000, James Seng wrote:
> 
> | However, should we reopen the discussion on which codepoints are allowed and
> | which are not? I remember we had quite a heated argument, and the consensus
> | then was to leave it to the protocol proposals. Any changes now?
> 
> Yes and no.  As people start to write the actual proposals, now seems a
> good time to reopen those issues (so that proposal writers can benefit
> from the discussion), but none of the codepoints should be mandated
> in the requirements document (except for obvious ones such as alphabetic
> characters and digits).
> 
> Maybe some other codepoints make almost no sense to use in domain names
> as well; they include C0/C1 control codes and private use area
> (U+E000-U+F8FF) at least.  And there are a lot of `controversial'
> letters such as arrow symbols and other pictographic characters.
> 
> However, IMHO, even if we do end up defining some codepoints to be
> excluded in the requirements, they should not include any characters that
> matter to a particular language or script, unless there is a strong
> reason to drop them.
> 
> Eugene Kim
> 
> --
> Eugene M. Kim <ab@astralblue.com>
> 
> "Is your music unpopular?  Make it popular; make music
> which people like, or make people who like your music."