
RE: [idn] Fwd: Re: Rationale wanted for Unicode identifier rules




Yes.  This is closely related to the revised Annex A list in
ISO/IEC 10176.  But the latter excludes 10646 "level 3"
characters and includes only "level 2" ones.  Format control
characters and most (not all) compatibility decomposable
characters are also not included in the 10176 list.  I don't
recall the exact correspondence, but I'm sure Kenneth
Whistler does...  Note also that, depending on the programming
language (or other context), additional characters may be
allowed, such as _ and - (HYPHEN-MINUS) and other dashes.

                Kind regards
                /kent k


> -----Original Message-----
> From: Harald Tveit Alvestrand [mailto:Harald@Alvestrand.no]
> Sent: Thursday, March 02, 2000 11:52 AM
> To: idn@ops.ietf.org
> Subject: [idn] Fwd: Re: Rationale wanted for Unicode identifier rules
>
>
> Is this something we can use (possibly modified) in IDN to describe what a
> reasonable character set for IDN labels is?
>
>                 Harald
>
>
>
> >X-UML-Sequence: 12492 (2000-03-01 21:35:45 GMT)
> >From: Kenneth Whistler <kenw@sybase.com>
> >To: "Unicode List" <unicode@unicode.org>
> >Cc: unicode@unicode.org, kenw@sybase.com
> >Date: Wed, 1 Mar 2000 13:35:44 -0800 (PST)
> >Subject: Re: Rationale wanted for Unicode identifier rules
> >
> >John Cowan asked:
> >
> > >
> > > Kenneth Whistler wrote:
> > >
> > > >   A. Identifier syntax along the lines described in Unicode 3.0.
> > >
> > > Can you (or someone) supply a precis of this to the poor fellow
> > > who still hasn't heard from his bookstore's order department?
> > > Especially if it is indeed simpler than the Unicode 2.0 version?
> >
> >Sure. For those of you who already have the hymnal, turn to page 134 to
> >sing along.
> >
> ><identifier> ::= <identifier_start> (<identifier_start> | <identifier_extend>)*
> >
> ><identifier_start> is defined by an equivalent category set consisting of
> >        all those characters with the General Category values:
> >        Lu, Ll, Lt, Lm, Lo, Nl
> >
> ><identifier_extend> is defined by an equivalent category set consisting of
> >        all those characters with the General Category values:
> >        Mn, Mc, Nd, Pc, Cf
> >
> >Thus, identifiers can start with any "letter" or "letter number".
> >
> >Identifiers can continue with any "letter" or "letter number", any combining
> >mark (except the symbolic surrounds), any decimal digit, any connecting
> >punctuation, or any format control character (e.g. the invisible bidi
> >layout controls, ZWJ, ZWNJ, etc.).
> >
> >Note that this definition explicitly excludes the following General Category
> >values from identifiers:
> >
> >    Me, No, Zs, Zl, Zp, Cc, Pd, Ps, Pe, Pi, Pf, Po, Sm, Sc, Sk, So
> >
> >i.e. enclosing combining marks, "other numerals", all spaces, control
> >characters, all other punctuation, and all "symbols".
> >
> >--Ken
>
> --
> Harald Tveit Alvestrand, EDB Maxware, Norway
> Harald.Alvestrand@edb.maxware.no
>
>
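
For anyone who wants to experiment with the rule quoted above, here is a
minimal sketch in Python.  It is only an illustration, not anything from
the Unicode book or from 10176.  The function name is_identifier and the
"extra" parameter are made up for this example; "extra" is a hypothetical
hook for the context-specific additions (such as HYPHEN-MINUS) mentioned
at the top of this message.  Note that Python's unicodedata module tracks
a much later Unicode version than 3.0, so category assignments for some
characters may differ from what the 3.0 tables give.

```python
import unicodedata

# General Category sets copied from the quoted rule.
ID_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_EXTEND = {"Mn", "Mc", "Nd", "Pc", "Cf"}
ID_CONTINUE = ID_START | ID_EXTEND


def is_identifier(s, extra=""):
    """Check s against <identifier_start> (<identifier_start> | <identifier_extend>)*.

    `extra` is a hypothetical, context-specific set of additional
    characters allowed in continuation positions only (e.g. "-").
    """
    # An identifier must be non-empty and begin with a start character.
    if not s or unicodedata.category(s[0]) not in ID_START:
        return False
    # Every following character must be a start or extend character,
    # or one of the caller-supplied extras.
    return all(
        unicodedata.category(ch) in ID_CONTINUE or ch in extra
        for ch in s[1:]
    )


if __name__ == "__main__":
    for candidate in ["abc1", "1abc", "a_b", "a-b", "e\u0301", "a\u20dd"]:
        print(repr(candidate), is_identifier(candidate))
```

With these sets, "a_b" is accepted because LOW LINE is connecting
punctuation (Pc), while "a-b" is rejected unless the caller passes
extra="-", since HYPHEN-MINUS is dash punctuation (Pd).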