
Re: [idn] IRIs ought to use internationalized *host* names



Thanks, Keld. For those interested, there is an updated version of the
recommended characters for programming language identifiers on:

http://www.unicode.org/Public/3.2-Update/DerivedCoreProperties-3.2.0.txt

The derivation is listed in comments in that file:

# Derived Property: ID_Start
#  Characters that can start an identifier.
#  Generated from Lu+Ll+Lt+Lm+Lo+Nl

# Derived Property: ID_Continue
#  Characters that can continue an identifier.
#  Generated from: ID_Start + Mn+Mc+Nd+Pc
#  NOTE: Cf characters should be filtered out.

The meaning of the abbreviations is on:

http://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.html#General%20Category
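
As a rough illustration of the derivation quoted above, here is a minimal
Python sketch that builds the two properties from General Category values.
It assumes the standard library's unicodedata.ucd_3_2_0 object (the Unicode
3.2.0 database bundled with Python) is a fair stand-in for the 3.2.0 data
files. The function names are hypothetical, and stripping Cf characters
before the check is only one reading of the NOTE in the file. The derived
data file itself remains the authoritative list.

```python
import unicodedata

# Unicode 3.2.0 view of the character database shipped with Python.
ucd = unicodedata.ucd_3_2_0

# General Category sets taken from the comments in the derived file.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_EXTRA = {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch):
    """True if ch has a category listed for ID_Start."""
    return ucd.category(ch) in ID_START_CATEGORIES


def is_id_continue(ch):
    """True if ch is ID_Start or has one of the extra ID_Continue categories."""
    return is_id_start(ch) or ucd.category(ch) in ID_CONTINUE_EXTRA


def strip_format_chars(text):
    """Drop Cf (format) characters; an assumed reading of the NOTE."""
    return "".join(ch for ch in text if ucd.category(ch) != "Cf")


def is_identifier(text):
    """True if text is non-empty, starts with ID_Start, continues with ID_Continue."""
    return (
        bool(text)
        and is_id_start(text[0])
        and all(is_id_continue(ch) for ch in text[1:])
    )
```

Under these rules a digit or an underscore (category Pc) may continue an
identifier but may not start one, and a hyphen is rejected outright, which
already differs from what host labels conventionally allow.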

(However, it was long ago agreed that IDN should be more inclusive,
rather than more strictly follow programming language
restrictions (see the archives). Of course, it is in keeping with the
tenor of this group to endlessly revisit old issues...)

Mark

—————

Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Keld Jørn Simonsen" <keld@dkuug.dk>
To: "IETF idn working group" <idn@ops.ietf.org>
Sent: Wednesday, March 27, 2002 01:28
Subject: Re: [idn] IRIs ought to use internationalized *host* names


> On Wed, Mar 27, 2002 at 03:30:59AM +0000, Adam M. Costello wrote:
> > James Seng/Personal <jseng@pobox.org.sg> wrote:
> >
> > Which characters should be allowed in internationalized host labels?
> > This is an interesting question in its own right, and it's possible that
> > the IESG will demand an answer.
>
> That is indeed a good question. The same question has been put forward
> for programming languages that can use full ISO 10646 (Unicode) in
> their variable names, and ISO has developed a list of characters
> that then are recommended for use in identifiers. This generally
> consists of letters, digits and characters that can be used in
> normal words in any language of the world. The specification is
> available in annex A of ISO/IEC TR 10176. Leading Unicode people
> participated in creating this list, which is freely available.
> Get it via http://www.dkuug.dk/jtc1/sc22/wg20
>
> Kind regards
> Keld
>
>