
Re: [idn] IRIs ought to use internationalized *host* names



This message contains responses to both James Seng and Soobok Lee.

James Seng <jseng@pobox.org.sg> wrote:

> My believe is what is allowed in host labels is a topic for the zone
> administrator to decide. .CN have a different set compared to .SG
> compared to .COM compared to say IBM.COM.

Zone administrators can always impose their own restrictions, but that
still leaves us with the question of what the IRI spec should say about
what characters are allowed in the host field of IRIs.

The historic precedent is that ASCII punctuation and symbols are allowed
in ASCII *domain* names, but not in ASCII *host* names, and not in the
host field of URIs.

Should IRIs be looser and allow non-ASCII punctuation and symbols
in the host field (while continuing to disallow ASCII punctuation
and symbols)?  Or should IRIs try to apply an old tradition to a new
situation, and disallow punctuation and symbols?

Soobok Lee <lsb@postel.co.kr> wrote:

> > L: letter
> > M: mark
> > N: number
> > P: punctuation
> > S: symbol
> > Z: separator
> > C: other
> 
> May I add this?
> 
>   U: unassigned code points.

I see your motivation.  The classes I listed are all the ones mentioned
in the Unicode character database, but of course the database covers
only assigned code points.  All code points not mentioned in the
database are unassigned, and we could view that as another class.
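
To make the classes concrete, here is a rough Python sketch that maps
each character to its one-letter major class and reports unassigned
code points as a separate class U.  The function name major_class is
made up for illustration.  Note that Python's unicodedata module
reports unassigned code points as the subcategory "Cn" (under C), and
the answer depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

def major_class(ch):
    """Return the one-letter major class of ch, or 'U' if unassigned."""
    cat = unicodedata.category(ch)  # two-letter category, e.g. 'Lu', 'Nd', 'Po'
    # Python reports unassigned code points as 'Cn'; treat them as class U.
    return "U" if cat == "Cn" else cat[0]

# Show the class of each character in a sample string.
for ch in "a5-+ ":
    print(repr(ch), major_class(ch))
```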

>  U should be also allowed in addition to L,M,N.

ToASCII and Nameprep already take an input flag indicating whether
unassigned code points are to be allowed or prohibited.  My proposal
wouldn't change that.
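
As an illustration only, the following Python sketch shows how such a
flag could combine with an L/M/N restriction.  It is not Nameprep or
ToASCII: the function name label_chars_ok and its allow_unassigned
parameter are hypothetical, and permitting the ASCII hyphen is my
assumption, carried over from traditional host name syntax.  It uses
the standard stringprep module's table A.1 (unassigned code points in
Unicode 3.2) and the Unicode 3.2 database that Python exposes as
unicodedata.ucd_3_2_0.

```python
import stringprep
import unicodedata

def label_chars_ok(label, allow_unassigned):
    """Hypothetical check: every character is L, M, N, or a hyphen."""
    for ch in label:
        if ch == "-":
            continue  # assumption: keep the traditional host name hyphen
        if stringprep.in_table_a1(ch):
            # Unassigned in Unicode 3.2: the flag alone decides.
            if not allow_unassigned:
                return False
            continue
        # Assigned: accept only letters, marks, and numbers.
        if unicodedata.ucd_3_2_0.category(ch)[0] not in "LMN":
            return False
    return True
```

With this sketch, a label containing "!" is rejected regardless of the
flag, while a label containing an unassigned code point is accepted
only when allow_unassigned is true.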

AMC