
Re: [idn] nameprep forbidden characters



James Seng wrote:

> Kenneth Whistler wrote:
> > Technically, the two approaches are identical, since the repertoire is
> > defined against Unicode 3.0 and not yet assigned characters are
> > forbidden.
> 
> While the two approaches are the same, common implementations will take the
> current I-D as "allow unless specify otherwise". I am not sure what will
> happen when we have new scripts in a later version.
> 
> Hence, I am more in favour of "disallow unless specify otherwise". (This is
> the third time I am hammering this at Paul :-)

I agree that the I-D should be clear that currently unassigned
code points are disallowed (even in future versions of Unicode where
they may become assigned), since there is no telling what may go in
a particular code position, and therefore no telling whether it would cause
a problem if someone just added it willy-nilly to their implementation
of IDN once the character is assigned.

However, while that should be perfectly clear, and no exceptions should
be allowed without explicitly updating the I-D (or whatever document
it turns into), the "disallow unless specify otherwise" approach is
less helpful for justifying and explaining why particular characters
need to be omitted from the allowable set as part of name preparation.
So what Paul has done is primarily an aid to thinking about the problem.

A necessary implementation aid, in any case -- no matter how the
explanation is structured -- will be a machine-readable file that
specifies unambiguously for all Unicode 3.0 characters whether each
is excluded or not. That is the only reliable way to guarantee that
implementations will get the right results. If implementers are reading
long lists of disallowed characters (or, alternatively, long lists
of allowed characters) they are bound to make mistakes in their
implementations.
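
To make the idea concrete, here is a minimal sketch in Python of how an
implementation might consume such a file. The file layout
("START[..END] ; STATUS" with hexadecimal code points), the names
parse_table and is_allowed, and the sample entries are all assumptions
made for illustration; none of them come from the I-D. The point it
shows is that any code point the table does not list, which includes
everything unassigned in the Unicode version the table was built from,
is rejected by default.

```python
# Hypothetical table layout: "START[..END] ; STATUS", hex code points.
# The entries below are illustrative only, not real nameprep data.
SAMPLE_TABLE = """\
# code point(s) ; status
0041..005A ; allowed
0061..007A ; allowed
0020       ; excluded
"""


def parse_table(text):
    """Parse the table text into a list of (low, high, allowed) ranges."""
    ranges = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, status = (field.strip() for field in line.split(";"))
        if status not in ("allowed", "excluded"):
            raise ValueError("unknown status: %r" % status)
        start, _, end = span.partition("..")
        low = int(start, 16)
        high = int(end, 16) if end else low
        ranges.append((low, high, status == "allowed"))
    return ranges


def is_allowed(ranges, ch):
    """Return True only if the table explicitly marks ch as allowed."""
    cp = ord(ch)
    for low, high, allowed in ranges:
        if low <= cp <= high:
            return allowed
    # Not listed in the table (for example, unassigned when the table
    # was generated): disallowed until the table itself is updated.
    return False


TABLE = parse_table(SAMPLE_TABLE)
```

With this shape, admitting a newly assigned character means shipping an
updated table, not editing a hand-transcribed list inside each
implementation.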

--Ken