
RE: [idn] Comments on IDNA/stringprep/nameprep




> > Going the other way around (as seems to be suggested by the
> > authors), freezes at time of first deployment which
> > punctuation/symbols can and cannot be used in future
> > syntaxes that embed domain names.  In particular, the current
> > suggestion *forbids* as special use (i.e. surrounding syntax or
> > use in special purpose (non-host) domain names) any non-ASCII
> > symbol/punctuation.
> 
> I don't understand this issue - stringprep is likely to be used for
> many things that use Unicode and need ways to test for equality etc
> and not all of these are domain names.

Yes, sorry for saying "domain" there.

> I think (but would gladly accept to be corrected) that the IPS use of
> stringprep is intended to be for names of storage volumes (or 
> something like that).

In the scenario I suggest, "volume names", "file names", or even
programming language identifiers should be able to use a tailoring
of a common table (that carries most of the data).  Choices such as
whether or not to case fold should be easy to express in a
tailoring (delta).
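
To make the "common table plus tailoring" idea concrete, here is a minimal
sketch in Python.  The `Profile` class, its field names, and the sample
tailorings are all hypothetical; none of them come from the stringprep or
nameprep drafts.  It only shows how a delta might override one setting while
inheriting everything else from the common profile.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Profile:
    """Hypothetical preparation profile; all names here are illustrative."""
    case_fold: bool = True
    # (source, target) string pairs applied before any case folding.
    mappings: tuple = ()

    def prepare(self, text: str) -> str:
        for source, target in self.mappings:
            text = text.replace(source, target)
        return text.casefold() if self.case_fold else text


# The common profile carries most of the data.
COMMON = Profile(mappings=(("\u00ad", ""),))  # drop SOFT HYPHEN, as an example

# Each tailoring is a delta: it states only what differs from COMMON.
HOST_NAMES = COMMON
FILE_NAMES = replace(COMMON, case_fold=False)

print(HOST_NAMES.prepare("Exam\u00adple"))
print(FILE_NAMES.prepare("Exam\u00adple"))
```

With this shape, "case fold or not" is a one-field delta, and the shared
mapping data is not repeated in any tailoring.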


> Given that this intent exists your concern about freezing the set
> of punctuation can't possibly be something that the people working
> on non-host/domain name use of stringprep see as a problem.

"freezing the set of punctuation"?  Where did I say or imply that?
What I said was that punctuation should be disallowed by default
(though some punctuation may be selectively allowed by a delta/tailoring).
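
A rough sketch of "disallowed by default, selectively allowed by a
tailoring", again in Python.  Using the Unicode general categories P* and S*
as the definition of "punctuation/symbols" is an assumption made for this
example, and the function name and the two allow-sets are hypothetical.

```python
import unicodedata


def disallowed_chars(text, allowed=frozenset()):
    """Return the punctuation/symbol characters in text not allowed by the tailoring.

    A character counts as punctuation or a symbol when its general category
    starts with "P" or "S" (an assumption made for this sketch).
    """
    return [
        ch for ch in text
        if unicodedata.category(ch)[0] in ("P", "S") and ch not in allowed
    ]


# Default: no punctuation or symbols at all.
print(disallowed_chars("my-name"))

# Hypothetical tailorings that each re-allow a single character.
HOST_LABEL_ALLOWED = frozenset("-")
IDENTIFIER_ALLOWED = frozenset("_")

print(disallowed_chars("my-name", HOST_LABEL_ALLOWED))
print(disallowed_chars("my-name", IDENTIFIER_ALLOWED))
```

Nothing is frozen here: a later tailoring can allow a further character
without touching the default.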




> > Even though e.g. [gg] and [g][g] (there are a few hundred other examples)
> > are not canonically or compatibility equivalent, they still represent
> > the same sequence of Hangul letters, and thus "mean" the same.
> 
> Yes, same argument is used for SC/TC needing to be addressed in IDN.

No, no, no!!  This issue is comparable to the *canonical* equivalences
that already exist for Hangul syllable characters, and for other 
characters that have a canonical decomposition (some "double latin
letters" have compatibility decompositions, but the relationship here
is much stronger; and it is much much stronger than case insensitivity).
Unfortunately, due to historic events, that equivalence is no longer
recorded in Unicode 3.0 and later property data.

This is in no way comparable to the SC/TC issue which is a spelling
preference issue, where the "spellings" are actually different.
Here it is just about the underlying representation for the **same**
spelling (in terms of sequence of letters; there is not even any
case difference or font variant difference [for correctly constructed
fonts that cover Hangul]).
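
The missing equivalence can be checked with Python's standard `unicodedata`
module.  This sketch assumes that [gg] stands for U+1101 (HANGUL CHOSEONG
SSANGKIYEOK) and [g] for U+1100 (HANGUL CHOSEONG KIYEOK); the helper name is
made up for the example.

```python
import unicodedata

GG = "\u1101"          # one double letter: HANGUL CHOSEONG SSANGKIYEOK
G_G = "\u1100\u1100"   # two single letters: HANGUL CHOSEONG KIYEOK, twice


def equal_under(form, a, b):
    """Compare two strings after applying the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# No normalization form identifies the two representations.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, equal_under(form, GG, G_G))  # False for all four forms

# By contrast, a precomposed syllable does have a canonical decomposition:
# U+AC00 decomposes to U+1100 U+1161.
print(equal_under("NFD", "\uac00", "\u1100\u1161"))  # True
```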

> > Even though not all systems display the decompositions correctly yet,
> > there is no reason to believe that they will not be supported
> > by most rendering engines for Hangul.  Unfortunately, the normal
> > forms cannot be changed at this time, even though that would have
> > been better.  There have been many misconceptions around about,
> > e.g., where Hangul syllable breaks are, which may have led
> > to the current situation.  Once those misconceptions are cleared
> > up, you will likely see more comments to the effect that [gg]
> > and [g][g] (etc.!) are in fact equivalent from a Hangul perspective.
> 
> I still don't see why you think this adjustment/fix to 
> Unicode is in scope for IDN.

Since it can no longer be corrected by changing the normalisation
forms, just about everything else has to do the correction instead.
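
As a sketch of what "doing the correction instead" might look like in a
preparation step: decompose with NFD, then map each double letter to the
corresponding sequence of single letters before comparing.  The table below
is deliberately tiny and illustrative (five leading consonants only, out of
the few hundred cases mentioned above), and neither the table nor the
function name comes from any of the drafts.

```python
import unicodedata

# Illustrative subset only: double leading consonants mapped to two singles.
DOUBLE_JAMO = {
    "\u1101": "\u1100\u1100",  # SSANGKIYEOK -> KIYEOK KIYEOK
    "\u1104": "\u1103\u1103",  # SSANGTIKEUT -> TIKEUT TIKEUT
    "\u1108": "\u1107\u1107",  # SSANGPIEUP  -> PIEUP PIEUP
    "\u110a": "\u1109\u1109",  # SSANGSIOS   -> SIOS SIOS
    "\u110d": "\u110c\u110c",  # SSANGCIEUC  -> CIEUC CIEUC
}


def fold_hangul(text):
    """Decompose syllables to jamo, then split the double letters listed above."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(DOUBLE_JAMO.get(ch, ch) for ch in decomposed)


precomposed = "\uae4c"           # syllable built from SSANGKIYEOK + A
spelled_out = "\u1100\u1100\u1161"  # KIYEOK KIYEOK + A as separate jamo
print(fold_hangul(precomposed) == fold_hangul(spelled_out))
```

The folded form is meant only as a comparison key in this sketch, not as
something to display or store.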


> > > > 9. User interfaces that encounter mixed script hostname *parts*
> > > > should be recommended to "flag" them (balloon warning, color
> > > > differentiate, make blinking, bounce automatic
> > > > registrations, ...).
> > > 
> > > By "*parts*" do you mean labels or something else?
> > 
> > stringprep: "DNS domain name parts";
> > 
> > idna: "A label is an individual part of a domain name";
> > 
> > nameprep: "This document describes how to prepare
> > 	internationalized host name parts" (I think that's the
> > 	wrong approach, e.g. it should apply to the entire name,
> > 	but I'm just quoting for the term here.)
> 
> I didn't ask what semantics others assign to "parts" - I asked
> specifically what you meant in your use of the term.

The quotes motivated not only how, but why I used that term.
Except for the definition in the idna document, why don't
you complain when others use the same term without definition?
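
For item 9 quoted above, taking "part" to mean a dot-separated label, the
flagging could be sketched as follows.  Python's standard library does not
expose the Unicode Script property, so this example approximates a letter's
script by the first word of its character name; that heuristic, and the
function names, are assumptions made for illustration only.

```python
import unicodedata


def scripts_in_label(label):
    """Approximate the set of scripts used by the letters in one label.

    Heuristic: the first word of the character name ("LATIN", "CYRILLIC",
    "HANGUL", ...).  Digits, hyphens and other non-letters are ignored.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if unicodedata.category(ch).startswith("L")
    }


def mixed_script_labels(name):
    """Return the labels of a dotted name whose letters span several scripts."""
    return [
        label for label in name.split(".")
        if len(scripts_in_label(label)) > 1
    ]


# The second character below is CYRILLIC SMALL LETTER A, not LATIN a.
print(mixed_script_labels("p\u0430ypal.example"))
print(mixed_script_labels("www.example.com"))
```

A user interface could then warn about, or a registry could bounce, any name
for which the returned list is not empty.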

Suggestion: All the documents should include a sufficiently
complete "definitions of terms" clause, where all technical
terms are defined in a dictionary-like format (not with the
term inlined into a paragraph).

		Kind regards
		/kent k


>   Erik
>