
RE: [idn] Comments on IDNA/stringprep/nameprep





> -----Original Message-----
> From: Erik Nordmark [mailto:Erik.Nordmark@eng.sun.com]
...
> > 1. stringprep and nameprep should be rejoined to a
> > hostnameprep. They are only about host name preparation,
> > not any other name preparation.  Similar preparations
> > may still take advantage of the hostnameprep document,
> > by declaring "deltas", small changes that may be needed
> > for other (Internet, DNS) names.  That would likely
> > minimise the size of the "reuse" documents.
> 
> Kent,
> 
> There is work going on in other WGs to use stringprep (that
> you might not be aware of).
> Is the high-order bit of this comment that you'd like nameprep
> be renamed hostnameprep?

Both the title and abstract of nameprep say that it is about
host names (or even host name parts).

However, the gist of the comment you are referring to is that:

There should be a basic approach (currently stringprep)
*and* a basic table of mappings and prohibitions, and that
deltas should be made from that.  Currently each use of stringprep
must have complete tables, and cannot reuse a common basic
table (per version of Unicode).

The idea is to have more reuse through having such a common
table.  It would make the similar (in other WGs) uses easier to
review and (once approved) use.

To make that workable, the basic table should be very restrictive
and not allow any punctuation or symbols (other than full stops
and hyphen-like characters).  Individual punctuation marks and
symbols can then be taken into use later, one at a time, either as
syntax around domain names or as characters permitted in certain
kinds of domain names other than hostnames.

Going the other way around (as seems to be suggested by the
authors) freezes, at the time of first deployment, which
punctuation/symbols can and cannot be used in future
syntaxes that embed domain names.  In particular, the current
suggestion *forbids* any special use (i.e. as surrounding syntax,
or in special-purpose (non-host) domain names) of any non-ASCII
symbol/punctuation.
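
To illustrate the "common basic table plus deltas" idea, here is a
minimal Python sketch.  The class name Profile, the method names, and
the toy tables are all hypothetical and are not taken from stringprep
or nameprep; real tables would be far larger and tied to a Unicode
version.

```python
class Profile:
    """Hypothetical preparation profile: a mapping table plus a prohibition set."""

    def __init__(self, mappings, prohibited):
        self.mappings = dict(mappings)          # character -> replacement string
        self.prohibited = frozenset(prohibited)  # characters rejected after mapping

    def with_delta(self, add_mappings=None, allow=(), prohibit=()):
        """Derive a new profile by stating only the differences from this one."""
        mappings = dict(self.mappings)
        mappings.update(add_mappings or {})
        prohibited = (self.prohibited - set(allow)) | set(prohibit)
        return Profile(mappings, prohibited)

    def prepare(self, text):
        """Apply the mappings, then reject any prohibited character."""
        mapped = "".join(self.mappings.get(ch, ch) for ch in text)
        bad = sorted(ch for ch in set(mapped) if ch in self.prohibited)
        if bad:
            raise ValueError("prohibited characters: %r" % bad)
        return mapped


# Toy restrictive base table (assumption): fold ASCII case, prohibit some symbols.
BASE = Profile(
    mappings={c: c.lower() for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"},
    prohibited={"_", "@", "\u00a7"},
)

# A hypothetical non-host profile declares a one-entry delta: allow the underscore.
NON_HOST = BASE.with_delta(allow={"_"})
```

A reviewer of the derived profile then only has to look at the delta
(here, one allowed character) rather than at a complete table.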

> > 6. Hangul syllables (with conjoining characters, not
> > non-conjoining compatibility characters) that represent
> > the same syllable must be mapped to the same representation.
> > Due to unfortunate historic reasons, this no longer
> > happens automatically with NFKC (though for drafts for
> > NFKC it did).  Mappings should be added so that "syllabically"
> > equivalent Hangul conjoining characters are mapped to a common
> > representation.  Hangul compatibility letters should be
> > prohibited though.  Correctly mapping those is more complicated
> > than can be expressed in the (current form of) (host)nameprep
> > mappings. Hangul compatibility letters should instead be
> > prohibited. (Mapping table for Jamos, and prohibition table for
> > Hangul compatibility characters, are available upon request.)
> > Future keyboard, e.g., input may generate only single letter
> > Jamos, rather than any "cluster letter" Jamos or precomposed
> > Hangul syllable characters.  At the very least, hostnameprep
> > should not prevent such a development.
> 
> This sounds like you are proposing to fix/change a decision 
> made by the Unicode consortium. Why is that in scope?

Even though e.g. [gg] and [g][g] (there are a few hundred other examples)
are not canonically or compatibility equivalent, they still represent
the same sequence of Hangul letters, and thus "mean" the same.
Even though not all systems display the decompositions correctly yet,
there is no reason to believe that they will not eventually be
supported by most rendering engines for Hangul.  Unfortunately, the
normal forms cannot be changed at this time, even though that would
have been better.  There have been many misconceptions about,
e.g., where Hangul syllable breaks are, which may have led
to the current situation.  Once those misconceptions are cleared
up, you will likely see more comments to the effect that [gg] and
[g][g] (etc.!) are in fact equivalent from a Hangul perspective.
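
The [gg] versus [g][g] case can be checked with Python's standard
unicodedata module.  The sketch below assumes [gg] is U+1101 (HANGUL
CHOSEONG SSANGKIYEOK) and [g] is U+1100 (HANGUL CHOSEONG KIYEOK).  The
one-entry table JAMO_CLUSTER_MAP and the function fold_hangul are
hypothetical illustrations only; they are not the mapping table offered
above, and a real table would have a few hundred entries.

```python
import unicodedata

KIYEOK = "\u1100"        # [g], single-letter conjoining Jamo
SSANGKIYEOK = "\u1101"   # [gg], "cluster letter" conjoining Jamo

# Hypothetical one-entry table: cluster Jamo -> sequence of single-letter Jamos.
JAMO_CLUSTER_MAP = {SSANGKIYEOK: KIYEOK + KIYEOK}


def nfkc(text):
    """Plain NFKC, which does not unify [gg] with [g][g]."""
    return unicodedata.normalize("NFKC", text)


def fold_hangul(text):
    """Decompose syllables to Jamos (NFD), then split cluster Jamos."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(JAMO_CLUSTER_MAP.get(ch, ch) for ch in decomposed)


# NFKC keeps the two spellings distinct ...
print(nfkc(SSANGKIYEOK) == nfkc(KIYEOK + KIYEOK))
# ... while the extra mapping brings them to one representation.
print(fold_hangul(SSANGKIYEOK) == fold_hangul(KIYEOK + KIYEOK))
```

Running NFD first means a precomposed syllable and its Jamo spellings
go through the same table, so only conjoining Jamos need entries.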

> > 9. User interfaces that encounter mixed script hostname *parts*
> > should be recommended to "flag" them (balloon warning, color
> > differentiate, make blinking, bounce automatic
> > registrations, ...).
> 
> By "*parts*" do you mean labels or something else?

stringprep: "DNS domain name parts";

idna: "A label is an individual part of a domain name";

nameprep: "This document describes how to prepare
	internationalized host name parts" (I think that's the
	wrong approach, e.g. it should apply to the entire name,
	but I'm just quoting for the term here.)
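
For comment 9 above, a rough Python sketch of flagging mixed-script
labels follows.  Python's standard library does not expose the Unicode
Script property, so this uses the first word of each letter's character
name as a crude stand-in; that heuristic, and the function names
letter_scripts and mixed_script_labels, are assumptions made for
illustration.  It would, for instance, also flag ordinary Japanese text
that mixes kana and ideographs.

```python
import unicodedata


def letter_scripts(label):
    """Approximate the set of scripts used by the letters in one label."""
    scripts = set()
    for ch in label:
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")
            if name:
                # Crude heuristic: "LATIN SMALL LETTER A" -> "LATIN".
                scripts.add(name.split()[0])
    return scripts


def mixed_script_labels(domain):
    """Return the labels of a dotted name whose letters span several scripts."""
    return [label for label in domain.split(".")
            if len(letter_scripts(label)) > 1]


# A label mixing Latin letters with CYRILLIC SMALL LETTER A (U+0430).
print(mixed_script_labels("p\u0430ypal.example"))
```

A user interface could then warn about, or visually mark, each label
returned.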

		Kind regards
		/kent k


>   Erik
>