
Re: [idn] Comments on IDNA/stringprep/nameprep



I wrote:

> IDNA says nothing about the separators between labels except that they
> are "usually" dots.  IDNA says nothing about how to split domain names
> into labels, or join labels into domain names.  It neither requires
> nor prohibits the acceptance of fullwidth full stop as a domain label
> separator.

Kent Karlsson <kentk@md.chalmers.se> replied:

> Hmmm, this kind of contradicts your statements in your other e-mail.

I don't see how.  Maybe you're thinking of comments made by someone
else.

> I think something should be said about this, namely that the entire
> IDN is processed the same way before any kind of part splitting.  When
> everything else is so detailed about mappings etc., this should be the
> same.  Otherwise how does anyone know if FULLWIDTH STOP is allowed (as
> a part separator) in an IDN or not?

That is a sensible concern.  The difficulty is that there is no existing
standard for this, even for LDH host names, so we have no starting
point to internationalize.  DNS queries and responses don't use dots at
all; they use length bytes.  DNS master files use ASCII dots as label
separators, and require a trailing dot.  URIs also use dots as label
separators, but the trailing dot is optional.  In email message headers
the trailing dot is forbidden, and the other dots can be surrounded by
optional whitespace and parenthesized comments (no kidding!).
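To make the "length bytes" point concrete, here is a minimal sketch (not taken from any spec text) of the DNS wire encoding of a name: each label is preceded by its length, and the name ends with a zero-length root label, so no dot appears anywhere.  The function names `to_wire` and `from_wire` are hypothetical, and the sketch assumes labels are already raw octets and ignores message compression pointers.

```python
from __future__ import annotations


def to_wire(labels: list[bytes]) -> bytes:
    """Encode a sequence of labels in DNS wire format (no compression)."""
    out = bytearray()
    for label in labels:
        # Each label is 1..63 octets; length zero is reserved for the root.
        if not 1 <= len(label) <= 63:
            raise ValueError(f"invalid label length {len(label)}")
        out.append(len(label))  # a length byte, not a dot
        out += label
    out.append(0)  # the empty root label terminates the name
    if len(out) > 255:
        raise ValueError("name exceeds 255 octets")
    return bytes(out)


def from_wire(data: bytes) -> list[bytes]:
    """Decode an uncompressed wire-format name back into its labels."""
    labels = []
    i = 0
    while data[i] != 0:
        n = data[i]  # assumes n < 64, i.e. no compression pointer
        labels.append(data[i + 1 : i + 1 + n])
        i += 1 + n
    return labels
```

For example, `to_wire([b"www", b"example", b"com"])` yields `b"\x03www\x07example\x03com\x00"`, and `to_wire([b"a.b", b"example"])` yields `b"\x03a.b\x07example\x00"`: on the wire, an ASCII dot inside a label is just another octet, and only the textual representations need a separator convention.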

Thus, domain names are defined as a sequence of labels, but the
representation of that sequence is not standardized, so it would be
difficult for IDNA to say precisely how to internationalize this
non-standard.  Furthermore, given that applications often pick domain
names apart and put them together, it is safest if IDNA operates on
labels independently.

However, I suppose we might want to consider having the IDNA spec say
something vague like:

    Applications that use U+002E (FULL STOP) to separate domain labels
    are encouraged to accept all the full-stop characters as label
    separators:

    U+002E FULL STOP
    U+0589 ARMENIAN FULL STOP
    U+06D4 ARABIC FULL STOP
    U+0701 SYRIAC SUPRALINEAR FULL STOP
    U+0702 SYRIAC SUBLINEAR FULL STOP
    U+1362 ETHIOPIC FULL STOP
    U+166E CANADIAN SYLLABICS FULL STOP
    U+1803 MONGOLIAN FULL STOP
    U+1809 MONGOLIAN MANCHU FULL STOP
    U+3002 IDEOGRAPHIC FULL STOP
    U+FE52 SMALL FULL STOP
    U+FF0E FULLWIDTH FULL STOP
    U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP
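A minimal sketch of what such an application might do, under the assumption that it splits first and then processes each label independently: every listed full-stop character is treated as U+002E before splitting, and labels are rejoined with ASCII dots.  This is not IDNA-specified behavior; `split_labels`, `map_labels`, and the `per_label` parameter (a stand-in for whatever per-label processing, such as ToASCII, the application applies) are hypothetical.

```python
from __future__ import annotations

from typing import Callable

# The full-stop code points from the list above (U+002E included).
FULL_STOPS = frozenset(
    "\u002E\u0589\u06D4\u0701\u0702\u1362\u166E"
    "\u1803\u1809\u3002\uFE52\uFF0E\uFF61"
)


def split_labels(name: str) -> list[str]:
    """Split a domain name on any of the full-stop characters."""
    # Treat every full-stop character as if it were U+002E.
    normalized = "".join("." if ch in FULL_STOPS else ch for ch in name)
    # A single trailing separator marks the root; drop the empty label it creates.
    if normalized.endswith("."):
        normalized = normalized[:-1]
    return normalized.split(".")


def map_labels(name: str, per_label: Callable[[str], str]) -> str:
    """Apply a per-label transform, then rejoin the labels with ASCII dots."""
    return ".".join(per_label(label) for label in split_labels(name))
```

For instance, `split_labels("www\u3002example\uFF0Ecom")` returns `["www", "example", "com"]`.  The cost of this design is that no label processed this way can contain a full-stop character, since every one is consumed as a separator.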

This calls into question the current nameprep prohibition of U+3002.
Why pick on just that one?  If we make a special case for U+3002, why
not do the same for all the other full-stop characters?  On the other
hand, nameprep does *not* prohibit U+002E (and it shouldn't--notice that
STD-13 includes an explicit demonstration of how to put ASCII dots in
domain labels).  U+002E is prohibited for host names in ToASCII, but
allowed for non-host domain names.  Maybe the same should be true of the
other full-stop characters.  Or maybe they should simply be allowed,
and registrants should just be advised that domain labels containing
full-stop characters will be difficult or impossible to use in many
situations.

AMC