Re: [idn] Tilde



tedd <tedd@sperling.com> wrote:

> in IDNA the Tilde (code point 007E) is prohibited, but the Tilde
> Operator (code point 223C) is not.

IDNA inherits the prohibition of U+007E from RFC-1123 (STD-3), which by
reference to RFC-952 defined host names as ASCII strings containing only
A-Z, a-z, 0-9, hyphen-minus, and dot.  Therefore some ASCII characters
were explicitly allowed, all other ASCII characters were explicitly
forbidden, and non-ASCII characters were not even in the realm of
possibility.
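
As a rough illustration of that character rule (a sketch only; the
function name is made up here, and it checks the character repertoire
alone, not the other host name rules such as where a hyphen may appear):

```python
import string

# Characters named above: A-Z, a-z, 0-9, hyphen-minus, and dot.
ALLOWED = set(string.ascii_letters + string.digits + "-.")

def uses_only_hostname_chars(name: str) -> bool:
    """Return True if every character of name is in the allowed ASCII set."""
    return bool(name) and all(ch in ALLOWED for ch in name)

print(uses_only_hostname_chars("example.com"))       # plain LDH name
print(uses_only_hostname_chars("ex~ample.com"))      # contains U+007E
print(uses_only_hostname_chars("ex\u223cample.com")) # contains U+223C, non-ASCII
```

Under this old rule both the tilde and the tilde operator fail the
check; they differ only in that IDNA later had to decide what to do with
the non-ASCII one.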

In order to extend the notion of host name to non-ASCII strings, we
needed to keep the existing prohibitions on ASCII characters in host
names (otherwise it wouldn't be a proper extension), but the rules
for non-ASCII characters were up to the working group to define.  The
consensus was to allow all non-ASCII Unicode graphic characters (perhaps
because the group could never have reached agreement on any particular
non-empty set of prohibited graphic characters).

> Considering that keyboard space is at a premium, why isn't code point
> 007E mapped to 223C in PUNYCODE?

Punycode accepts and supports all Unicode characters, including
non-graphic characters and all ASCII characters, including U+007E.  It
does no mapping.  All mapping and prohibition are done at higher layers.

I suppose you could instead ask why tilde isn't mapped to tilde
operator in Nameprep.  The mapping step in Nameprep was designed to
avoid alternate representations of the same characters, and to erase
case distinctions, not to save typing.  Tilde and tilde operator are
entirely distinct characters according to the Unicode spec (and if we
had decided not to accept the Unicode spec at face value, we'd still be
arguing about what maps to what).  If tilde operator is too difficult to
type, then don't register domain names containing it.
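
A rough way to see the distinction, assuming case folding plus NFKC
normalization as a stand-in for the Nameprep mapping step (the real
profile uses its own tables and a fixed Unicode version, so treat
approx_map as a hypothetical helper, not Nameprep itself):

```python
import unicodedata

def approx_map(label: str) -> str:
    """Approximate the mapping step: fold case, then apply NFKC."""
    return unicodedata.normalize("NFKC", label.casefold())

# Case distinctions are erased.
print(approx_map("ExAmPlE"))
# An alternate representation (FULLWIDTH LATIN CAPITAL LETTER A) collapses.
print(approx_map("\uff21"))
# Tilde and tilde operator each map to themselves; neither becomes the other.
print(approx_map("~") == "~", approx_map("\u223c") == "\u223c")
```

The tilde operator has no decomposition in the Unicode data, so no
normalization form will ever turn it into U+007E.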

We made one concession for ease of typing, for dot, only because all
domain names (except TLDs) are *required* to contain dots, and dots can
be cumbersome to type for the huge number of CJK users.  The mapping
from ideographic full stop to dot is not done in Nameprep, which sees
only individual labels, not the separators between them, but at a
higher layer that divides the domain name into labels, converts them
independently, and glues them back together.
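
That higher layer can be sketched roughly as follows.  This is only an
illustration: convert_label is a simplified placeholder that skips
Nameprep and the prohibition checks entirely, and the separator set
lists just the two dots discussed here (the IDNA spec recognizes a
couple of other dot variants as well).

```python
import re

# Full stop and ideographic full stop (U+3002) both act as separators.
SEPARATORS = re.compile("[.\u3002]")

def convert_label(label: str) -> str:
    """Placeholder per-label conversion: Punycode with the ACE prefix."""
    if label.isascii():
        return label  # ASCII labels pass through unchanged in this sketch
    return "xn--" + label.encode("punycode").decode("ascii")

def convert_domain(name: str) -> str:
    """Split into labels, convert each independently, rejoin with dots."""
    labels = SEPARATORS.split(name)
    return ".".join(convert_label(label) for label in labels)

print(convert_domain("example\u3002com"))
print(convert_domain("b\u00fccher\u3002example"))
```

Note that the per-label function never sees a separator, which is why
the dot mapping cannot live inside Nameprep.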

AMC