
Re: [idn] Problems in normalisation and matching



Dan Oscarsson <Dan.Oscarsson@trab.se> wrote:

> Now when we expand the allowed characters in a domain name, then
> allowed characters and syntax should follow the same rules:
> - The labels of a domain name are separated by "full stop" U+002E
>   and are written from left to right with least significant label
>   first.
>   Other characters or display form may be used in user interfaces
>   but have to be converted into standard form in protocols.

IDNA says virtually nothing about how IDNs are to be represented in
new protocols.  New protocols can use the ASCII representation, or an
unconstrained Unicode representation (like UTF-8), or can define their
own more restricted representation (for example, nameprepped UTF-8 using
only U+002E as separator).
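To make the three options concrete, here is a minimal sketch in Python (the post itself contains no code) that produces each representation for the same name. It assumes Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490), and the documented `encodings.idna.nameprep` function. The function names and the "restricted" form are hypothetical illustrations, not anything IDNA defines.

```python
import re
from encodings import idna

# The four characters IDNA (RFC 3490, section 3.1) recognizes as dots.
IDNA_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def ascii_form(name: str) -> bytes:
    """ASCII representation: ToASCII applied to each label (IDNA 2003 codec)."""
    return name.encode("idna")


def raw_utf8_form(name: str) -> bytes:
    """Unconstrained Unicode: the name exactly as typed, encoded as UTF-8."""
    return name.encode("utf-8")


def restricted_utf8_form(name: str) -> bytes:
    """Hypothetical restricted form: nameprepped labels joined by U+002E only."""
    labels = IDNA_DOTS.split(name)
    return ".".join(idna.nameprep(label) for label in labels).encode("utf-8")


name = "B\u00fccher\uFF0Eexample"  # typed with a FULLWIDTH FULL STOP
for form in (ascii_form, raw_utf8_form, restricted_utf8_form):
    print(f"{form.__name__:22} {form(name)!r}")
```

Only the raw UTF-8 form keeps the fullwidth dot and the capital letter; the other two both end up with U+002E as the sole separator.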

Although IDNA says that the other dot characters (U+3002, U+FF0E, and
U+FF61) must be recognized as dots, it does not say that they must be
allowed in new protocols.  If a new protocol forbids the other dot
characters, recognizing them as dots will be a no-op.
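A small Python sketch of dot recognition may show why it is a no-op under such a restriction. The helper names are hypothetical; the character set is the one listed in RFC 3490, section 3.1.

```python
import re

# FULL STOP, IDEOGRAPHIC FULL STOP, FULLWIDTH FULL STOP,
# HALFWIDTH IDEOGRAPHIC FULL STOP: the dots listed in RFC 3490, section 3.1.
IDNA_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(name: str) -> list[str]:
    """Split a name into labels, recognizing every IDNA dot as a separator."""
    return IDNA_DOTS.split(name)


def canonical_dots(name: str) -> str:
    """Rewrite every recognized dot as U+002E."""
    return ".".join(split_labels(name))


# A name using the other dot characters still splits into the expected labels.
print(split_labels("www\u3002example\uFF0Ecom"))

# In a protocol that only admits U+002E, recognition changes nothing.
ascii_only = "www.example.com"
print(canonical_dots(ascii_only) == ascii_only)
```

If the protocol never lets the other three characters through, the character class only ever matches U+002E, so `canonical_dots` returns its input unchanged.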

IDNA says that old protocols must use the ASCII representation, with
U+002E as the only dot.
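As a rough illustration, assuming Python's built-in "idna" codec (IDNA 2003): a name typed with any of the IDNA dots goes out over an old protocol as pure ASCII with U+002E separators, and can be turned back into Unicode for display. `to_wire` and `to_display` are hypothetical names.

```python
def to_wire(name: str) -> bytes:
    """ASCII form for old protocols: ToASCII per label, joined by U+002E."""
    # The "idna" codec splits on all four IDNA dots but emits only
    # U+002E between the encoded labels.
    return name.encode("idna")


def to_display(wire: bytes) -> str:
    """Unicode form for a user interface: ToUnicode per label."""
    return wire.decode("idna")


wire = to_wire("b\u00fccher\u3002example")  # typed with an IDEOGRAPHIC FULL STOP
print(wire)
print(to_display(wire))
```

The ideographic full stop never reaches the wire; the display side sees U+002E when the name comes back.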

There has never been a consensus on a particular non-ASCII
representation for use in new protocols, and we don't need one in order
to start deploying IDNA.  That's why IDNA is silent on that issue.

AMC