
Re: [idn] Problems in normalisation and matching



>First, Nameprep/Stringprep is designed to handle domain names on a _per
>label_ basis. Before an IDN goes through Nameprep, it has already been broken
>up into its individual labels, so Nameprep isn't the place to fix this.
>
>The place where IDNs get broken down into labels is IDNA. What IDNA now
>specifies is that, to break an IDN down into its labels, you look for this set
>of separators (U+002E, U+3002, U+FF0E, U+FF61). (See IDNA Requirement 1.)
>

IDNA does define how to handle labels, not complete domain names.
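
As a rough illustration of the label-splitting step described in the
quoted text, here is a minimal Python sketch. The names IDNA_DOTS and
split_labels are made up for this example and are not taken from any
specification or library; the only thing taken from the text above is
the set of four separator code points.

```python
# The four label separators listed in the quoted text:
# U+002E, U+3002, U+FF0E, U+FF61.
IDNA_DOTS = "\u002e\u3002\uff0e\uff61"

def split_labels(name):
    """Split a domain name into labels on any of the four separators.

    Hypothetical helper: it only splits, it does not run Nameprep or
    validate the resulting labels.
    """
    # Map every separator to U+002E first, then split on that one character.
    table = str.maketrans({dot: "." for dot in IDNA_DOTS})
    return name.translate(table).split(".")

labels = split_labels("www\u3002example\uff0ecom")
```

Each element of labels would then be handed to Nameprep separately,
which is why Nameprep itself never sees the separators.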

A domain name can be used in many places and is carried in many
protocols. For example, it is used in HTTP, SMTP and HTML.

Today there are many restrictions on "hostnames", intended to simplify
the handling of domain names by software and to make them easier for
users to identify. To make this possible, the restrictions
do not allow normal separator characters in a name and require labels
to be separated by exactly ONE separator character.
Now that we are expanding the set of characters allowed in a domain name,
the allowed characters and syntax should follow the same rules:
- The labels of a domain name are separated by "full stop" (U+002E)
  and are written from left to right with the least significant label
  first.
  Other characters or display forms may be used in user interfaces,
  but they have to be converted into the standard form in protocols.
- Separator characters like SPACE (U+0020) may not be used.
  (As a result, a domain name cannot use NFKC, since NFKC
   decomposes spacing accents such as U+00B4 into a space character
   followed by a combining accent, which makes simple parsing difficult.)
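
The two rules above can be sketched in Python. This is only an
illustration under my own assumptions: nfkc_introduces_space and
is_protocol_form are hypothetical names, and the check covers only the
separator and SPACE rules from this message, not the full hostname
syntax. It relies on the standard unicodedata module, where NFKC maps a
spacing accent such as U+00B4 to SPACE followed by U+0301.

```python
import unicodedata

# Separators that are acceptable in user interfaces but, per the rules
# above, must be converted to U+002E before use in a protocol.
UI_ONLY_DOTS = "\u3002\uff0e\uff61"

def nfkc_introduces_space(text):
    """Return True if NFKC adds a SPACE that was not in the input."""
    return " " not in text and " " in unicodedata.normalize("NFKC", text)

def is_protocol_form(name):
    """Check the two rules from the list above (hypothetical helper).

    Assumption: one trailing full stop (the root) is tolerated.
    """
    if " " in name:
        return False                      # SPACE may not be used
    if any(dot in name for dot in UI_ONLY_DOTS):
        return False                      # only U+002E separates labels
    if name.endswith("."):
        name = name[:-1]
    # Exactly ONE separator between labels means no empty labels.
    return all(name.split("."))

# U+00B4 ACUTE ACCENT becomes SPACE + U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFKC", "\u00b4")
```

A parser that treats SPACE as a delimiter would split a name at the
point where such an accent was normalised, which is the parsing problem
described in the second bullet.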

It is important to separate the standard way a domain name is written
in protocols from the free-form way it can be written when
interacting with users. In protocols we want a form that can be easily
parsed and identified.

   Dan