
Re: [idn] Problems in normalisation and matching



At 08:56 AM 7/1/2002 +0800, James Seng wrote:
I remember the "dot issue" was extensively discussed by the Nameprep Design
Team. It was decided that dots (other than U+002E) should be included because
there are IMEs which generate these dots in place of the normal dots (it
becomes a hassle to switch in and out of the IME just for the dot).
This confuses user interface issues with protocol issues. The IETF tries to stay away from user interface standardization, even though domain names do have a human representation.

User interfaces must adapt to a wide range of usability issues. Protocols are not supposed to suffer that burden.

It is the job of the user interface to map whatever typing codes it chooses to, into the constrained protocol codes. The theory behind typical Internet protocols -- and most other modern protocol standards -- is that the world chooses ONE way to do a thing and everyone with other ways maps to that one way.
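
As a minimal sketch of that kind of mapping, assuming the user interface layer owns it: the helper name to_protocol_form is hypothetical, and the three alternate dots are the ones named in the published IDNA specification (RFC 3490), used here only for illustration.

```python
# Hypothetical UI-layer helper: fold dot variants an IME might emit
# into the single protocol dot, U+002E, before the name reaches the protocol.
ALTERNATE_DOTS = {
    "\u3002": ".",  # IDEOGRAPHIC FULL STOP
    "\uff0e": ".",  # FULLWIDTH FULL STOP
    "\uff61": ".",  # HALFWIDTH IDEOGRAPHIC FULL STOP
}
_DOT_TABLE = str.maketrans(ALTERNATE_DOTS)


def to_protocol_form(typed_name: str) -> str:
    """Return the typed name with every alternate dot replaced by U+002E."""
    return typed_name.translate(_DOT_TABLE)
```

With this in the input path, the protocol only ever sees one separator, whatever the IME produced.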

The concern for cut-and-paste is obviously valid, but it is not the job of the IETF protocol standards to operate well within a user cut-and-paste environment.



 Now, some may
say IME is out of scope but on the other hand, we really don't need to rehash
a topic which has been concluded. Let's move forward.
Introducing user interface issues into a protocol design is a good way to impair interoperability, because it adds variability. That makes the protocol not work.

Moving forward is a good idea. Except when it is moving backward.



The place where IDNs get broken down into label is in IDNA.
James, forgive me, but I do not understand this statement. IDNs are ALREADY a series of separate labels. IDNA does not "separate" a domain name into labels.

Note that IDN maintains the same kind of dot separator as the "unaware" legacy domain name world, even if it uses multiple choices for the dot character.

All IDNA does is to ENCODE those separate labels into a kind of UTF-7 (that is, ACE).
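
A rough sketch of that per-label encoding, assuming Python's built-in "idna" codec (which implements the published RFC 3490 form, including its nameprep step and the "xn--" ACE prefix). The function name encode_labels is hypothetical. Note that the input is already a list of labels; nothing here splits a name.

```python
def encode_labels(labels):
    """Encode each already-separate label into its ACE form.

    Pure-ASCII labels pass through unchanged; other labels are
    prepared and converted by the standard-library "idna" codec.
    """
    return [label.encode("idna").decode("ascii") for label in labels]
```

For example, encode_labels(["bücher", "example"]) yields ["xn--bcher-kva", "example"].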



Comparison is also done on a per label basis. Two IDNs are considered equivalent
if and only if all their individual labels are equivalent. The separators
are also irrelevant during comparison. (See IDNA Requirement 4)
One bit of confusion that I did not pursue with my suggested revisions is the idea that an IDN can only be compared in its IDNA form.

In terms of formal specification, this cannot be correct. If there is another encoding for IDNs, then the IDNA specification is essentially saying that such encodings must be mapped to IDNA, first, in order to do comparisons.
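
To make that reading concrete, here is a sketch of what "map to IDNA first, then compare" would look like. It assumes the names have already been put into a form with U+002E separators and uses Python's built-in "idna" codec as a stand-in for the mapping. The name idn_equal is hypothetical and not taken from any specification.

```python
def idn_equal(name_a: str, name_b: str) -> bool:
    """Compare two names label by label after mapping each label to ACE."""

    def to_ace_labels(name):
        # Each label is mapped to its ACE form, then lowercased so the
        # comparison is case-insensitive, as legacy DNS comparison is.
        return [label.encode("idna").lower() for label in name.split(".")]

    return to_ace_labels(name_a) == to_ace_labels(name_b)
```

Under this sketch, any other encoding of an IDN has to pass through the same mapping before it can be compared at all, which is the dependency described above.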

If that is what the working group really does mean, it should be stated as part of an IDN specification, separate from the IDNA specification, because it is another formal change to the DNS.

d/

----------
Dave Crocker <mailto:dave@tribalwise.com>
TribalWise, Inc. <http://www.tribalwise.com>
tel +1.408.246.8253; fax +1.408.850.1850