
Re: [idn] Re: Fwd: Unicode letter ballot



Kenneth Whistler <kenw@sybase.com> wrote:

> And you can't escape the problem by just adding the 5 obsolete code
> points to the stringprep prohibited list,

True, but...

> because that, *too*, would have destabilized your specification: a
> string that was valid before you did that would be invalid after you
> did so.

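Actually, adding compatibility characters to the Stringprep prohibited
list would have absolutely no effect, because Stringprep performs
normalization before prohibition: by the time the prohibited list is
checked, those characters have already been replaced by their
decompositions.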
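
A minimal sketch of why the ordering matters, in Python.  It is not a
real Stringprep implementation: toy_prep is a hypothetical helper that
skips the mapping and bidi steps, uses Python's current NFKC tables
rather than the Unicode 3.2 ones, and uses U+F900 (a compatibility
ideograph whose decomposition is not in dispute) as a stand-in for the
five characters under discussion.

```python
import unicodedata

def toy_prep(text, prohibited):
    """Hypothetical two-step pipeline: normalize first, then prohibit."""
    normalized = unicodedata.normalize("NFKC", text)
    for ch in normalized:
        if ord(ch) in prohibited:
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return normalized

# Prohibit the compatibility character itself.
prohibited = {0xF900}

# Normalization replaces U+F900 with U+8C48 before the prohibited list
# is consulted, so the check never sees U+F900 and the string passes.
result = toy_prep("\uF900", prohibited)
print(result == "\u8C48")
```

Prohibiting the decomposition target (U+8C48 here) would reject the
string, but it would also reject every string that used the ordinary
character directly.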

Let's consider the possible scenarios:

1. The decomposition mappings are changed.

  1a. Stringprep/Nameprep track the update, breaking their promise of
      backward compatibility.

      If someone registers a name using a CNS 11643 string in
      combination with the old Nameprep, and later someone tries to look
      up the name using the very same CNS 11643 string in combination
      with the new Nameprep, it won't match (if it contains any of the
      five characters in question).  As more clients upgrade to the new
      Nameprep, the name will become less and less accessible.

      But the old-Nameprepped form of the name (the one that actually
      got stored in the database) will look visibly wrong, won't it?  If
      the registrant had been shown the Nameprepped form and asked for
      confirmation, the registration would probably have been aborted.
      So maybe this lookup failure would turn out to never happen in
      practice.

      On the other hand, if someone registers a name using a CNS 11643
      string in combination with the new Nameprep, and later someone
      tries to look up the name using the very same CNS 11643 string in
      combination with the old Nameprep, it won't match.  But this time
      the Nameprepped form shown to the registrant will look correct, so
      names causing this failure might be more likely to be registered
      than names causing the previous failure.  For this failure, as
      more clients get upgraded, the name will become more and more
      accessible.

  1b. Stringprep/Nameprep do not track Unicode updates; they remain
      frozen at a version containing the old mappings.

      If someone registers a name using a CNS 11643 string, and later
      someone tries to look up the name using the very same CNS 11643
      string, it will match.  But if a modern normalization operation
      gets inserted somewhere (for the heck of it, or because some
      other protocol that carries the domain name requires text to be
      normalized), it won't match.

      Names using the broken compatibility characters might not be
      registered in practice, because the Nameprepped form will look
      wrong.

      Since Nameprep never changes again, there is no transition as
      software gets upgraded.  Names involving the five characters in
      question remain just as broken and undesirable in the far future
      as in the near future.  It is likely that these five characters
      would never be used in domain names in practice.

  1c. Stringprep/Nameprep track Unicode updates, but require the use of
      NormalizationCorrections.txt to undo any changes to decompositions
      since Unicode 3.2.

      This is exactly like case 1b except that IDNA can take advantage of
      other updates to Unicode, like new characters.

      As software gets upgraded, the mappings of the five characters
      in question remain the same (broken), so, as in case 1b, it is
      likely that these five characters would never be used in domain
      names in practice.

2. The decomposition mappings are not changed; the characters are
   deprecated, and new characters added with the correct decompositions.

  2a. Stringprep/Nameprep eventually allow the use of a version of
      Unicode that includes the added characters.

      If someone registers a name using a CNS 11643 string in
      combination with the old CNS-to-Unicode tables, and later someone
      tries to look up the name using the very same CNS 11643 string in
      combination with the new CNS-to-Unicode tables, it won't match.
      As more clients upgrade to the new CNS-to-Unicode tables, the name
      will become less and less accessible.

      If the registrant is shown the Nameprepped form of the name, it
      will look visibly wrong, so the name will probably not end up
      getting registered, and this lookup failure might never happen
      in practice.

      If someone registers a name using a CNS 11643 string in
      combination with the new CNS-to-Unicode tables, and later someone
      tries to look up the name using the very same CNS 11643 string in
      combination with the old CNS-to-Unicode tables, it won't match,
      but this time the Nameprepped form shown to the registrant will
      look correct, so names causing this failure might be more likely
      to be registered than names causing the previous failure.  For
      this failure, as more clients get upgraded, the name will become
      more and more accessible.

  2b. Stringprep/Nameprep freeze on a version of Unicode that does not
      include the added characters, never to be updated again.

      If someone registers a name using a CNS 11643 string in
      combination with the old CNS-to-Unicode tables, and later someone
      tries to look up the name using the very same CNS 11643 string in
      combination with the new CNS-to-Unicode tables, it won't match.
      As more clients upgrade to the new CNS-to-Unicode tables, the name
      will become less and less accessible.

      If the registrant is shown the Nameprepped form of the name, it
      will look visibly wrong, so the name will probably not end up
      getting registered, and this lookup failure might never happen
      in practice.

      If someone tries to register a name using a CNS 11643 string in
      combination with the new CNS-to-Unicode tables, it will fail,
      because the new CNS-to-Unicode tables will use the new characters,
      which are unassigned in the version of Unicode used by Nameprep,
      and therefore disallowed in stored strings.

      It is likely that these five characters would never be used in
      domain names in practice.

It's interesting that cases 1a and 2a are basically identical, and cases
1b and 2b are extremely similar, although they correspond to opposite
outcomes of the vote.
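
The failure shared by cases 1a and 2a can be sketched with a toy model
in Python.  Everything here is a placeholder: "X" stands for one of the
five characters, "old-form" and "new-form" stand for whatever it ends
up as under the old and new tables, and toy_nameprep and lookup are
hypothetical helpers.  The tables can be read either as Nameprep's
decomposition tables (1a) or as the CNS-to-Unicode tables (2a); the
sketch only shows the effect of the two sides disagreeing.

```python
# Placeholder tables: the same input character, two different results.
OLD_TABLE = {"X": "old-form"}
NEW_TABLE = {"X": "new-form"}

def toy_nameprep(label, table):
    """Hypothetical preparation step: substitute per the given table."""
    return "".join(table.get(ch, ch) for ch in label)

def lookup(registry, label, table):
    """A lookup matches only if the prepared forms are identical."""
    return toy_nameprep(label, table) in registry

# Registered with the old tables, looked up with the new ones.
registry_old = {toy_nameprep("aXb", OLD_TABLE)}
old_reg_new_client = lookup(registry_old, "aXb", NEW_TABLE)

# Registered with the new tables, looked up with the old ones.
registry_new = {toy_nameprep("aXb", NEW_TABLE)}
new_reg_old_client = lookup(registry_new, "aXb", OLD_TABLE)

# Same tables on both sides: the lookup succeeds.
same_version = lookup(registry_old, "aXb", OLD_TABLE)

print(old_reg_new_client, new_reg_old_client, same_version)
# prints: False False True
```

The two mismatches are symmetric in the model.  What differs in the
scenarios above is which side the upgrade path eventually favors and
whether the registrant would have accepted the prepared form.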

The biggest difference I can see between case 1 and case 2 is that if
the deprecate/add approach is taken (case 2), IDNA's only choices are
to track Unicode updates (2a) or not track them (2b), all or nothing.
But if the fix-decompositions approach is taken (case 1), IDNA has a
third option (1c) of tracking all Unicode updates except changes to
decompositions, via the NormalizationCorrections.txt file.
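
Option 1c could look roughly like the following Python sketch.  It
assumes each data line of NormalizationCorrections.txt has four
semicolon-separated fields (code point; original decomposition;
corrected decomposition; version in which the correction was made),
with "#" starting a comment.  The sample line uses made-up
private-use values rather than real data, and parse_corrections and
pin_decompositions are hypothetical names.

```python
def parse_corrections(text):
    """Parse lines of the assumed form 'cp;original;corrected;version'."""
    corrections = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        cp, original, corrected, _version = [f.strip() for f in line.split(";")]
        corrections[int(cp, 16)] = (
            "".join(chr(int(h, 16)) for h in original.split()),
            "".join(chr(int(h, 16)) for h in corrected.split()),
        )
    return corrections

def pin_decompositions(current, corrections):
    """Return a copy of the current table with every corrected entry
    reverted to its original decomposition; other entries are kept."""
    pinned = dict(current)
    for cp, (original, _corrected) in corrections.items():
        pinned[cp] = original
    return pinned

# Made-up sample data (NOT from the real file): U+E000 used to map to
# "A" and was later corrected to "B".
sample = "E000;0041;0042;9.9.9 # made-up example line\n"

# Made-up current table: already carries the correction, plus an entry
# for a character added later.
current_table = {0xE000: "B", 0xE001: "C"}

pinned_table = pin_decompositions(current_table, parse_corrections(sample))
print(pinned_table == {0xE000: "A", 0xE001: "C"})  # True
```

The pinned table keeps the later addition (U+E001 in the toy data)
while holding the corrected entry at its original value, which is the
"everything except decomposition changes" behavior of case 1c.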

I find this result to be counter-intuitive.  My initial gut feeling was
that the deprecate/add approach was more conservative, and safer.  But
apparently not.

AMC