
[idn] Re: Document Status?



Dave Crocker <dhc@dcrocker.net> writes:

> Mostly the problem with this thread is that people are trying to
> debate concepts and principles, but they are failing to provide any
> detailed scenarios that cause problems.

The failure scenarios have been named, and as a result I think some
are now even discussed in the specifications.  Off the top of my
tired head (I'm sure you'll correct my errors):

+ Not the entire world uses Unicode, which is where IDNA starts.
  There are examples of characters in European charsets that may fail
  to translate into Unicode properly (e.g., Greek beta and German
  sharp s, which share one code point in CP437).  I suspect this is
  more common in non-Western charsets.  If someone has looked into
  this area more closely, I'd appreciate a pointer.  The IDN
  specifications certainly don't deal with it.  Detailed scenario
  that fails: www.ßeta.com browsed from a CP437 platform.

+ The choice of Unicode normalization form KC has been questioned.
  Again, since I'm familiar with European charsets, I have the simple
  example of normalization of ß into ss.  There are supposedly
  distinct words where this normalization process removes the
  possibility of distinguishing between the words.  Non-Western
  charsets probably have more cases like this.  Detailed scenario that
  fails: www.masse.de (translation: mass, majority) and www.maße.de
  (translation: metrics, gauges) are indistinguishable.

+ Any modification to the Unicode code charts or normalization
  tables destroys the stability of IDN.  This is handled by locking
  IDN to Unicode 3.2 (I believe).  I don't have a specific scenario
  that fails, but this will add considerable code complexity in
  software, since implementations need to include Unicode
  normalization, Unicode tables and possibly also the Unicode bidi
  algorithm internally instead of relying on a Unicode implementation
  in the operating system.  It may seem like this problem can't be
  solved in a better way, but I have a feeling a better design could
  fix this.

+ Unicode normalization and bidi rules interact problematically.
  Consider the string U+05D0 U+0966: it is a forbidden bidi sequence.
  U+2135 U+0966 is not a forbidden sequence.  Yet U+2135 is
  normalized into U+05D0.  Detailed scenario that fails: STRINGPREP on
  the string www.U+2135U+05D0.com works from machines that don't
  perform Unicode normalization (which IS optional).  Yes, OK, this is
  not an IDN problem but a STRINGPREP problem, but STRINGPREP seems to
  come out of this group so it is related.
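
For what it's worth, the CP437, ß and bidi scenarios above can be
roughly reproduced in Python.  This is only a sketch under some
assumptions: it relies on Python's standard cp437 codec, its built-in
"idna" codec (which implements the IDNA 2003 rules) and its stringprep
module (built on Unicode 3.2 data).  The helper name bidi_ok is made
up for illustration, and it covers only the mixing rule from RFC 3454
section 6, not the prohibited-character checks.  Note that in this
implementation the ß to ss mapping happens in the Nameprep
case-folding step; NFKC on its own leaves ß untouched.

```python
import stringprep
from unicodedata import ucd_3_2_0  # Unicode 3.2 data, matching stringprep

# CP437 scenario: byte 0xE1 serves as both sharp s and beta in CP437.
# Python's codec maps it to U+00DF, so Greek beta has no encoding at all.
sharp_s = b"\xe1".decode("cp437")
try:
    "\u03b2".encode("cp437")
    beta_encodable = True
except UnicodeEncodeError:
    beta_encodable = False

# Normalization scenario: both names end up as the same ASCII name.
masse = "www.masse.de".encode("idna")
masze = "www.ma\u00dfe.de".encode("idna")


def bidi_ok(label):
    """Simplified RFC 3454 section 6 check (hypothetical helper)."""
    has_r = any(stringprep.in_table_d1(c) for c in label)  # RandALCat
    has_l = any(stringprep.in_table_d2(c) for c in label)  # LCat
    if not has_r:
        return True   # no right-to-left characters, nothing to enforce
    if has_l:
        return False  # RandALCat and LCat must not be mixed
    # First and last characters must both be RandALCat.
    return stringprep.in_table_d1(label[0]) and stringprep.in_table_d1(label[-1])


# Bidi scenario: the verdict depends on whether NFKC was applied first.
raw = "\u2135\u0966"
normalized = ucd_3_2_0.normalize("NFKC", raw)

print("0xE1 decodes to:", ascii(sharp_s), "beta encodable:", beta_encodable)
print("same ASCII name:", masse == masze, masse)
print("bidi ok without NFKC:", bidi_ok(raw))
print("bidi ok with NFKC:   ", bidi_ok(normalized))
```

The last two lines are the interesting ones: a client that skips
normalization and one that performs it reach different verdicts on the
same input string.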

These are things I've discovered by participating here for a month or
two, and I don't pretend to understand these issues.  If the
specifications don't solve, or at least discuss, the issues above,
how will they be implemented correctly?

PS.
  My interest in IDN is not to enable fancy glyphs with it; it is to
  integrate IDN securely in protocols like TLS, Kerberos, OpenPGP and
  S/MIME, which use domain names for security-critical things.  What
  may be sufficient for the web-browsing herd may not be adequate for
  the security-conscious club.  This focus seems to have been
  neglected.