
[idn] Comparisons of the proposals



This WG is supposed to produce a document comparing the proposed IDN
solutions. My earlier suggestion of how this might happen (have each
proposal say how it matches or doesn't match each requirement in the
requirements document) got no interest.

The WG might feel better about a single general document that compares
the proposals. To that end, I have listed what I see as the relevant
factors for comparison. Just to be clear, I think it would be a Very Bad
Thing to have one of the authors of one of the proposals, or anyone
else who is very clearly aligned with a particular solution, be the
editor of the comparison document. Having said that, I'd be happy if
whoever volunteers to be the editor used any or all of what I say here.

This list is peppered with many notes that reflect changes that might
happen to the drafts. Dan Oscarsson and I have been conversing about
our respective drafts, and this trading of ideas has borne a great deal
of fruit.

Key

utf-5:  draft-jseng-utf5-01.txt
cidnuc: draft-hoffman-idn-cidnuc-03.txt
8&down: draft-oscarsson-i18ndns-00.txt


Protocol overview

utf-5:  On-the-wire protocol is compatible with today's host names.
         No further detail on use in DNS, but will probably be the
         same as cidnuc.
cidnuc: On-the-wire protocol is compatible with today's host names.
         Specifies that displayed names should be converted to
         internationalized characters.
8&down: On-the-wire protocol is UTF-8. DNS requests use a new
         bit in the DNS request packet. Protocols that cannot use
         UTF-8 directly downcode to an equivalent name that is
         compatible with today's host names.


Tagging compatible name parts

utf-5:  None specified.
cidnuc: Three-character prefix that is currently unused by names in
         .com, .org, and .net. List discussion has proposed four-character
         prefixes, and using suffixes instead of prefixes.
8&down: Trailing hyphen. This makes the host name part illegal and thus
         detectable.
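To make the difference between the two tagging styles concrete, here is a minimal Python sketch. It assumes today's host name rule is the RFC 1123 letters-digits-hyphen (LDH) rule: labels of 1 to 63 characters that neither start nor end with a hyphen. The function names are hypothetical, and the prefix is left as a parameter because neither the cidnuc prefix nor the alternatives discussed on the list are spelled out here.

```python
import string

# Characters allowed in a host name label today: letters, digits, hyphen
# (the "LDH" rule of RFC 952 as relaxed by RFC 1123).
LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")
MAX_LABEL_LEN = 63


def is_legal_host_label(label: str) -> bool:
    """True if `label` is a legal host name label today."""
    return (
        1 <= len(label) <= MAX_LABEL_LEN
        and all(c in LDH_CHARS for c in label)
        and not label.startswith("-")
        and not label.endswith("-")
    )


def has_trailing_hyphen_tag(label: str) -> bool:
    """8&down-style tag: a label that would be legal except for one
    trailing hyphen. No existing host name can look like this, so the
    tag is detectable without consulting any registry."""
    return (
        len(label) <= MAX_LABEL_LEN
        and label.endswith("-")
        and is_legal_host_label(label[:-1])
    )


def has_prefix_tag(label: str, prefix: str) -> bool:
    """cidnuc-style tag. `prefix` is a hypothetical parameter; the tag is
    only unambiguous if no registered name already starts with it."""
    return is_legal_host_label(label) and label.lower().startswith(prefix.lower())
```

A prefix tag is only unambiguous if no registered name already begins with it, which is why the cidnuc choice depends on what is unused in .com, .org, and .net. A trailing hyphen needs no such survey, because no legal host name label can end with one.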


Number of internationalized characters per name part (see
<ftp://ftp.unicode.org/Public/UNIDATA/Blocks.txt> for script ranges)

utf-5:  31 for Basic Latin and Latin-1 Supplement; 21 for Latin
         Extended-A through Tibetan; 15 for Myanmar through end of BMP;
         12 for non-BMP.
cidnuc: 36 for all scripts other than Han, Yi, Hangul syllables, and
         non-BMP; 18 for those scripts and for names that mix scripts.
8&down: UTF-8 allows 63 for Basic Latin; 31 for Latin-1 Supplement
         through U+07FF (including Latin Extended-A, Greek, Cyrillic,
         Hebrew, and Arabic); 21 for U+0800 through end of BMP (including
         Devanagari through Tibetan, Myanmar, and Han); 15 for non-BMP.
         However, because names may also have to exist in their
         downcoded forms, these numbers change to 63 for Basic Latin
         (other than hyphen); 12 for Latin Extended-A through Tibetan; 8
         for Myanmar through end of BMP; 6 for non-BMP.
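The UTF-8 and utf-5 figures come from dividing the 63-octet DNS label limit by the number of octets each character needs. The Python sketch below does that arithmetic. It assumes, as an illustrative model only, that utf-5 writes one host name character per hexadecimal digit of the code point with no leading zeros; under that model it reproduces the 31/21/15/12 utf-5 figures. The `overhead` parameter shows how reserving octets for a tag changes the result. It does not model cidnuc's compression or the 8&down downcoding, whose details are not given here.

```python
# Longest DNS label, in octets (RFC 1035).
MAX_LABEL_OCTETS = 63


def utf8_octets(cp: int) -> int:
    """Octets UTF-8 needs for one code point."""
    return len(chr(cp).encode("utf-8"))


def utf5_octets(cp: int) -> int:
    """ASSUMPTION: utf-5 uses one host name character per hex digit of
    the code point, without leading zeros (illustrative model only)."""
    return len(format(cp, "X"))


def capacity(cp: int, octets_per_char, overhead: int = 0) -> int:
    """Characters like `cp` that fit in one label once `overhead` octets
    (for example, a tag) are reserved."""
    return (MAX_LABEL_OCTETS - overhead) // octets_per_char(cp)


# One sample code point per range named in the text.
SAMPLES = {
    "Basic Latin (U+0041)": 0x0041,
    "Latin-1 Supplement (U+00E9)": 0x00E9,
    "Latin Extended-A (U+0101)": 0x0101,
    "Tibetan (U+0F40)": 0x0F40,
    "Myanmar (U+1000)": 0x1000,
    "Han (U+4E2D)": 0x4E2D,
    "non-BMP (U+10400)": 0x10400,
}

if __name__ == "__main__":
    for name, cp in SAMPLES.items():
        print(f"{name:30} UTF-8 {capacity(cp, utf8_octets):2}  "
              f"utf-5 {capacity(cp, utf5_octets):2}")
```

Note that UTF-8 moves from two to three octets per character at U+0800, so Tibetan, like Myanmar and Han, fits 21 characters to a label in UTF-8.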


Notes on numbers of characters

utf-5:  Does not specify how to unambiguously tag names as being
         converted. When it does, the numbers given here will drop by
         one or two characters.
cidnuc: The tagging method could change to either four or five
         characters without changing the number of characters encoded
         (36 and 18). If the tagging method changes to that of 8&down
         (trailing hyphen), that would increase the number of characters
         by 2 for the first case and 1 for the second. There is also a
         proposal to allow better compression for names that mix
         characters from scripts other than Basic Latin and Latin
         Extended-A with characters from those two blocks.
8&down: The downcoding algorithm is likely to change to allow many more
         characters.


Changes to DNS protocol and resolvers

utf-5:  None.
cidnuc: None.
8&down: Use of a new IN bit in DNS requests.
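8&down is the only proposal that changes what goes into the DNS packet. This summary does not say which header bit the draft uses, so the Python sketch below is hypothetical on that point: it places the flag in the header's reserved Z bit (0x0040) purely to show where such a flag would live in the 12-octet header defined by RFC 1035. The function and constant names are illustrative.

```python
import struct

RD_FLAG = 0x0100  # "recursion desired" (RFC 1035 header flags)
# HYPOTHETICAL: position of the 8&down "IN" bit. Chosen only for this
# sketch (the reserved Z bit); the real position is whatever the draft says.
HYPOTHETICAL_IN_FLAG = 0x0040


def build_query_header(query_id: int, internationalized: bool) -> bytes:
    """Pack the 12-octet RFC 1035 header for a query with one question."""
    flags = RD_FLAG
    if internationalized:
        flags |= HYPOTHETICAL_IN_FLAG
    # ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT; network byte order.
    return struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)


def in_flag_set(header: bytes) -> bool:
    """True if the hypothetical IN flag is set in a packed header."""
    (flags,) = struct.unpack("!H", header[2:4])
    return bool(flags & HYPOTHETICAL_IN_FLAG)
```

utf-5 and cidnuc need nothing like this, since their names are already legal host names on the wire.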


Changes to other Internet protocols

utf-5:  None specified, but will be the same as cidnuc.
cidnuc: All protocols that take in or display host names should convert
         from compatible format to internationalized characters.
8&down: None for protocols that can natively handle UTF-8. Others
         must use downcoded names internally.
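The 8&down rule amounts to a per-protocol decision made where a name is written out. A minimal Python sketch, assuming a `downcode` function that stands in for the draft's downcoding algorithm (not reproduced here, so the caller must supply it); the final check guards against the leakage case, where UTF-8 ends up in a protocol that cannot handle it. The function name is hypothetical.

```python
from typing import Callable


def name_for_protocol(name: str, protocol_handles_utf8: bool,
                      downcode: Callable[[str], str]) -> bytes:
    """Octets to put on the wire for `name` in a given protocol.

    `downcode` is a placeholder for the 8&down downcoding algorithm,
    which is not reproduced here; the caller must supply it.
    """
    if protocol_handles_utf8:
        return name.encode("utf-8")
    coded = downcode(name)
    # Guard against leakage: a protocol that cannot handle UTF-8 must
    # only ever see plain ASCII host names.
    if not coded.isascii():
        raise ValueError(f"downcoded name is not ASCII: {coded!r}")
    return coded.encode("ascii")
```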


Effects of leakage

utf-5:  Applications may expose opaque names instead of
         internationalized names.
cidnuc: Applications may expose opaque names instead of
         internationalized names.
8&down: UTF-8 characters may appear in protocols that cannot handle
         them. However, this is fairly unlikely because these
         applications will probably not put internationalized characters
         on the wire in their own protocols.

--Paul Hoffman, Director
--Internet Mail Consortium