[idn] Comparisons of the proposals
- To: idn@ops.ietf.org
- Subject: [idn] Comparisons of the proposals
- From: Paul Hoffman / IMC <phoffman@imc.org>
- Date: Sat, 18 Mar 2000 15:23:13 -0800
- Delivery-date: Sat, 18 Mar 2000 15:25:08 -0800
- Envelope-to: idn-data@psg.com
This WG is supposed to produce a document comparing the proposed IDN
solutions. My earlier suggestion of how this might happen (have each
proposal say how it matches or doesn't match each requirement in the
requirements document) got no interest.
The WG might feel better about a single general document that compares
the proposals. To that end, I have listed what I see as the relevant
factors for comparison. Just to be clear, I think it would be a Very Bad
Thing to have one of the authors of one of the proposals, or anyone
else who is very clearly aligned with a particular solution, be the
editor of the comparison document. Having said that, I'd be happy if
whoever volunteers to be the editor used any or all of what I say here.
This list is peppered with many notes that reflect changes that might
happen to the drafts. Dan Oscarsson and I have been conversing about
our respective drafts, and this trading of ideas has borne a great deal
of fruit.
Key
utf-5: draft-jseng-utf5-01.txt
cidnuc: draft-hoffman-idn-cidnuc-03.txt
8&down: draft-oscarsson-i18ndns-00.txt
Protocol overview
utf-5: On-the-wire protocol is compatible with today's host names.
No further detail on use in DNS, but will probably be the
same as cidnuc.
cidnuc: On-the-wire protocol is compatible with today's host names.
Specifies that displayed names should be converted to
internationalized characters.
8&down: On-the-wire protocol is UTF-8. DNS requests use a new
bit in the DNS request packet. Protocols that cannot use
UTF-8 directly downcode to an equivalent name that is
compatible with today's host names.
Tagging compatible name parts
utf-5: None specified.
cidnuc: Three-character prefix that is currently unused by names in
.com, .org, and .net. List discussion has proposed four-character
prefixes, and using suffixes instead of prefixes.
8&down: Trailing hyphen. This makes the host name part illegal and thus
detectable.
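To make the two tagging styles concrete, here is a minimal sketch of how a
resolver or application might detect a tagged name part. The prefix value
"zq9" is a made-up placeholder, not the prefix from the cidnuc draft, and
both function names are hypothetical.

```python
# Sketch only: "zq9" is a hypothetical placeholder prefix, NOT the value
# specified in the cidnuc draft. Function names are illustrative.
HYPOTHETICAL_PREFIX = "zq9"


def has_prefix_tag(label: str, prefix: str = HYPOTHETICAL_PREFIX) -> bool:
    """cidnuc-style check: the name part starts with a reserved prefix.

    Host names are case-insensitive, so the prefix is compared in lower case.
    A label that consists of the prefix alone carries no payload and is
    not treated as tagged.
    """
    return len(label) > len(prefix) and label.lower().startswith(prefix.lower())


def has_trailing_hyphen_tag(label: str) -> bool:
    """8&down-style check: the name part ends with a hyphen.

    A trailing hyphen is not allowed in an ordinary host name part, which is
    what makes the tag detectable.
    """
    return len(label) > 1 and label.endswith("-")


if __name__ == "__main__":
    for part in ("zq9abcd", "example", "abcd-"):
        print(part, has_prefix_tag(part), has_trailing_hyphen_tag(part))
```

A prefix test can collide with existing registered names unless the prefix
is truly unused, while the trailing-hyphen test relies on host name syntax
rules alone.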
Number of internationalized characters per name part (see
<ftp://ftp.unicode.org/Public/UNIDATA/Blocks.txt> for script ranges)
utf-5: 31 for Basic Latin and Latin-1 Supplement; 21 for Latin
Extended-A through Tibetan; 15 for Myanmar through end of BMP;
12 for non-BMP.
cidnuc: 36 for all scripts other than Han, Yi, Hangul syllables, and
non-BMP; 18 for those scripts and for names that mix scripts.
8&down: UTF-8 allows 63 for Basic Latin; 31 for Latin Extended-A
through Tibetan; 21 for Myanmar through end of BMP; 15 for non-
BMP. However, because names may also have to exist in their
downcoded forms, these numbers change to 63 for Basic Latin
(other than hyphen); 12 for Latin Extended-A through Tibetan; 8
for Myanmar through end of BMP; 6 for non-BMP.
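The utf-5 figures above can be reproduced with simple arithmetic. The sketch
below assumes that each utf-5 output character carries four bits of the code
point and that a name part is limited to 63 octets; the function names, the
sample code points, and the tag_overhead parameter are my own illustrations
and do not come from any of the drafts.

```python
# Sketch only: assumes one utf-5 output character per 4-bit nibble of the
# code point, and the 63-octet limit on a DNS name part.
MAX_LABEL_OCTETS = 63


def utf5_length(code_point: int) -> int:
    """Output characters needed for one code point (at least one)."""
    return max(1, (code_point.bit_length() + 3) // 4)


def chars_per_label(code_point: int, tag_overhead: int = 0) -> int:
    """How many copies of this character fit in one name part.

    tag_overhead is a hypothetical number of octets reserved for a tag.
    """
    return (MAX_LABEL_OCTETS - tag_overhead) // utf5_length(code_point)


# One sample code point per range named in the comparison.
SAMPLES = {
    "Basic Latin (U+0061)": 0x0061,
    "Latin-1 Supplement (U+00E9)": 0x00E9,
    "Latin Extended-A (U+0100)": 0x0100,
    "Tibetan (U+0F40)": 0x0F40,
    "Myanmar (U+1000)": 0x1000,
    "End of BMP (U+FFFF)": 0xFFFF,
    "Non-BMP (U+10000)": 0x10000,
}

if __name__ == "__main__":
    for name, cp in SAMPLES.items():
        print(f"{name}: {utf5_length(cp)} chars each, "
              f"{chars_per_label(cp)} per name part")
```

Passing a nonzero tag_overhead shows how reserving octets for a tag would
shrink these limits.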
Notes on numbers of characters
utf-5: Does not specify how to unambiguously tag names as being
converted. When it does, the numbers given here will drop by
one or two characters.
cidnuc: The tagging method could change to either four or five
characters without changing the number of characters encoded
(36 and 18). If the tagging method changes to that of 8&down
(trailing hyphen), that would increase the number of characters
by 2 for the first case and 1 for the second. There is also a
proposal to allow better compression for names that mix scripts
other than Basic Latin and Latin Extended-A with these
characters.
8&down: The downcoding algorithm is likely to change to allow many more
characters.
Changes to DNS protocol and resolvers
utf-5: None.
cidnuc: None.
8&down: Use of new IN bit in requests.
Changes to other Internet protocols
utf-5: None specified, but will be the same as cidnuc.
cidnuc: All protocols that take in or display host names should convert
from compatible format to internationalized characters.
8&down: None for protocols that can natively handle UTF-8. Others
must use downcoded names internally.
Effects of leakage
utf-5: Applications may expose opaque names instead of
internationalized names.
cidnuc: Applications may expose opaque names instead of
internationalized names.
8&down: UTF-8 characters may appear in protocols that cannot handle
them. However, this is fairly unlikely because these
applications will probably not put internationalized characters
on the wire in their own protocols.
--Paul Hoffman, Director
--Internet Mail Consortium