
[idn] Re: IDNA: is the specification proper, adequate, and complete? (was: Re: I-D ACTION:draft-ietf-idn-idna-08.txt)



At 1:16 PM +0200 6/17/02, Simon Josefsson wrote:
vint cerf <vinton.g.cerf@wcom.com> writes:

 It seems to me that we err if we mix "finding" identifiers
 (with search engines, elaborate directories that offer multiple
 choices of IDNs based on imprecise search criteria) with
 resolving unambiguous identifiers into their respective IP addresses
 (speaking roughly since DNS also offers indirect resolutions such as
 MX, CNAME and so on).

 I think we do ourselves a disservice if we try to make DNS resolve
 ambiguous references - it is not designed for such applications;
 search engines and directory structures are more oriented towards
 that aspect of finding things "by name" on the Internet.
This seems to argue against the current design of IDNA.

IDNA resolves some ambiguities in identifiers by Unicode
normalization, and introduces further ambiguities by not handling
legacy charset transcoding issues at all.
Simon, both of those statements are wrong, and Vint is right. Unicode normalization doesn't fix ambiguous references; it canonicalizes references, and there is a huge difference between the two. "Letter A followed by combining umlaut" is not ambiguous: it means that the display should show an a with an umlaut over it. There are charset transcoders today that transcode differently from each other. That's not an ambiguity; that's a mistake. No one can create protocols that fix every previous mistake.
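
To make the canonicalization point concrete, here is a minimal sketch using Python's standard unicodedata module. The helper name canonicalize is hypothetical, and the sketch assumes NFKC as the normalization form; it shows only the normalization step, not the rest of the name preparation that IDNA specifies.

```python
import unicodedata


def canonicalize(text: str) -> str:
    """Return the NFKC form of text (normalization step only; assumption: NFKC)."""
    return unicodedata.normalize("NFKC", text)


decomposed = "a\u0308"   # LATIN SMALL LETTER A followed by COMBINING DIAERESIS
precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS

# The two inputs are different code point sequences...
assert decomposed != precomposed

# ...but both name the same character, so they share one canonical form.
assert canonicalize(decomposed) == canonicalize(precomposed)
```

Neither input is ambiguous: each has exactly one meaning, and normalization simply picks a single spelling for that meaning so that comparisons succeed.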


Now, one can argue that Unicode normalization is only used because
Unicode happens to have different ways of representing the same, or
non-visual, characters, but nevertheless this adds an ambiguity
resolving mechanism to software.  One that will have to be modified
over time, as well, since consensus on how to resolve ambiguities will
change over time.  I have trouble visualizing how this can be
implemented and work well for 2, 5, 10 years and more, when Unicode
and other charsets are moving targets.
So your solution is that nothing can ever be internationalized?

--Paul Hoffman, Director
--Internet Mail Consortium