
[idn] Re: IDNA: is the specification proper, adequate, and complete? (was: Re: I-D ACTION:draft-ietf-idn-idna-08.txt)



At 6:32 PM +0200 6/17/02, Simon Josefsson wrote:
If I see the (Swedish) word "å" displayed on my screen, cut'n'paste it
into a browser, an IDNA resolver will normalize this into U+00C5
before a server is queried for that string, regardless of whether the
original string was U+00C5 or U+212B.  Isn't this "resolving
ambiguity"?
No, it is canonicalizing. In the bits on the wire, there is nothing ambiguous about the combined version or the uncombined version: they are very clearly different sets of characters, and the representation of those characters in every encoding of Unicode is also non-ambiguous.


  Isn't there an ambiguity between U+00C5 and U+212B?
Only visually, not in the protocol.

  If
there isn't an ambiguity between U+00C5 and U+212B, why does IDNA
treat them the same?
It doesn't. It canonicalizes one into the other. That is far from "treating them the same", yes?
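
To make the distinction concrete, here is a minimal Python sketch. It is not part of the IDNA drafts. It assumes Python's standard `unicodedata` module and its built-in `idna` codec, which implements the later RFC 3490 version of IDNA and not the -08 draft under discussion. The two-label test strings are made up for illustration.

```python
import unicodedata

A_RING = "\u00c5"    # LATIN CAPITAL LETTER A WITH RING ABOVE
ANGSTROM = "\u212b"  # ANGSTROM SIGN

def canonicalize(text: str) -> str:
    """Apply NFKC, the normalization form that nameprep builds on."""
    return unicodedata.normalize("NFKC", text)

# On the wire the two characters are plainly different: different code
# points, and different byte sequences in UTF-8.
print(A_RING == ANGSTROM)
print(A_RING.encode("utf-8"), ANGSTROM.encode("utf-8"))

# After canonicalization both inputs are the same single code point.
print(canonicalize(A_RING) == canonicalize(ANGSTROM))

# The built-in codec (an assumption: RFC 3490 behaviour, not the -08 draft)
# therefore produces the same ASCII label for both spellings.
label_a = "\u00c5ngstr\u00f6m".encode("idna")
label_b = "\u212bngstr\u00f6m".encode("idna")
print(label_a == label_b)
```

The inputs stay distinguishable until the moment they are canonicalized; only the output of that step is shared.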


  Perhaps I fail to communicate, not being a
native speaker perhaps I'm interpreting the word "ambiguous"
incorrectly, although my dictionary doesn't seem to help me find any
alternative interpretation.
My dictionary has these definitions:
- having two or more possible meanings
- doubtful, uncertain
U+00C5 and U+212B do not have the same meaning, and there is nothing doubtful or uncertain about either of them.


 > There are charset transcoders today that transcode differently from
 each other. That's not an ambiguity, that's a mistake. No one can
 create protocols that fix every previous mistake.
You can fix the one mistake.
Which one mistake is that? There are probably dozens of transcoders with errors, and worse yet, there are probably dozens of transcoder implementors who, in the face of some IETF or Unicode standard that tells them how to transcode, would say "screw you, you don't understand our language" (and they would possibly be correct).
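
As one illustration of transcoders that disagree, the sketch below decodes the same two bytes with two codecs that ship with Python, `shift_jis` and `cp932`. They are related but separately maintained mapping tables for Japanese text, so this is an example of divergent tables, not a claim about which one is "right". The choice of byte sequence is mine and does not come from this thread.

```python
import unicodedata

RAW = b"\x81\x60"  # one double-byte character in Shift_JIS-family encodings

# Decode the identical bytes with two different mapping tables.
via_shift_jis = RAW.decode("shift_jis")
via_cp932 = RAW.decode("cp932")

for codec, text in (("shift_jis", via_shift_jis), ("cp932", via_cp932)):
    normalized = unicodedata.normalize("NFKC", text)
    print(
        codec,
        f"U+{ord(text):04X}",
        unicodedata.name(text),
        "-> NFKC",
        f"U+{ord(normalized):04X}",
    )

# The two results are different code points, and NFKC does not reconcile them.
print(via_shift_jis == via_cp932)
```

Whatever happens after transcoding, the two applications would start from different Unicode strings for the same user input.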


 > So your solution is that nothing can ever be internationalized?

That's not a solution, and that's not what I'm proposing, I don't
understand how that could ever be read into what I wrote,
Because you said "I have trouble visualizing how this can be implemented and work well for 2, 5, 10 years and more, when Unicode and other charsets are moving targets." I agree with you that Unicode and other charsets are moving targets.


 but I'll try
to be specific on how to solve the problems with IDNA right now,
giving internationalized domain names that would be secure and could
be implemented and continue to work years ahead:

First, specify clearly that applications MUST NOT use any other
normalization table than the one defined in the IDNA spec suite
(following Unicode 3.1 currently, being updated to Unicode 3.2 if I
understand things correctly) and that in particular normalization
tables supplied by operating systems should never be used unless the
application author can assert that they will never change throughout
the lifetime of the application (which probably only will be true if
the application author is the operating system author).
We already say the first part (you must use the Unicode 3.1 -- soon to be Unicode 3.2 -- table). We don't say the second part because it flows from the first part.
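
A rough sketch of what pinning the table version looks like in practice, assuming Python: its `unicodedata` module exposes both the runtime's current Unicode tables and a frozen `ucd_3_2_0` object holding the Unicode 3.2 data. The sample character and the `is_assigned` helper are hypothetical, chosen only to show the two tables disagreeing.

```python
import unicodedata

# Frozen Unicode 3.2 tables, independent of the runtime's current tables.
pinned = unicodedata.ucd_3_2_0

SAMPLE = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S, added after Unicode 3.2

def is_assigned(ch: str, db=unicodedata) -> bool:
    """Return True if `ch` is an assigned code point in the given tables."""
    return db.category(ch) != "Cn"

# The runtime tables move with each release; the pinned tables do not.
print("runtime tables:", unicodedata.unidata_version, is_assigned(SAMPLE))
print("pinned tables: ", pinned.unidata_version, is_assigned(SAMPLE, pinned))

# An application that must behave the same for years normalizes with the
# pinned tables only.
print(pinned.normalize("NFKC", "\u212b") == "\u00c5")
```

An application that calls the runtime's tables instead will silently change behaviour when the runtime is upgraded, which is the situation the quoted proposal wants to rule out.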


Secondly, define how to transcode legacy charsets into Unicode, and
specify that only this transcoding table is to be used.  Transcoding
mapping tables can be defined in RFCs, much like MIME CTE's or
similar.  The initial IDNA spec suite could define transcoding tables
for commonly used charsets; ISO-8859-X, ISO-2022-X, KOI8-X,
KS-C-5601-X etc.
Yes, we could do that, but the IETF lacks both the linguistic and political expertise to do it. The fact that even the experts such as ISO and the Unicode Consortium have not chosen to do this should be a very broad hint to you about why the IETF shouldn't. But if you really think this is needed (I still don't), you absolutely should ask the appropriate bodies (Unicode or ISO) to do it. If they do it, I'd bet that the IETF would strongly consider pointing to those standards.


The main argument against these proposals is that they require lots
of work to implement, but if the alternative is poor security, I'd
rather have people do lots of work.  A lesser argument against it is
that they don't adopt new updates to Unicode, but that is by design.
We disagree about what the main argument is. Creating transcoding tables is easy; in fact, it has already been done. See <http://www.unicode.org/Public/MAPPINGS/> for some non-official mappings.

My main argument against the IETF doing this is that being sure the tables are "right" is nearly impossible because it involves getting consensus among the users of the scripts and the experts.

--Paul Hoffman, Director
--Internet Mail Consortium