
Re: [idn] Re: IDNA: is the specification proper, adequate, and complete?



Simon Josefsson <simon plus idn at josefsson dot org> wrote:

> Resolving ambiguity in this way can introduce ambiguity.  Consider a
> user intentionally entering U+212B because it has a different meaning
> than U+00C5 attached to it in Unicode; IDNA resolves this into one
> code point.  Unless the user knows the Unicode 3.2 decomposition
> table, it is uncertain to her whether those two code points are
> treated differently or not.

I don't see where the problem lies.  Yes, U+00C5 and U+212B do look
identical; the existence of pairs like this is the main reason why
canonical equivalence tables exist.  If you *could* have two separate
domain names whose only difference was that one contained U+00C5 Å and
the other contained U+212B Å, now *that* could cause spoofing problems.
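
A minimal sketch of that collapse in Python, assuming the standard library's built-in "idna" codec (which implements IDNA2003 per RFC 3490, including nameprep case folding and NFKC). The ".example" names are illustrative only:

```python
# Two spellings that differ only in U+00C5 vs. U+212B (hypothetical names).
with_a_ring = "\u00C5.example"      # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
with_angstrom = "\u212B.example"    # U+212B ANGSTROM SIGN

# ToASCII via the stdlib codec: nameprep maps both to U+00E5, then Punycode.
ace_a_ring = with_a_ring.encode("idna")
ace_angstrom = with_angstrom.encode("idna")

print(ace_a_ring)                   # b'xn--5ca.example'
print(ace_a_ring == ace_angstrom)   # True: one registrable name, not two
```

Because both inputs yield the same ACE label, there is no way to register them as two distinct domains.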

> If you look at the character "Å" it can have at least two (Unicode)
> meanings, either U+00C5 or U+212B.  This ambiguity is resolved in IDNA
> by normalization.  To the user, whether "Å" denotes U+00C5 or U+212B
> is "ambiguous".

There aren't two meanings.  Unicode considers them to be "canonically
equivalent."  They both exist in Unicode because somewhere, once upon a
time, there was a legacy character set that included both, and Unicode
wanted to provide round-trip mappings for both.
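
The equivalence is visible directly in the Unicode 3.2 data that IDNA is pinned to. A sketch, assuming Python's unicodedata.ucd_3_2_0 object (the module's Unicode 3.2 database, provided for IDNA):

```python
import unicodedata

# Unicode 3.2 database, the version IDNA2003 references.
ucd = unicodedata.ucd_3_2_0

angstrom = "\u212B"   # ANGSTROM SIGN
a_ring = "\u00C5"     # LATIN CAPITAL LETTER A WITH RING ABOVE

# U+212B decomposes canonically to the single code point U+00C5
# (a "singleton" decomposition, kept only for round-tripping).
decomp = ucd.decomposition(angstrom)
print(decomp)                                        # 00C5

# Canonically equivalent strings become identical after normalization.
nfc_angstrom = ucd.normalize("NFC", angstrom)
nfd_angstrom = ucd.normalize("NFD", angstrom)
nfd_a_ring = ucd.normalize("NFD", a_ring)
print(nfc_angstrom == a_ring)                        # True
print(nfd_angstrom == nfd_a_ring == "A\u030A")       # True
```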

That should be a clue that these "problems," if they are that, didn't
originate with Unicode.

> The mistake is that the transcoding tables are not specified
> somewhere, for IDNA purposes.

IDNA deals with Unicode.  Transcoding between Unicode and other
encodings is the responsibility of client-side software that does not
natively deal with Unicode.
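
To illustrate where that responsibility sits, here is a sketch in Python (the byte value and the ".example" suffix are hypothetical inputs): the same legacy byte means different characters depending on which charset the client assumes, and IDNA only ever sees the decoded result.

```python
# A single legacy byte, as a non-Unicode client might hold it (hypothetical input).
legacy_bytes = b"\xc5"

# Client-side transcoding: the charset label decides which character this is.
as_latin1 = legacy_bytes.decode("iso-8859-1")    # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
as_cyrillic = legacy_bytes.decode("iso-8859-5")  # U+0425 CYRILLIC CAPITAL LETTER HA

# IDNA operates on the decoded Unicode, so a wrong label yields a different name.
ace_latin1 = (as_latin1 + ".example").encode("idna")
ace_cyrillic = (as_cyrillic + ".example").encode("idna")
print(ace_latin1)                  # b'xn--5ca.example'
print(ace_latin1 != ace_cyrillic)  # True
```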

> Without solving transcoding issues, and if banks have the nerve to
> use non-ASCII IDNs for critical purposes, people will exploit these
> characteristics of IDNA.  With the current proposal, I think banks
> will not choose to use IDNs because there are security concerns, which
> I consider a partial failure of IDN.

I have to agree with Paul.  Neither the Unicode Technical Committee, nor
ISO/IEC JTC1/SC2/WG2, nor anyone else can force vendors to adopt uniform
mapping tables, or to mean the same thing when they say (for example)
"Shift-JIS."  Applications will continue to use transcoding routines
supplied by the operating system, even if Unicode or IDNA were to
publish "definitive" mapping tables.
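
The well-known "wave dash" case shows the vendor divergence concretely. A sketch in Python, assuming its "shift_jis" codec (JIS X 0208 mapping) and "cp932" codec (Microsoft's variant, also commonly called Shift-JIS):

```python
import unicodedata

raw = b"\x81\x60"   # one double-byte code, decoded by two "Shift-JIS" tables

jis_text = raw.decode("shift_jis")   # U+301C WAVE DASH
ms_text = raw.decode("cp932")        # U+FF5E FULLWIDTH TILDE
print(jis_text == ms_text)           # False

# NFKC (the normalization nameprep uses) pushes the results further apart.
nfkc_jis = unicodedata.normalize("NFKC", jis_text)   # stays U+301C
nfkc_ms = unicodedata.normalize("NFKC", ms_text)     # becomes ASCII "~"
print(nfkc_ms)                                       # ~
```

Neither codec is "wrong"; they simply disagree, and IDNA receives whatever Unicode the client's table produced.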

Transcoding between 80,000 legacy encodings is what we had to do before
Unicode came along.

-Doug Ewell
 Fullerton, California