[idn] Re: Legacy charset conversion in draft-ietf-idn-idna-08.txt



Roozbeh Pournader <roozbeh@sharif.edu> writes:

>> The basic attack: Alice runs on a host that uses Latin-1 for
>> input/output and enters www.µbank.com (where µ is 8859-1 0xB5).  The
>> domain is registered using U+00B5, but Alice's application transcodes
>> the string using U+03BC.  Either Alice can't connect (if the other
>> domain doesn't exist) or she ends up talking to someone else (if the
>> other domain does exist).
>
> I'm sorry, but your example doesn't work. In nameprep, when doing Unicode 
> Normalization, U+00B5 is mapped to U+03BC. So these will be the same 
> domain name, and have the same ACE label.
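To make Roozbeh's point concrete, here is a minimal Python sketch.  It
assumes that Python's built-in "idna" codec (IDNA2003, i.e. nameprep
followed by Punycode) is close enough to the draft's processing.  It
decodes Latin-1 0xB5 as U+00B5, builds the alternative U+03BC spelling,
and compares the resulting ACE labels.  The helper name `ace_label` is
illustrative only.

```python
# Sketch: assumes Python's built-in "idna" codec (IDNA2003: nameprep + Punycode)
# approximates the nameprep processing discussed in this thread.

def ace_label(label: str) -> bytes:
    """Return the ACE form of a single label via the "idna" codec."""
    return label.encode("idna")

typed = b"\xb5bank"                 # Alice types "µbank" on a Latin-1 host
as_micro = typed.decode("latin-1")  # 0xB5 -> U+00B5 MICRO SIGN
as_mu = "\u03bc" + as_micro[1:]     # the other transcoding: U+03BC GREEK SMALL LETTER MU

print(as_micro == as_mu)                        # False: different code points
print(ace_label(as_micro) == ace_label(as_mu))  # True: nameprep maps U+00B5 to U+03BC
```

The two strings differ, but both end up as the same ACE label, so this
particular byte cannot be used to split one name into two.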

You are right.  What about other examples?

Legacy code point   Candidate mappings   KC normalization
ISO-8859-1 0xB5     U+00B5 / U+03BC      Mapped to U+03BC, as you indicate
ISO-8859-1 0xC5     U+00C5 / U+212B      Mapped to U+00C5
CP437 0xE1          U+03B2 / U+00DF      ?
CP437 0xEA          U+03A9 / U+2126      Mapped to U+03A9
CP437 0xEE          U+03B5 / U+2208      ?
JIS-X-0208 0x2140   U+005C / U+FF3C      ?

"?" means I could not find any KC normalization for the pair in the
Unicode tables at http://www.unicode.org/charts/normalization/, and I'm
not sure how to interpret this.  Possibly it means the two characters
are not normalized to the same character, in which case there is a
problem?
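One way to check the "?" entries without reading the charts by hand is
to ask a Unicode library directly.  The sketch below assumes Python's
`unicodedata` module, whose Unicode version is newer than the tables
available when this was written.  It checks only NFKC, matching the last
column.  Nameprep also applies the stringprep case-folding table B.2
before normalizing, so this is not a full nameprep check.  `CANDIDATES`
and `nfkc_agrees` are hypothetical names.

```python
import unicodedata

# (legacy code point, candidate A, candidate B), copied from the table above.
CANDIDATES = [
    ("ISO-8859-1 0xB5", 0x00B5, 0x03BC),
    ("ISO-8859-1 0xC5", 0x00C5, 0x212B),
    ("CP437 0xE1", 0x03B2, 0x00DF),
    ("CP437 0xEA", 0x03A9, 0x2126),
    ("CP437 0xEE", 0x03B5, 0x2208),
    ("JIS-X-0208 0x2140", 0x005C, 0xFF3C),
]

def nfkc_agrees(a: str, b: str) -> bool:
    """True if NFKC normalization maps both strings to the same result."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

for source, a, b in CANDIDATES:
    verdict = "same after NFKC" if nfkc_agrees(chr(a), chr(b)) else "STILL DISTINCT"
    print(f"{source:<18} U+{a:04X} / U+{b:04X}: {verdict}")
```

Pairs reported as STILL DISTINCT are the ones where two transcoders
could produce two different labels from the same typed bytes.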

I agree with Mark Davis that it would be interesting to find out which
characters in commonly used legacy charsets may cause these problems,
and how many there are.
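A rough way to approach that question is to decode every byte of a
charset with two competing mapping tables and keep the bytes whose
results still differ after NFKC.  In this Python sketch the decoders are
plain callables, so vendor tables could be plugged in.  The
demonstration compares Python's `latin-1` and `iso8859_15` codecs only
because both ship with Python, not because they are two tables for the
same charset.  `codec_decoder` and `risky_bytes` are hypothetical
helpers.

```python
import unicodedata
from typing import Callable, Iterable, List

Decoder = Callable[[int], str]

def codec_decoder(name: str) -> Decoder:
    """Decode a single byte value with one of Python's built-in codecs."""
    return lambda value: bytes([value]).decode(name, errors="replace")

def risky_bytes(decode_a: Decoder, decode_b: Decoder,
                values: Iterable[int] = range(256)) -> List[int]:
    """Byte values the two tables decode differently, even after NFKC."""
    risky = []
    for value in values:
        a = unicodedata.normalize("NFKC", decode_a(value))
        b = unicodedata.normalize("NFKC", decode_b(value))
        if a != b:
            risky.append(value)
    return risky

found = risky_bytes(codec_decoder("latin-1"), codec_decoder("iso8859_15"))
print([hex(v) for v in found])
```

Comparing after NFKC already filters out pairs such as U+212B / U+00C5
that normalization unifies, so only genuinely ambiguous bytes remain.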

Also note that if these tables are ever changed in the future, this
could also be exploited.  Suppose application A uses mapping table
version Y and application B uses version Y+1, and the two versions
transcode and/or normalize some characters differently.  Then someone
could register the domain produced by either the old or the new table
and fool applications using the other one.
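The version-skew scenario can be sketched with two hypothetical tables
that disagree on one byte.  Here version Y of an imagined CP437 table
maps 0xE1 to U+03B2 and version Y+1 maps it to U+00DF.  The tables, the
typed label, and the helper names are all assumptions for illustration.
Python's "idna" codec is again assumed to stand in for nameprep plus
ToASCII.

```python
# Hypothetical version Y and Y+1 of a CP437 mapping table that disagree on 0xE1.
TABLE_Y = {0xE1: "\u03b2"}   # GREEK SMALL LETTER BETA
TABLE_Y1 = {0xE1: "\u00df"}  # LATIN SMALL LETTER SHARP S

def transcode(raw: bytes, table: dict) -> str:
    """Map listed bytes through the table; treat every other byte as ASCII."""
    return "".join(table[b] if b in table else chr(b) for b in raw)

def lookup_label(raw: bytes, table: dict) -> bytes:
    """The ACE label an application using `table` would look up."""
    return transcode(raw, table).encode("idna")

typed = b"\xe1ank"                   # hypothetical label typed on a CP437 host
old = lookup_label(typed, TABLE_Y)   # application A (version Y)
new = lookup_label(typed, TABLE_Y1)  # application B (version Y+1)

print(old)
print(new)  # b'ssank': nameprep folds U+00DF to "ss"
print(old != new)
```

Because nameprep folds U+00DF to "ss", the two applications look up
`ssank` and an `xn--` label respectively, so a registrant holding either
name would catch the users of the matching table version.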