
[idn] Re: Legacy charset conversion in draft-ietf-idn-idna-08.txt



Roozbeh Pournader <roozbeh@sharif.edu> writes:

>> CP437 0xE1:        U+03B2 / U+00DF: ?
>> CP437 0xEE:        U+03B5 / U+2208: ?
>
> As far as I know, all existing CP437 tables map those to SMALL SHARP S
> (U+00DF) and SMALL EPSILON (U+03B5) and not SMALL BETA or ELEMENT OF.
> I just checked everywhere I could, this is the list:
>
> 	http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT
> 	http://microsoft.com/globaldev/reference/oem/437.htm
> 	http://www.kostis.net/charsets/cp437.htm
>
> Can you point me to anywhere disagreeing?

No.  I was thinking of the mapping from Unicode into legacy charsets
here, which probably wasn't clear (I hadn't realized I was making
that assumption when I wrote that message).
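
To make the direction concrete, here is a quick sketch using Python's
CP437 codec (which follows the Unicode mapping table cited above);
the specific characters are just the ones from this thread:

    # Decoding CP437 into Unicode is uncontroversial: 0xE1 -> U+00DF.
    assert b'\xe1'.decode('cp437') == '\u00df'

    # The reverse direction is where tables diverge.  Strictly, only
    # U+00DF maps back to 0xE1 ...
    assert '\u00df'.encode('cp437') == b'\xe1'

    # ... while U+03B2 (GREEK SMALL LETTER BETA) has no CP437 byte at
    # all, so a "best fit" converter may well pick 0xE1 for it too.
    try:
        '\u03b2'.encode('cp437')
    except UnicodeEncodeError:
        print("no strict CP437 byte for U+03B2")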

To understand why that direction is ever relevant, I think it is
useful to summarize the security-related problems with
internationalized text identifiers in applications.  So far we have
only discussed the first two.

1) Internationalized text strings extend the set of characters that
   can be used in identifiers.  Traditionally a-z0-9- is such a small
   set that users can easily tell two elements of the set apart ("oh!
   mybank.com is not the same string as mubank.com, I must be under
   attack!").  With Unicode, users cannot easily do this anymore.  I
   think the point of KC normalization is to mitigate this problem,
   but the basic problem will still be there: users must understand
   that different but similar-looking characters are different.  This
   problem cannot be solved entirely, and people will exploit this
   fact.
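
   As an illustration (the Cyrillic lookalike is my own hypothetical
   example, not one from the draft), no normalization form folds such
   twins together:

       import unicodedata

       latin     = 'mybank.com'         # ASCII 'y'
       lookalike = 'm\u0443bank.com'    # U+0443 CYRILLIC SMALL LETTER U

       # The labels render near-identically in many fonts, yet
       # neither NFC nor NFKC makes them equal:
       for form in ('NFC', 'NFKC'):
           assert (unicodedata.normalize(form, latin)
                   != unicodedata.normalize(form, lookalike))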

2) On systems with non-ASCII, non-Unicode charsets, applications need
   to transcode strings entered in the system charset into Unicode
   before applying the IDNA operations.  If different applications
   use different mapping tables, or if those mapping tables change,
   in a way that KC normalization does not cancel out, there will be
   additional attacks.
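
   A sketch of how diverging tables survive normalization; table_b is
   hypothetical, and str.casefold() stands in for the nameprep case
   mapping, which folds U+00DF to "ss":

       # Two applications decode the same keystroke, CP437 0xE1, with
       # different mapping tables:
       table_a = {0xE1: '\u00df'}   # the common table: SHARP S
       table_b = {0xE1: '\u03b2'}   # a divergent table: SMALL BETA

       a = table_a[0xE1]
       b = table_b[0xE1]

       # Normalization and case folding cannot cancel the difference,
       # so the two users end up at different names:
       print(a.casefold())   # -> 'ss'
       print(b.casefold())   # -> U+03B2, unchanged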

3) On systems with non-ASCII, non-Unicode charsets in security
   applications, the system will have to convert the Unicode in IDNA
   into the system charset before handing it to those security
   applications.  Here is the problem I was referring to above.
   Consider a user entering a string containing CP437 0xE1 in the
   browser; it is converted to IDNA form within the resolver, the
   user connects to the server, and receives (for security purposes,
   in a TLS stream for instance), e.g., a certificate containing an
   IDNA string.  If IDNA is not supposed to force the entire system
   to switch to Unicode, the application will have to convert the
   IDNA string into the system charset to display it to the user.  So
   it converts U+03B2 into CP437 0xE1 and displays it, and the user
   compares the strings (possibly even by studying the byte sequence
   to be sure, if a system charset with similar-looking symbols is
   used) and can verify it.  However, it is not unreasonable for the
   application to convert U+00DF into CP437 0xE1 as well.  So the
   attacker will only have to register the domain using U+00DF
   instead of U+03B2 in order to mount an attack.
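
   A minimal sketch of that failure, assuming a hypothetical "best
   fit" table such as an application might use when forced to render
   Unicode on a CP437 display:

       BEST_FIT = {'\u00df': b'\xe1', '\u03b2': b'\xe1'}

       def display_cp437(s):
           # Fall back to the strict codec for everything else.
           return b''.join(BEST_FIT.get(c, c.encode('cp437'))
                           for c in s)

       genuine = '\u03b2ank'   # name registered with SMALL BETA
       spoofed = '\u00dfank'   # attacker's name, with SHARP S

       # Both render as the same byte sequence, so even a careful
       # byte-level comparison of the displayed string cannot detect
       # the substitution:
       assert display_cp437(genuine) == display_cp437(spoofed)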

Ok, I admit that the case in 3) can be solved in two ways: either the
user is shown the IDNA (ACE) strings and is allowed to compare them,
or the system is upgraded to use Unicode in all involved
applications, including the display engine.  My point is that this
attack isn't discussed in the security considerations.  Also, the two
solutions aren't very good, IMHO: the first will create a bad user
experience, and the second will take years to implement.  And bad
solutions are implemented badly or not at all, so this will most
likely lead to security incidents, which for prudence's sake should
at least be mentioned in the security considerations.
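
For what it's worth, the first workaround is at least mechanical: the
ACE forms never pass through the lossy legacy conversion, so they can
be compared byte for byte.  A sketch with Python's built-in IDNA
codec, reusing the hypothetical names from above:

    # The sharp-s name is nameprep-folded to plain ASCII, while the
    # beta name gets an xn-- ACE label; as byte strings the two are
    # trivially distinguishable, whatever the display charset does:
    print('\u00dfank'.encode('idna'))   # -> b'ssank'
    print('\u03b2ank'.encode('idna'))   # -> an xn-- label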