[idn] Re: Legacy charset conversion in draft-ietf-idn-idna-08.txt



"Masahiro Sekiguchi" <seki@jp.fujitsu.com> writes:

>> The basic attack: Alice runs on a host that uses Latin-1 for
>> input/output and enters www.µbank.com (where µ is 8859-1 0xB5).  The
>> domain is registered using U+00B5, but Alice's application transcodes
>> the string using U+03BC.  Either Alice can't connect (if the other
>> domain doesn't exist) or she ends up talking to someone else (if the
>> other domain does exist).
>
> I agree the case you described is a problem.  However, I don't
> agree with what you identify as the cause, i.e., I don't think
> it is a transcoding problem.
>
> Please imagine that we are living in an ideal Unicode-only world.
>
> Assume the bank registers its domain name using Unicode U+00B5,
> intending "micro-bank."  Alice *may* type a key for U+00B5 if
> she is a computer engineer, but she may type U+03BC in if she is
> a Greek linguist, because her keyboard (or input mapper) will be
> optimized for Greek typing, or because her thinking is biased by
> her Greek familiarity (She probably read the name as "mu bank",
> being puzzled what it means.)
>
> Someone might say this is a Unicode problem.  Well, partly.  For
> this particular case, Unicode could have eliminated one of
> U+00B5 and U+03BC.  However, there are a lot of similar cases: l
> and 1, 0 and O, ´ and ', or ° and o, even in the 8859-1 range.
> We can't eliminate all of these look-alikes.
>
> Hence, I consider the basic problem to be in our writing systems
> and I don't think it's feasible to fix them.

I agree it isn't feasible to fix them, so that problem cannot be
solved.  That problem should probably be mentioned in the security
considerations as well.  The user gets what she enters, and if she
enters something other than what she intends, there will be errors
that can have security implications.  The same problem exists today:
if you enter "mybank.com" instead of "mubank.com", no technical aid
can protect you from someone calling herself "mybank.com" who sets up
a web site that looks like "mubank.com", including server certs etc.

I think the transcoding issue is separate though.  The security
implications in the scenario above can be solved by having educated
users.  They must remember the exact spelling and exact characters
used to contact their bank (this isn't unreasonable).  However, when
the system uses transcoding to convert system characters into Unicode
characters, even a user entering the "correct" spelling cannot be
certain that she ends up at the right server because transcoding
algorithms are not specified by IDNA and are left to implementations.
The user can even compare the string she entered with the string found
in a certificate; they may match her "correct" string octet by octet,
and she may still be talking to the wrong server because different
mapping tables exist.
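
To make the scenario concrete, here is a minimal Python sketch.  It
assumes two transcoders: table A is Python's built-in ISO 8859-1
decoder, and table B is an invented variant (not taken from any real
implementation) that maps 0xB5 to U+03BC instead.  The function names
are hypothetical.

```python
import unicodedata

def transcode_a(raw: bytes) -> str:
    # Table A: standard ISO 8859-1, so 0xB5 becomes U+00B5.
    return raw.decode("latin-1")

def transcode_b(raw: bytes) -> str:
    # Table B: hypothetical variant that maps 0xB5 to U+03BC.
    return raw.decode("latin-1").replace("\u00b5", "\u03bc")

# The octets Alice's Latin-1 system produces for www.µbank.com.
typed = b"www.\xb5bank.com"

name_a = transcode_a(typed)
name_b = transcode_b(typed)

# Show which code point each table chose for the fifth character.
for table, name in (("A", name_a), ("B", name_b)):
    ch = name[4]
    print(table, f"U+{ord(ch):04X}", unicodedata.name(ch))

# Same input octets, different Unicode strings.
print("same name:", name_a == name_b)
```

The input octets are identical in both cases, so nothing Alice can see
on her Latin-1 display tells her which table her application used.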

When transcoding algorithms are left unspecified, the only way for the
user to be able to verify the identity of the bank is to compare the
computed IDNA strings with what she wanted.  She enters the string
using system characters, it is converted into IDNA, the server is
contacted and a certificate is fetched.  Now, to be certain of the
identity, the application will likely compare the IDNA of the server
with the one in the certificate, but the user needs to compare the
computed IDNA with something she knows, to be certain that the
application didn't use a transcoding algorithm different from what the
bank used, the CA used, and the user intended.
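
The comparison described above could look roughly like the following
Python sketch.  It is only an illustration under stated assumptions:
ascii_form() uses Python's raw "punycode" codec as a stand-in for the
computed IDNA string (no name preparation and no prefix, so a real
IDNA implementation may behave differently for particular character
pairs), and the reference value is assumed to reach the user through
some trusted channel.  All function names are hypothetical.

```python
def ascii_form(label: str) -> str:
    # Stand-in for the computed IDNA string: raw Punycode only,
    # with no name preparation and no ACE prefix (assumption).
    return label.encode("punycode").decode("ascii")

def matches_reference(raw: bytes, transcode, reference: str) -> bool:
    # Transcode the system octets, compute the ASCII form, and compare
    # it with the value the user already knows to be correct.
    return ascii_form(transcode(raw)) == reference

# Reference the bank is assumed to publish, computed from U+00B5.
reference = ascii_form("\u00b5bank")

# Octets entered on the Latin-1 machine for the label "µbank".
typed = b"\xb5bank"

# A transcoder that agrees with the bank's choice of code point.
local_ok = matches_reference(typed, lambda b: b.decode("latin-1"), reference)

# A hypothetical transcoder that picks U+03BC for 0xB5 instead.
local_bad = matches_reference(
    typed,
    lambda b: b.decode("latin-1").replace("\u00b5", "\u03bc"),
    reference,
)

print("agreeing table matches:", local_ok)
print("differing table matches:", local_bad)
```

The point of the check is that the mismatch only becomes visible in
the computed ASCII form, not in the characters the user typed.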

If transcoding algorithms were specified by IDNA, the second security
problem would be reduced to the first one (modulo any mistakes in
transcoding mapping tables -- once the tables are fixed, you can't
modify them unless you want to enable the attack again).  Instead, it
could be easier to just ignore the second security problem, assuming
that all implementations will transcode system characters into Unicode
characters in the same way.  Or that the problem is so rare in
practice that it doesn't matter.  Or that the whole world will switch to
Unicode.  Or that I misunderstood everything and there isn't a problem
at all.  Either way, all I'm asking is that the problem and the
expected solution are discussed a bit further in the specification, so
that I can understand how to implement IDNA securely on my Latin-1
machine.

>> Suggested modified security consideration below.  It essentially says
>> that unless everyone switches to UTF-8, IDNA will enable new attacks
>> that have security implications.
>
> Mentioning the security implications is good.  Blaming it on
> transcoding is irrelevant.  Revolutionizing the world to use
> UTF-8 only doesn't completely eliminate the problem, IMHO.

As illustrated above, I believe there are two separate problems.  It
might be good to make both of them explicit.