
Re: [idn] quick & dirty (but not too dirty) homograph defense




--On Sunday, 20 February, 2005 23:34 +0100 "Martin v. Löwis"
<martin@v.loewis.de> wrote:

> tedd wrote:
>> As I understand it, this would only require a change in
>> mapping. That  should solve at least one "glyph look-alike"
>> problem -- shouldn't it?
> 
> Anything involving changes in the way IDNA works (through
> changing mappings, or restricting characters further on the
> client side) is not a solution, whatever the problem.  Any
> such change will take ten years or more to implement.

Martin, 

For a change, I don't share the pessimism on this.  I think we
should be very hesitant to make changes unless it is pretty
clearly demonstrated that they are necessary, with a high
threshold on such a demonstration.  But it seems to me that:

(i) A change that primarily affects what can be registered
needs to be reflected and implemented in only the 250-odd
registries.  The registry operators are mostly on their toes,
communicate with each other, and many of them are pretty early
in their implementation of IDNs and conservative about what they
are permitting.  Getting them to make changes is an entirely
different sort of problem than, e.g., trying to change
already-installed browsers or client plugins or getting people
to upgrade them.

(ii) The main things I've seen in observing and working with
registries --things I didn't understand well enough a couple of
years ago to argue about forcefully-- are ones we might be able
to change, because the impact of whether someone was running an
old or new version would not be large.  For example, IDNA makes
some mappings that are dubious, not in the technical sense of
whether the characters are equivalent, but in the human factors
sense of whether treating them as equivalent leads to bad
habits.  To take a handy example from a Roman ("Latin")-based
script, I now suspect that permitting all of those font-variant
"mathematical" characters to map onto their lower-case ASCII
equivalents is a bad idea, just because it encourages users to
assume that, if something looks like a particular base
character, it is that character.  That, in turn, increases the
perceptual window for these phishing attacks.  If, instead, we
had simply banned those characters, creating an error if someone
tried to use one rather than a quiet mapping into something
else, we might have been better off.  So I now think we should
have banned them when IDNA and nameprep were defined and think I
could have made that case very strongly had I understood the
issues the way I do now.   Is it worth making that change today?
I don't know.  But I suggest that it would be possible to make
it for two reasons: (a) such a change would not change the
number of strings or characters that can be registered at all:
only the base characters can actually appear in an IDNA string
after the ToUnicode(ToASCII(char)) pair of operations, and (b) if I
were a browser or other application producer, I'd be seriously
considering warnings if any characters from those blocks
appeared... something IDNA certainly does not prohibit.  Changes
that increase the number of registerable characters are
problematic, but not that problematic if they don't pick up a
character that now maps and make it "real" (which is the problem
with deciding that upper case Omega is a good idea).  Reducing
the number of characters that can be registered --making a
now-valid base character invalid-- would be a much harder
problem.
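The quiet mapping described above can be seen directly with
Unicode's NFKC normalization, on which nameprep's mapping step is
built: the font-variant "mathematical" letters silently collapse
onto their ASCII bases, exactly the behavior that trains users to
trust lookalikes.  A minimal sketch in Python (the sample string
and the function names `quietly_mapped` and `looks_suspicious`
are illustrative, not from the original discussion):

```python
import unicodedata

def quietly_mapped(label: str) -> str:
    # Nameprep's mapping step is built on NFKC normalization, which
    # folds font-variant "mathematical" letters onto their ASCII bases.
    return unicodedata.normalize("NFKC", label)

def looks_suspicious(label: str) -> bool:
    # The warning a client might issue instead of mapping quietly:
    # flag any character from the Mathematical Alphanumeric Symbols
    # block (U+1D400..U+1D7FF).
    return any(0x1D400 <= ord(ch) <= 0x1D7FF for ch in label)

spoof = "p\U0001D5BAypal"      # 'p' + MATHEMATICAL SANS-SERIF SMALL A + 'ypal'
print(quietly_mapped(spoof))   # the lookalike vanishes silently: 'paypal'
print(looks_suspicious(spoof)) # True -- a client could warn or reject instead
```

The contrast between the two functions is the point of the
argument: mapping hides the substitution from the user, while an
error or warning would surface it.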

(iii) Finally, and most important, there are several areas in
the world for which IDNA, as now defined, is equivalent to no
progress on IDNs at all because they need characters that are
not in Unicode 3.2.  To accommodate them, we are going to need
to upgrade to at least 4.0 and, in some cases, to sets of
characters not yet fully defined.  At some point, our upgrading
the version of Unicode is going to be an absolute necessity.
That requires changing the nameprep/stringprep tables and, no
matter how painful we think that might be, we are going to need
to figure out how to do it.

---------------

Rather than start another note, here is an observation about the
blacklist idea that doesn't seem to have been mentioned.  Let me
state this in personal terms: I don't like blacklists.  The
reasons aren't technical, at least in the first instance, but
rest on a whole series of observations about system behavior.
For example:

(1) In practice, it tends to be much easier to get onto a
blacklist than off of it.  One can certainly say "put .NET and .COM on the
list, and, when Verisign changes its behavior, we will take them
off".  But that hides a whole set of issues, such as whether you
believe them when they say their behavior has changed, whether
the behavior has changed enough (unfortunately, this is really
not a binary situation), and so on.

(2) There is a protocol issue with how the blacklist is
maintained.  If it is part of the code of the relevant
application software, or a plug-in or add-on to it, we have to
face the problem that the average end user doesn't upgrade
things on the schedules we would like, resulting in a different
"harder to get off than on" case.  If the application is
designed to automatically update itself from a common source on
the network, we are suddenly faced with all of the problems
caused by various people's ideas of security: authentication of
the new tables, firewalls and system configurations that are
hostile to that sort of upgrading, and so on.

(3) As we have seen with spam, the existence of a blacklist
mechanism, no matter how well conceived, is an invitation to
abuse and/or claims that abuse is occurring.  If we create a
mechanism for putting up warnings about IDN use in particular
domains, how long will it be before there are demands to put up
warnings for domains that are, statistically, too friendly to
spammers, pornographers, phishers, music-swappers, or holders of
unpopular social or political positions?  We all understand the
difference between those issues and ones where the boundary can
be set using objective technical criteria.  But we have had long
experience with governments and policy advocates not
understanding the distinction and, indeed, being determined
not to do so.

(4) Finally, the result of too many warnings is that warnings
are ignored.  Certainly it is better to flag or color IDNs from
a few domains than it is for all domains.  And it is almost
certainly better to do something than nothing.  But, if the
domains flagged contain a large fraction of all of the
second-level registrations in the world, we should keep our
expectations of how much such measures will help very moderate.

    john