Re: [idn] quick & dirty (but not too dirty) homograph defense



John,

I'm probably missing something, but if the apps are not currently warning the user when characters from blocks that ought to be banned appear, then people and/or tools may be generating references containing those characters (e.g. links in HTML documents) without realizing that those characters are being quietly mapped onto other base characters, which then work fine in the DNS lookup.
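
For example, here is a rough Python sketch of the quiet mapping
(Python's built-in "idna" codec implements the RFC 3490/3491 rules;
the hostname is made up):

    # "example" spelled with MATHEMATICAL BOLD SMALL letters
    # (U+1D41E, U+1D431, ...), not ASCII.
    fancy = ("\U0001d41e\U0001d431\U0001d41a\U0001d426"
             "\U0001d429\U0001d425\U0001d41e.com")

    # nameprep's NFKC step folds each one onto its base letter,
    # so the lookup target is plain ASCII and resolves normally.
    print(fancy.encode("idna"))   # b'example.com'

So a document containing the bold-letter form never triggers an error today; it just resolves.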

Maybe it is unlikely that many such references would come to exist, and it wouldn't be much of a burden for the user of a new app to see the occasional error, e.g. when clicking on such a link.

But how do you determine how many existing HTML documents contain such characters in their links, and how do you decide that that number is low enough to justify such a change to the spec?

So I'm wondering whether you would really be able to say that such a change "would largely impact what can be registered". How do you know that it would not also impact the users of new clients accessing existing documents?
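
Scanning a corpus for them is at least mechanically easy; a rough
Python sketch (the block list is just illustrative, not a proposal,
and "page.html" stands in for whatever corpus you have):

    from html.parser import HTMLParser

    # Blocks whose characters nameprep quietly maps onto base
    # letters; illustrative only, not a complete set.
    SUSPECT = (
        (0x1D400, 0x1D7FF),  # Mathematical Alphanumeric Symbols
        (0x2100, 0x214F),    # Letterlike Symbols
        (0xFF00, 0xFFEF),    # Halfwidth and Fullwidth Forms
    )

    def has_suspect_chars(s):
        return any(lo <= ord(c) <= hi
                   for c in s for lo, hi in SUSPECT)

    class LinkScanner(HTMLParser):
        """Collects href values containing suspect characters."""
        def __init__(self):
            super().__init__()
            self.hits = []

        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                if (tag == "a" and name == "href" and value
                        and has_suspect_chars(value)):
                    self.hits.append(value)

    scanner = LinkScanner()
    scanner.feed(open("page.html", encoding="utf-8").read())
    print(scanner.hits)

The hard part is not the check; it's getting a representative sample of documents to run it over.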

Oh wait, I know! Just get Google to do a survey in their cache?

Erik

John C Klensin wrote:
(i) A change that would largely impact what can be registered
needs to be reflected and implemented only in 250-odd
registries.  The registry operators are mostly on their toes,
communicate with each other, and many of them are pretty early
in their implementation of IDNs and conservative about what they
are permitting.  Getting them to make changes is an entirely
different sort of problem than, e.g., trying to change
already-installed browsers or client plugins or getting people
to upgrade them.

(ii) The main things I've seen in observing and working with
registries --things I didn't understand well enough a couple of
years ago to argue forcefully-- are things we might still be
able to change, because the impact of whether someone was
running an old or new version would not be large.  For example,
IDNA makes some mappings that are dubious, not in the technical
sense of whether the characters are equivalent, but in the
human-factors sense of whether treating them as equivalent
leads to bad habits.

To take a handy example from a Roman ("Latin")-based script, I
now suspect that permitting all of those font-variant
"mathematical" characters to map onto their lower-case ASCII
equivalents is a bad idea, just because it encourages users to
assume that, if something looks like a particular base
character, it is that character.  That, in turn, widens the
perceptual window for these phishing attacks.  If, instead, we
had simply banned those characters, creating an error if
someone tried to use one rather than a quiet mapping into
something else, we might have been better off.  So I now think
we should have banned them when IDNA and nameprep were defined,
and I think I could have made that case very strongly had I
understood the issues the way I do now.

Is it worth making that change today?  I don't know.  But I
suggest that it would be possible to make it, for two reasons:
(a) such a change would not change the number of strings or
characters that can be registered at all, since only the base
characters can actually appear in an IDNA string after the
ToUnicode(ToASCII(char)) operation pair; and (b) if I were a
browser or other application producer, I'd be seriously
considering warnings if any characters from those blocks
appeared... something IDNA certainly does not prohibit.

Changes that would increase the number of registerable
characters are problematic, but not that problematic if they
don't pick up a character that now maps and make it "real"
(which is the problem with deciding that upper-case Omega is a
good idea).  Reducing the number of characters that can be
registered --making a now-valid base character invalid-- would
be a much harder problem.
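
For concreteness, the banned-rather-than-mapped behavior John
describes might look something like this rough Python sketch (the
block list and function name are made up, and today's built-in
"idna" codec does the quiet mapping instead):

    import unicodedata

    # Illustrative only: blocks whose characters would be
    # rejected up front instead of folded onto base letters.
    BANNED = (
        (0x1D400, 0x1D7FF),  # Mathematical Alphanumeric Symbols
        (0x2100, 0x214F),    # Letterlike Symbols
    )

    def strict_to_ascii(label):
        """Raise on banned characters; otherwise fall through
        to the usual IDNA processing."""
        for c in label:
            if any(lo <= ord(c) <= hi for lo, hi in BANNED):
                raise ValueError("banned character %r (%s)"
                                 % (c, unicodedata.name(c, "?")))
        return label.encode("idna")

    strict_to_ascii("example")          # b'example'
    try:
        # "example" in MATHEMATICAL BOLD SMALL letters
        strict_to_ascii("\U0001d41e\U0001d431\U0001d41a"
                        "\U0001d426\U0001d429\U0001d425"
                        "\U0001d41e")
    except ValueError as e:
        print("rejected:", e)           # error instead of mapping

And, per his point (a), nothing registrable changes: those code
points can never survive ToUnicode(ToASCII()) anyway, so only the
error behavior at the client is new.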