Re: [idn] who should be doing IDN filtering
--On Thursday, 17 February, 2005 09:58 +0000 "Adam M. Costello"
<idn.amc+0@nicemice.net.RemoveThisWord> wrote:
>...
>> ...assuming we can make the language tag available via some
>> dns tricks or some API...
>
> I don't see that happening. The IDN working group decided
> quite deliberately that domain names would not contain any
> meta-info like language tags; they're just text strings.
Concur. I also concur with the several folks who have pointed
out that, at browser time, language information won't do a whole
lot of good, at least not without the browser having access to
every per-domain table of the characters each registry permits.
> Still, I expect that some not-terribly-complex heuristics,
> based only on the bare character strings, could go a long way
> toward exposing suspicious domain names.
I used to be convinced of this, but have become increasingly
skeptical about how far it would take us. The easiest tests in
principle are for homogeneity of characters within a label
(the "one label, one script" test, more or less). Those tests
are indeed fairly simple if the label contains characters that
form a contiguous block in Unicode that conforms, more or less,
to one script. That requires a somewhat fuzzy definition of
"script", which is probably ok, but it also isn't a very good
test once one gets beyond the low end of the BMP and the scripts
taken over, as blocks, from prior standards. After that, the
tests get more complex, to the point that one can imagine
needing all of the Unicode script tables (assuming they are
adequate, which they probably are for this purpose) within the
application to make a good test. A test based on those tables
wouldn't be terribly complex computationally, but the notion of
carrying those tables around in resource-limited devices, or
even in a browser whose footprint one was trying to minimize,
gets a little dicey.
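To make the "one label, one script" test above concrete, here is a
minimal sketch in Python. The range table is a tiny hand-built sample
standing in for the full Unicode script data that, as noted, a real
implementation would need to carry; the function names and ranges are
illustrative, not from any standard API.

```python
# A crude "one label, one script" check. SCRIPT_RANGES is a small
# illustrative sample, not the complete Unicode script tables.
SCRIPT_RANGES = {
    "Latin":    [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
    "Greek":    [(0x0370, 0x03FF)],
    "Cyrillic": [(0x0400, 0x04FF)],
}

def scripts_in_label(label):
    """Return the set of scripts used in a label.

    ASCII digits and hyphen are treated as script-neutral, since they
    are legal in labels of any script."""
    found = set()
    for ch in label:
        if ch in "0123456789-":
            continue
        cp = ord(ch)
        for script, ranges in SCRIPT_RANGES.items():
            if any(lo <= cp <= hi for lo, hi in ranges):
                found.add(script)
                break
        else:
            found.add("Unknown")
    return found

def is_mixed_script(label):
    """True if the label draws characters from more than one script."""
    return len(scripts_in_label(label)) > 1

# "p\u0430ypal" uses a Cyrillic 'a' (U+0430) among Latin letters:
print(is_mixed_script("p\u0430ypal"))   # True
print(is_mixed_script("paypal"))        # False
```

Even this toy version shows why the tables matter: any character
outside the listed ranges comes back "Unknown", so the test is only as
good as the script data behind it.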
But the important question, I think, is what attacks this would
protect us against. Certainly, it would provide protection
against a name accidentally registered with an odd mix of
characters. But I suggest that is a null set -- if a label is
entered into the DNS with a heterogeneous collection of
characters, it is because someone decided to do it and the
registrar and registry decided to permit them to do it. That
isn't an
accident, that is a deliberate set of decisions, for whatever
reasons. As the bad guys get more sophisticated, there are
going to be attacks that will be far harder to detect than
either the paypal or yah00 examples that have shown up on this
list, many of them probably involving mixtures of scripts,
none of which is Roman-based.
So, should the browsers (or other UI programs) take whatever
precautions and issue whatever warnings are reasonably feasible?
Sure. Should we assume that will help very much against a
determined and sophisticated attacker? Nope. Can a lot more be
done at registration time than at lookup (or user inspection)
time? Yes, certainly -- not only is the code base easier to
control and the "language" information available, as has been
pointed out, but taking some time on _those_ servers to look up
script tables, compare them to labels, and even apply variant or
other cross-checking rules if the domain considers that
appropriate would be perfectly rational.
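One form the registration-time cross-checking mentioned above could
take is a confusable-folding check: fold known look-alike characters
to a common "skeleton" and flag a new label whose skeleton collides
with an existing registration. This is a sketch under assumptions --
the confusable table here is a tiny made-up sample, and a real
registry would use a maintained confusables list and its own variant
rules.

```python
# Illustrative confusable map: each look-alike folds to a canonical
# ASCII stand-in. A real table would be far larger.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small a
    "\u043e": "o",  # Cyrillic small o
    "0": "o",       # digit zero vs letter o
    "1": "l",       # digit one vs letter l
}

def skeleton(label):
    """Fold a label to its confusable skeleton."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in label.lower())

def check_registration(new_label, registered):
    """Return existing labels whose skeleton collides with the new one."""
    skel = skeleton(new_label)
    return [r for r in registered if r != new_label and skeleton(r) == skel]

existing = {"paypal", "yahoo"}
print(check_registration("p\u0430ypal", existing))  # ['paypal']
print(check_registration("yah00", existing))        # ['yahoo']
```

A check like this is cheap on a registry's servers, where both the
code base and the registration data are under one party's control --
which is exactly why it is more plausible there than in every browser.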
I think we will find ourselves, a few years down the line, in a
situation in which users discover that names in some domains, by
virtue of tighter registration policies, will be safer to use
than others. If that results in competitive pressures on the
more relaxed ones, so much the better. But, in the general
case, it may be about as much as can be done.
john