
Re: [idn] quick & dirty (but not too dirty) homograph defense




--On Monday, 21 February, 2005 16:13 -0800 Erik van der Poel
<erik@vanderpoel.org> wrote:

> John,
> 
> I'm probably missing something, but if the apps are not
> currently warning the user when characters from blocks that
> ought to be banned appear, then people and/or tools may be
> generating references (e.g. HTML documents) to those
> characters, without realizing that those characters are being
> mapped to some other base characters, which then work OK in
> the DNS lookup.

You are, of course, correct.  While warning others to think
about this from the user and content-generator side rather than
the registrar/registry side, I made a similar
mistake that probably leads to an incorrect conclusion.  My
assumption was that those who registered names and constructed
links would end up with the domain names in those links that
were the equivalent of
ToUnicode(ToASCII(whatever-was-submitted-for-registration)),
i.e., base characters only.  That would certainly be the result
of a number of the processes for obtaining those links that
crossed my mind.  But you are quite right: nothing requires it,
and references could be constructed using non-base characters.
For languages with clear case distinctions, one would actually
predict that it would occur: just as the owner of
coolhotjunk.xyz might prefer to have it in the DNS and
references as CoolHotJunk.xyz, I'd expect funny mixed case
strings in IDNs which, under IDNA, would of course go into the
DNS as the punycode encoding of the lowercase-only form.
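
To make the ToUnicode(ToASCII(...)) point concrete, here is a rough
sketch using the IDNA 2003 codec that ships with Python.  The helper
name registered_form and the sample names are made up for
illustration; this is not anyone's registration tooling.  It shows a
base form, a mixed-case form, and a form with a fullwidth letter all
ending up as the same string in the DNS, while an all-ASCII
mixed-case name passes through the codec unchanged.

```python
def registered_form(name):
    """Approximate ToUnicode(ToASCII(name)) with Python's built-in
    "idna" codec (IDNA 2003: nameprep, then punycode per label)."""
    return name.encode("idna").decode("idna")


# Hypothetical references as they might appear in a link.
samples = [
    "bücher.example",        # already the base form
    "BÜCHER.example",        # uppercase, folded by nameprep
    "\uff42ücher.example",   # fullwidth "b", mapped by NFKC
    "CoolHotJunk.xyz",       # all ASCII, left untouched by ToASCII
]

for written in samples:
    ace = written.encode("idna").decode("ascii")
    print(f"{written!r} -> {ace} -> {registered_form(written)!r}")
```

The first three references are written differently but are
indistinguishable once they reach the DNS, which is exactly why
nothing forces a link to carry the base form.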

> Maybe it is unlikely that a lot of such references would come
> to exist, and it wouldn't be such a burden to the user of a
> new app to see the occasional error e.g. when they click on
> such a link.
> 
> But how do you determine how many HTML documents contain bad
> characters in their links, and how do you decide that that
> number is low enough to make such a change to the spec?
> 
> So I'm wondering if you would really be able to state that
> such a change "would largely impact what can be registered".
> How do you know that such a change does not also impact the
> users of new clients accessing existing documents?
> 
> Oh wait, I know! Just get Google to do a survey in their cache?

:-(

Mumble.  What a mess.
    john
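
For what the survey Erik asks about might look like on a single
document, a minimal sketch follows.  Everything in it is an
assumption for illustration: the names LinkHosts, is_base_form and
count_non_base_links are invented, only <a href> links are examined,
host extraction is simplified (no IPv6 literals), and "bad" is taken
to mean "the host differs from its ToUnicode(ToASCII(...)) form"
under Python's IDNA 2003 codec.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkHosts(HTMLParser):
    """Collect the host part of every <a href=...> in a document."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                netloc = urlsplit(value).netloc
                # Simplification: drop any userinfo and port by hand.
                host = netloc.rpartition("@")[2].partition(":")[0]
                if host:
                    self.hosts.append(host)


def is_base_form(host):
    """True if the host survives a ToASCII/ToUnicode round trip unchanged."""
    try:
        return host.encode("idna").decode("idna") == host
    except UnicodeError:
        # Names the codec rejects are counted as not being in base form.
        return False


def count_non_base_links(html):
    """Count links whose host would be silently remapped before lookup."""
    parser = LinkHosts()
    parser.feed(html)
    parser.close()
    return sum(1 for host in parser.hosts if not is_base_form(host))
```

Run over a crawl, a count like this would give a first estimate of
how many existing references a stricter client would start rejecting.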