[idn] punctuation
Adam M. Costello wrote:
> I imagine you'd want all the characters that could immediately follow
> the host name in a URI, so add "?" and "#" to that list.
>
> But how well do average users know URI syntax anyway? What would they
> think of:
>
> http://foo.com&bar.baz.xx
> http://foo.com~bar.baz.xx
> http://foo.com|bar.baz.xx
>
> Maybe we either need to ban all punctuation (as in my proposal about
> internationalized host names), or always make the boundaries of the
> domain name apparent to the user (using color or highlighting or
> underlining or something).
I started to write down all the delimiters that could appear in DNS,
URIs and email, and then I realized that this problem is not just about
the homographs of the *legal* delimiters used in these contexts. No, it
is about whatever *looks like* a legal delimiter to the average user,
because the phishers don't have to stick to the (homographs of the)
legal delimiters. Then I went back in the archives, and of course, Adam
has already pointed this out.
The implications of this are actually quite profound. Since there are so
many characters in Unicode, and since many of those are unfamiliar to
the average user, a lot of them could pass for punctuation.
As Adam also points out in another email, it's too bad that domain names
are usually displayed in "little-endian" order, with the most specific
label first. If they were displayed in the opposite (big-endian) order,
the 3rd example above would become:
http://xx.baz.com|bar.foo
Notice how the "com" and "foo" are now separated. The "real" (unspoofed)
URI would look like this:
http://com.foo
If users were actually used to seeing it this way, they might notice the
spoof above more easily. But they aren't used to seeing it this way, and
it would be pretty difficult to change this convention now. It's too late.
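The reordering itself is mechanical. Here is a minimal sketch, assuming the host part has already been pulled out of the URI and that labels are separated only by ASCII dots; the function name to_big_endian is made up for illustration:

```python
def to_big_endian(host: str) -> str:
    """Reverse the dot-separated labels of a host name.

    Assumes `host` is just the host part (no scheme, no path) and that
    the only label separator is the ASCII full stop.
    """
    return ".".join(reversed(host.split(".")))


# The spoofed name: "com|bar" is a single label, so "com" and "foo" drift apart.
print(to_big_endian("foo.com|bar.baz.xx"))  # xx.baz.com|bar.foo

# The real name.
print(to_big_endian("foo.com"))  # com.foo
```

Note that the '|' stays inside its label, since only the dots are treated as boundaries.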
Back to punctuation: Banning all punctuation would not be enough. We
would have to ban anything that might look like punctuation to the user.
That would mean banning a huge swath of Unicode, which is probably not
in the best interests of various communities around the world. Besides,
different people will have different ideas about what looks like
punctuation. So it might be hard to decide which huge swath of Unicode
to ban.
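To see why a category-based ban falls short, here is a rough sketch that flags characters whose Unicode general category is punctuation (P*) or symbol (S*). The function name punctuation_like and the choice of categories are assumptions for illustration, not a proposal:

```python
import unicodedata

# Hyphen and dot are part of ordinary host name syntax, so skip them.
ALLOWED = {"-", "."}


def punctuation_like(host: str) -> list[str]:
    """Return characters classified by Unicode as punctuation or symbol."""
    return [
        ch
        for ch in host
        if ch not in ALLOWED and unicodedata.category(ch)[0] in ("P", "S")
    ]


print(punctuation_like("foo.com|bar.baz.xx"))  # ['|']
print(punctuation_like("foo.com&bar.baz.xx"))  # ['&']

# U+01C0 LATIN LETTER DENTAL CLICK looks like '|' but is classified as a
# letter, so a filter based on categories alone lets it through.
print(punctuation_like("foo.com\u01c0bar.baz.xx"))  # []
```

The last case is the real trouble: what looks like punctuation to a user is not the same set as what Unicode calls punctuation.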
So maybe it's better to consider Adam's alternative idea: make the
boundaries of the domain name apparent (using color or whatever). Over
time, the users will get used to seeing domain names this way, and then
they will be able to spot domain name spoofs more easily too.
But even if we were to color the whole domain name:
foo.com|bar.baz.xx
The user might still think that this site is somehow related to foo.com
and therefore safe (as was also pointed out). So you'd have to display
the "unusual" characters like '|' differently. Or something. Sigh. Seems
hopeless.
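For what it's worth, displaying the "unusual" characters differently could look something like the sketch below. It is only an illustration: the name mark_unusual is made up, the square brackets stand in for whatever color or highlighting a real user interface would use, and "unusual" is assumed to mean anything that is not a letter, digit, combining mark, hyphen or dot:

```python
import unicodedata


def mark_unusual(host: str) -> str:
    """Wrap every character that is not a letter, digit, mark, '-' or '.'
    in square brackets so it stands out when displayed."""
    out = []
    for ch in host:
        if ch in "-." or unicodedata.category(ch)[0] in ("L", "M", "N"):
            out.append(ch)
        else:
            out.append("[" + ch + "]")
    return "".join(out)


print(mark_unusual("foo.com|bar.baz.xx"))  # foo.com[|]bar.baz.xx
print(mark_unusual("foo.com"))             # foo.com
```

This does nothing about letters that merely resemble punctuation, so it is a partial measure at best.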
Are the phishers going to have a field day with IDN, or what?
But is this problem really limited to IDN? What about the following
legal ASCII DNS name:
foo.com--secure-user-services-and-products.tech-mecca.biz
Does this mean that we should try to switch left-to-right readers (most
of the world) over to big-endian domain names? Please tell me I'm
overreacting!
Erik