[idn] punctuation



Adam M. Costello wrote:
> I imagine you'd want all the characters that could immediately follow
> the host name in a URI, so add "?" and "#" to that list.
>
> But how well do average users know URI syntax anyway?  What would they
> think of:
>
> http://foo.com&bar.baz.xx
> http://foo.com~bar.baz.xx
> http://foo.com|bar.baz.xx
>
> Maybe we either need to ban all punctuation (as in my proposal about
> internationalized host names), or always make the boundaries of the
> domain name apparent to the user (using color or highlighting or
> underlining or something).

I started to write down all the delimiters that could appear in DNS, URIs and email, and then I realized that this problem is not just about the homographs of the *legal* delimiters used in these contexts. No, it is about whatever *looks like* a legal delimiter to the average user, because the phishers don't have to stick to the (homographs of the) legal delimiters. Then I went back in the archives, and of course, Adam has already pointed this out.

The implications of this are actually quite profound. Unicode contains so many characters, and so many of them are unfamiliar to the average user, that a large number of them could pass for punctuation.

As Adam also points out in another email, it's too bad that domain names are usually displayed in "little-endian" order. If they were displayed in the opposite (big-endian) order, the 3rd example above would become:

http://xx.baz.com|bar.foo

Notice how the "com" and "foo" are now separated. The "real" (unspoofed) URI would look like this:

http://com.foo

If users were actually used to seeing it this way, they might notice the spoof above more easily. But they aren't used to seeing it this way, and it would be pretty difficult to change this convention now. It's too late.
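For anyone who wants to try this on other names, the reordering above is just a reversal of the dot-separated labels. A minimal sketch (the function name to_big_endian is made up for illustration, and the input is assumed to be a bare host name without the scheme or path):

```python
def to_big_endian(host: str) -> str:
    """Reverse the order of the dot-separated labels in a host name."""
    return ".".join(reversed(host.split(".")))


# The spoofed name: "com|bar" is a single label, so "com" stays glued to "bar".
print(to_big_endian("foo.com|bar.baz.xx"))  # xx.baz.com|bar.foo

# The real name.
print(to_big_endian("foo.com"))  # com.foo
```

Note that the split is on "." only, so whatever the phisher put in place of a delimiter stays inside its label, which is exactly why "com" and "foo" end up far apart in the spoofed case.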

Back to punctuation: Banning all punctuation would not be enough. We would have to ban anything that might look like punctuation to the user. That would mean banning a huge swath of Unicode, which is probably not in the best interests of various communities around the world. Besides, different people will have different ideas about what looks like punctuation. So it might be hard to decide which huge swath of Unicode to ban.

So maybe it's better to consider Adam's alternative idea: make the boundaries of the domain name apparent (using color or whatever). Over time, the users will get used to seeing domain names this way, and then they will be able to spot domain name spoofs more easily too.

But even if we were to color the whole domain name:

foo.com|bar.baz.xx

The user might still think that this site is somehow related to foo.com and therefore safe (as was also pointed out). So you'd have to display the "unusual" characters like '|' differently. Or something. Sigh. Seems hopeless.
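As a rough illustration of "display the unusual characters differently", here is a sketch that brackets any character whose Unicode general category is punctuation, symbol, separator or control, other than "." and "-". The function name mark_unusual and the bracket notation are assumptions made up for this example; a real display would presumably use color. It also only catches characters that Unicode itself classifies this way, so it does nothing about letters that merely look like punctuation to a given user.

```python
import unicodedata


def mark_unusual(host: str) -> str:
    """Wrap punctuation-like characters (except '.' and '-') in brackets."""
    out = []
    for ch in host:
        # First letter of the general category: P = punctuation, S = symbol,
        # Z = separator, C = control/format/other.
        major = unicodedata.category(ch)[0]
        if ch not in ".-" and major in "PSZC":
            out.append(f"[{ch}]")
        else:
            out.append(ch)
    return "".join(out)


print(mark_unusual("foo.com|bar.baz.xx"))  # foo.com[|]bar.baz.xx
```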

Are the phishers going to have a field day with IDN, or what?

But is this problem really limited to IDN? What about the following legal ASCII DNS name:

foo.com--secure-user-services-and-products.tech-mecca.biz
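To spell out why this one is a spoof: the labels are split on "." only, so "com--secure-user-services-and-products" is a single label and the name actually lives under tech-mecca.biz. A small sketch that shows this, with the loud assumption that the registered domain is simply the last two labels (true for .biz here, but not in general; a real check would need a public suffix list). The helper name naive_registered_domain is hypothetical.

```python
def naive_registered_domain(host: str, labels: int = 2) -> str:
    """Return the last `labels` dot-separated labels of a host name.

    Assumption: the registered domain is exactly the last two labels.
    This is wrong for suffixes such as co.uk.
    """
    return ".".join(host.split(".")[-labels:])


host = "foo.com--secure-user-services-and-products.tech-mecca.biz"

# Only four labels; "foo" and "com..." are not a "foo.com" domain at all.
print(host.split("."))

print(naive_registered_domain(host))  # tech-mecca.biz
```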

Does this mean that we should try to switch left-to-right readers (most of the world) over to big-endian domain names? Please tell me I'm overreacting!

Erik