
Re: [idn] upstream and downstream

To: IETF idn working group <idn@ops.ietf.org>
Subject: Re: [idn] upstream and downstream
From: Erik van der Poel <erik@vanderpoel.org>
Date: Sat, 19 Feb 2005 16:07:54 -0800
In-reply-to: <20050219214029.GA5457~@nicemice.net>
References: <421746D7.4070102@vanderpoel.org> <42176970.3050704@vanderpoel.org> <20050219214029.GA5457~@nicemice.net>
User-agent: Mozilla Thunderbird 1.0 (X11/20041206)

Adam M. Costello wrote:

Display time is independent of lookup time.

Good point. I did consider that, but I deliberately left it out of my "big picture". Perhaps I should have included it...

Also, names that get transported
in ACE form and converted back to Unicode for display would probably
take on a ransom-note appearance when single-script strings get
nameprepped into mixed-script strings.

Over the years, fonts seem to have grown to cover more and more of Unicode. A long time ago, in X Windows, it would definitely have looked like a ransom note. But these days, in Windows and even in X Windows with larger fonts, it looks a lot better. So good, in fact, that it's hard to tell the homographs apart. (I was the chief architect of the multilingual font engines for the Windows and Unix versions of Mozilla. This is one area I'm really familiar with.)

Re: nameprep spec change:

I don't really know whether this kind of change is realistic.
I think not.


Would you care to elaborate?

You might as well ask why they should see domain names at all.
Maybe there's a way to abstract domain names out of the users' view
altogether, but until then, if users are going to see domain names, they
want to be able to tell whether the name they see is the name they think
they see.

But then don't you think it's at least a little unfortunate that (a) Unicode chose to include duplicates and/or (b) nameprep chose to use Unicode? Previously, users only had to squint at ASCII "1" vs. "l". Now, with IDN, the problem has been taken to a whole different level. Users have to endure letters that look identical but have been colored to indicate that their character codes are different. Most users don't even understand what character codes are. So I still feel that it's unfortunate that we now end up having to confuse users.

Of course, it's not clear that there will be very many of these phishing attempts, in which case the user will almost never see these colors. We may be blowing this issue out of proportion. Real phishers may find other, more devious ways to fool the user.

Can't we solve the problem upstream?


We don't know how many misleading names have already been registered
under .com and .net, so I don't see how we can completely solve the
problem upstream.

Actually, I've been thinking that one could write a program that determines which pairs of Unicode characters have been assigned the same glyph index in TrueType fonts on Windows. A registry could then check all the IDNs in its set to see which ones consist only of ASCII characters and ASCII homographs. If there are only a few misleading names, the registry might be able to delete them by contacting their owners, etc. If there are many, well, I don't know.
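
A minimal sketch of the glyph-index idea above, under stated assumptions: the font's character-to-glyph mapping has already been read into a plain dict (code point to glyph index), and the values in SAMPLE_CMAP are made up for illustration rather than taken from any real font. The function names are hypothetical. A registry-side check would feed in mappings extracted from the fonts it cares about.

```python
from collections import defaultdict

# Hypothetical excerpt of a font's cmap: code point -> glyph index.
# The glyph indices are invented for illustration only.
SAMPLE_CMAP = {
    0x0061: 68,   # LATIN SMALL LETTER A
    0x0430: 68,   # CYRILLIC SMALL LETTER A (assumed to share the glyph)
    0x0065: 72,   # LATIN SMALL LETTER E
    0x0435: 72,   # CYRILLIC SMALL LETTER IE (assumed to share the glyph)
    0x006F: 82,   # LATIN SMALL LETTER O
    0x043E: 82,   # CYRILLIC SMALL LETTER O (assumed to share the glyph)
    0x0062: 69,   # LATIN SMALL LETTER B
    0x0431: 300,  # CYRILLIC SMALL LETTER BE (its own glyph)
}


def ascii_homographs(cmap):
    """Map each non-ASCII character to an ASCII character with the same glyph index."""
    by_glyph = defaultdict(set)
    for code_point, glyph_index in cmap.items():
        by_glyph[glyph_index].add(code_point)

    homographs = {}
    for code_points in by_glyph.values():
        ascii_cps = sorted(cp for cp in code_points if cp < 0x80)
        if not ascii_cps:
            continue  # no ASCII character uses this glyph
        for cp in code_points:
            if cp >= 0x80:
                homographs[chr(cp)] = chr(ascii_cps[0])
    return homographs


def is_ascii_lookalike(label, homographs):
    """True if the label has at least one non-ASCII character and every
    non-ASCII character in it is an ASCII homograph."""
    has_non_ascii = False
    for ch in label:
        if ord(ch) < 0x80:
            continue
        if ch not in homographs:
            return False
        has_non_ascii = True
    return has_non_ascii


def ascii_skeleton(label, homographs):
    """Replace each ASCII homograph with the ASCII character it imitates."""
    return "".join(homographs.get(ch, ch) for ch in label)
```

With this table, a label such as "p\u0430yp\u0430l" (Cyrillic "a" in place of Latin "a") is flagged by is_ascii_lookalike, and ascii_skeleton shows which ASCII name it imitates. A label made up only of ASCII, or one containing a non-ASCII character with its own glyph, is not flagged.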

But one thing these registries could do now or quite soon is to install some heuristic filters to try to stem the influx of bogosities. One can only hope...
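
One possible heuristic filter of this kind, sketched under assumptions: it flags a label whose letters come from more than one script. Python's standard library does not expose the Unicode Script property, so the sketch takes the first word of each character's Unicode name ("LATIN", "CYRILLIC", "GREEK", ...) as a rough stand-in. The function names are hypothetical, and the rule is deliberately crude: it would also flag legitimate mixed-script names (for example Japanese labels mixing kana and kanji), so a real filter would need a table of allowed combinations.

```python
import unicodedata


def scripts_in(label):
    """Return the set of rough script names used by the letters of a label.

    The "script" is approximated by the first word of the character's
    Unicode name; digits and hyphens are ignored as script-neutral.
    """
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def looks_suspicious(label):
    """Flag labels that mix characters from more than one script."""
    return len(scripts_in(label)) > 1
```

Here looks_suspicious("p\u0430ypal") is true because the label mixes Latin and Cyrillic letters, while an all-Latin or all-Cyrillic label passes. Such a check says nothing about whole-script lookalikes, so it complements rather than replaces a homograph table.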

Erik

