[idn] Re: Unicode and Security



In a message dated 2002-02-10 13:00:19 Pacific Standard Time, 
elharo@metalab.unc.edu writes:

> However, I do continue to maintain that character confusion is a real 
> security risk that will have real impact on users, and that needs to 
> be considered in any system that uses Unicode.

We have already established that similar-looking characters can cause 
confusion in Unicode-based systems.  However, we have also established that 
ISO 8859-1, 8859-5 (Cyrillic), 8859-7 (Greek), and even ASCII can suffer from 
this same problem.  It is unrealistic to suggest that the problem began with 
Unicode.

> In some domains the 
> problem might be severe enough to eliminate Unicode from 
> consideration in favor of less extensive character sets like Latin-1. 
> That would be a shame, but until the Unicode consortium addresses at 
> a root level the real security implications of their work, security 
> conscious developers will look elsewhere. (I notice the Unicode 3.0 
> book does not even have the word "security" in its index.) Many more 
> developers who are at best tangentially conscious of security issues 
> will go ahead and develop insecure systems because they don't realize 
> the security implications of adopting Unicode.

Companies and individuals that choose to throw out the baby with the bath 
water will achieve the kind of results that that approach usually delivers.

Companies and individuals that wish to establish their own definitions of, 
and policies for dealing with, confusable characters are free to do so.  As I 
stated earlier, and as nobody has refuted, there is no consistent way to 
determine which sets of characters are confusable with each other, except in 
the most obvious cases like o/omicron.  So of course neither the Unicode 
Consortium nor WG2 has taken it upon itself to draw up such a list.  This 
must be a local decision.

> Another possibility is a super-normalization that does combine 
> similar looking Unicode characters; e.g. in the domain name system we 
> might decide that microsoft.com with Latin o's or Cyrillic o's or 
> Greek o's is to resolve to the same address. No separate registration 
> would be necessary or possible. This would require detailed analysis 
> of the tens of thousands of Unicode characters allowed in domain 
> names by fluent speakers of various languages; not easy, not cheap, 
> but perhaps necessary. Besides the security improvements, this 
> proposal would also improve the system's usability. Aren't sure 
> whether that URL on the bus used an o or an omicron? Doesn't matter, 
> type either one.

Adding this sort of unification to the nameprep stage might have been 
possible about a year or so ago.  It's probably too late now.
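
To make the idea concrete, here is a minimal Python sketch of what such a 
unification step might look like.  The table CONFUSABLE_FOLD and the function 
fold_label are hypothetical names invented for this illustration; the table 
is a hand-picked two-entry example, not anything defined by nameprep, the 
Unicode Consortium, or WG2.

```python
# Hypothetical, hand-picked table for illustration only.
# It is NOT an official Unicode or nameprep mapping.
CONFUSABLE_FOLD = {
    "\u043E": "o",  # CYRILLIC SMALL LETTER O
    "\u03BF": "o",  # GREEK SMALL LETTER OMICRON
}


def fold_label(label: str) -> str:
    """Lowercase the label, then replace each listed look-alike
    with its Latin stand-in.  Unlisted characters pass through."""
    return "".join(CONFUSABLE_FOLD.get(ch, ch) for ch in label.lower())


latin = "microsoft.com"
cyrillic = "micr\u043Es\u043Eft.com"  # Cyrillic o's
greek = "micr\u03BFs\u03BFft.com"     # Greek omicrons

# The three strings differ as code point sequences ...
print(latin == cyrillic, latin == greek)
# ... but compare equal once folded.
print(fold_label(cyrillic) == latin, fold_label(greek) == latin)
```

The mechanism itself is trivial.  The hard part is deciding what goes in the 
table, which is exactly the detailed analysis described in the quoted text.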

> Actually, people have been talking about the security problems with 
> HTML for years. Search engines have gone to some effort to eliminate 
> spamdexers that use these techniques. The log in HTML's eye does not, 
> however, negate the existence of the log in Unicode's eye.

Again (and again), the problem is not unique to Unicode.  Existing character 
sets also contain confusables.  Blaming Unicode for exacerbating the problem 
by offering so many characters is like blaming your local ice cream shop for 
offering 31 flavors, because that makes it so much more difficult to choose.

-Doug Ewell
 Fullerton, California
 (address will soon change to dewell at adelphia dot net)