
[idn] Re: Unicode and Security



In a message dated 2002-02-09 13:00:59 Pacific Standard Time, 
larsga@garshol.priv.no writes:

> It seems to me that this problem really needs some other fix than the
> merging of all similar-looking characters in all character sets. I
> just can't see that working. 

Even the "merging" part wouldn't work.  Let's say that I, like Ken Sakamura 
or Bernard Miller before me, have decided that I know much more about 
character encoding than the Unicode Consortium or WG2, and I am going to 
develop my own character encoding that will solve the problem of confusables 
once and for all.

OK, we start with the easy ones.  Latin A, Greek Alpha, and Cyrillic A all 
get unified.  Latin E, Greek Epsilon, Cyrillic E, unified.  Hey, this is 
easier than I thought.  Latin B, Greek Beta, Cyrillic Ve.  Ha!  I'm smart 
enough to know that Ve gets unified with B and Beta, even though it 
represents a different sound.  Just like Han unification!  Boy, those Unicode 
dolts really missed something there.

Let's keep going.  Latin Y, Greek Upsilon, Cyrillic U.  Wait a minute, that 
Cyrillic U doesn't look *quite* the same.  Oh well, it's close enough, right?  
Let's try some lower-case letters.  Latin a, Greek alpha, Cyrillic a.  That 
Greek alpha looks kinda cursive, doesn't it?  Should we unify it or not?  
Hmmm...

How about Latin n and Greek eta?  Is that descender on the eta significant or 
not?  Hey, you could stick an eta in the middle of a Web address and really 
fool somebody.  Better unify.  How about Latin v and Greek nu?  Different 
glyphs or not?  In 9-point MS Sans Serif, they're pretty close, aren't they?  
(And don't forget Armenian vo!)  Same goes for Latin y and Greek gamma.

Well, you get the point.  The world of alphabetic confusables is just not 
that simple or that 1-to-1.  There are more edge cases, in fact, than obvious 
cases such as the a/alpha or o/omicron that we keep hearing about.  And if I 
were trying to design this hypothetical "Uniglyph" encoding to get rid of 
those pesky confusables, and still provide support for alphabetic scripts 
besides Latin, I would eventually have to face the fact that it *can't be 
done*.  Oh, sure, it can be done for a/alpha and o/omicron, so I can make a 
sales presentation or a picket sign.  But a complete technical solution, uh, 
no.
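
To make the point concrete, here is a minimal Python sketch of the 
"Uniglyph" exercise.  The pair list is taken from the examples above, but 
the "obvious"/"edge" labels and the names PAIRS and unify are assumptions 
made up for illustration, not anybody's real confusables data.  It merges 
look-alike pairs with a small union-find and shows that the resulting 
repertoire depends entirely on which edge cases you decide to accept.

```python
import unicodedata

# Hypothetical look-alike pairs drawn from the examples in this message.
# The "obvious"/"edge" labels are an assumption, not an official list.
PAIRS = [
    ("A", "\u0391", "obvious"),  # Latin A / Greek Alpha
    ("A", "\u0410", "obvious"),  # Latin A / Cyrillic A
    ("B", "\u0392", "obvious"),  # Latin B / Greek Beta
    ("B", "\u0412", "obvious"),  # Latin B / Cyrillic Ve
    ("Y", "\u0423", "edge"),     # Latin Y / Cyrillic U
    ("a", "\u03b1", "edge"),     # Latin a / Greek alpha
    ("n", "\u03b7", "edge"),     # Latin n / Greek eta
    ("v", "\u03bd", "edge"),     # Latin v / Greek nu
    ("y", "\u03b3", "edge"),     # Latin y / Greek gamma
]


def unify(pairs):
    """Merge every listed pair and return the unified groups, sorted."""
    parent = {}

    def find(c):
        # Union-find lookup with path halving.
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b, _label in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for c in list(parent):
        groups.setdefault(find(c), set()).add(c)
    return sorted(sorted(g) for g in groups.values())


def describe(groups):
    """Print each unified group using Unicode character names."""
    for g in groups:
        print(" = ".join(unicodedata.name(c) for c in g))


# Two candidate encodings: one unifies only the obvious pairs,
# the other also accepts every edge case.
strict = unify([p for p in PAIRS if p[2] == "obvious"])
loose = unify(PAIRS)

describe(strict)
print("---")
describe(loose)
```

Every pair labeled "edge" is a judgment call, so two designers working 
from the same glyphs can end up with two different encodings.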

-Doug Ewell
 Fullerton, California
 (address will soon change to dewell at adelphia dot net)