Re: [idn] upstream and downstream
Erik van der Poel <erik at vanderpoel dot org> wrote:
> Let's take an analogy. The P.O. Box system. Right now, it uses numbers
> like P.O. Box 3256. What would happen if the Postal Service decided to
> use Unicode, where some of the characters are only slightly different,
> and the postman inadvertently put some important mail into the wrong
> box, one that was registered by an evil person, using a name that was
> only slightly different from the PO box of some company?
>
> Wouldn't that company try to get the Postal Service to use a smaller
> set of symbols (say, digits) rather than this confusing Unicode? Maybe
> that company would even try to sue the Postal Service.

Sadly, the Postal Service is fully capable of putting mail in the wrong
box without the help of Unicode.

Sorry. Anyway...

The Postal Service probably would have instituted some type of subset of
valid characters for use in P.O. box identifiers. (Or alternatively,
they would have a blacklist of invalid characters.) At some point they
would discover, or have pointed out to them, confusables that they
hadn't thought of. They would probably then amend their list to exclude
the newly discovered confusable.

THEN, because they are the Postal Service and have complete authority
over the entire system, they would be able to discontinue the use of the
evil P.O. box and require the user to change its name, or change it for
him, or refund his deposit.

Meanwhile, the reason they switched to Unicode in the first place was
precisely that they wanted to offer a wider variety of characters to
their users, for whatever reason. Isn't that why domain names were
internationalized? Isn't that the reason *anyone* switches to Unicode?
They must have had some benefits in mind by adopting a larger
repertoire. By switching away from "this confusing Unicode" they are
giving up those perceived benefits.

If you have an application that lends itself to a limited repertoire,
like automobile license plates or P.O. box numbers or house numbers, use
that limited repertoire. If you need a wider repertoire, you can use
Unicode and you can *still* implement a subset. The problem, as always,
is in determining what the subset should be.

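
To make the "subset" idea concrete, here is a rough Python sketch of a
permitted-character list that can be amended when a new confusable turns
up. Everything in it is made up for illustration: the names ALLOWED,
disallowed_chars, is_valid and exclude are hypothetical, and the
repertoire (ASCII digits, ASCII capitals, and one Cyrillic letter) is an
assumed policy, not anything the Postal Service or any registry uses.

```python
# Hypothetical policy: ASCII digits and capitals, plus one Cyrillic
# capital letter (U+041E) that was admitted before anyone noticed it
# looks like Latin "O".
ALLOWED = frozenset("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "\u041E")


def disallowed_chars(identifier, allowed):
    """Return the characters of identifier that are outside the subset."""
    return [ch for ch in identifier if ch not in allowed]


def is_valid(identifier, allowed):
    """An identifier is valid if it is non-empty and uses only allowed characters."""
    return bool(identifier) and not disallowed_chars(identifier, allowed)


def exclude(allowed, confusables):
    """Return an amended subset with newly discovered confusables removed."""
    return frozenset(allowed) - frozenset(confusables)


spoof = "B\u041EX3256"  # Cyrillic capital O standing in for Latin "O"

# Under the original policy the spoofed name is accepted.
accepted_before = is_valid(spoof, ALLOWED)

# After the list is amended, the same name is rejected,
# while the all-Latin name is still accepted.
amended = exclude(ALLOWED, "\u041E")
accepted_after = is_valid(spoof, amended)
plain_ok = is_valid("BOX3256", amended)
```

Note that the code is the easy part. It says nothing about which
characters belong in ALLOWED to begin with, which is exactly the hard
problem.
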
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/