
An argument against multiple character sets



There has been some discussion on this list about whether or not we should 
allow domain names to be created in different character sets. I believe 
that there is a simple argument that shows that we can't.

Let's say I want to register a domain name that is two letters: LATIN SMALL 
LETTER F followed by LATIN SMALL LETTER U WITH OGONEK. If I use ISO 8859-4, 
that would be encoded as 0x66F9. So far so good. You see a billboard with my 
domain name on it, and you enter it into a browser. That browser uses a 
different character set, let's say 16-bit Unicode. The browser sends to the 
resolver 0x00660173, which does not match the 0x66F9 I registered, so the 
lookup fails.
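A minimal Python sketch of the mismatch, assuming Python's built-in codecs "iso8859_4" and "utf_16_be" as stand-ins for ISO 8859-4 and 16-bit Unicode. Note that LATIN SMALL LETTER F is code point 0x66 in both (0x46 is the capital F).

```python
# Sketch: the same two-letter name encoded in two character sets.
# The codec names are Python's; they stand in for "ISO 8859-4" and
# "Unicode" (16 bits per character, big-endian) in the example.
name = "f\u0173"  # LATIN SMALL LETTER F + LATIN SMALL LETTER U WITH OGONEK

registered = name.encode("iso8859_4")  # what the registrant stored
sent = name.encode("utf_16_be")        # what a Unicode browser sends

print(registered.hex())    # 66f9
print(sent.hex())          # 00660173
print(registered == sent)  # False: a byte-for-byte lookup misses
```

Both byte strings print as the same two letters; only the character set differs.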

There are two problems here:
- The browser *can't* know every possible character set
- Even if it did, it wouldn't know which one to use

Adding a charset tag to the internationalized string in the domain name 
doesn't help. There is no way for someone seeing a printed representation 
of the internationalized string to know which character set was used; in 
this case it could be 8859-4 or Unicode or possibly other character sets 
that contain both characters.
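To illustrate, the Python sketch below takes an illustrative and deliberately incomplete list of candidate charsets (Python codec names, chosen for the example) and asks which of them can represent the printed name at all. Several can, mostly with different bytes, and nothing in the printed form says which one the registrant used.

```python
# Sketch: which of a few charsets could have produced a printed "fų"?
# The candidate list is illustrative only, not exhaustive.
name = "f\u0173"
candidates = ["ascii", "latin_1", "iso8859_4", "iso8859_13",
              "cp1257", "utf_8", "utf_16_be"]

def encodings_of(text, charsets):
    """Return {charset: bytes} for every charset that can represent text."""
    result = {}
    for cs in charsets:
        try:
            result[cs] = text.encode(cs)
        except UnicodeEncodeError:
            pass  # this charset has no code point for some character
    return result

found = encodings_of(name, candidates)
for cs, data in found.items():
    print(f"{cs:11} {data.hex()}")
```

ASCII and Latin-1 drop out because neither contains U WITH OGONEK; the rest all qualify.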

Even requiring all resolvers to do the conversion doesn't help unless we 
list all the possible character sets and never change the list. This 
introduces many problems:
- New character sets can't be added later without simultaneously updating 
all the resolvers on the Internet to use the added character sets. Such 
simultaneous updates are impossible.
- The main reason we are considering more than one character set now is 
current politics and desires for favored character sets. We can safely 
assume that politics and desires will continue to change and evolve.
- We are forcing resolvers to do much more processing than they do now.
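A sketch of the "resolvers do the conversion" approach, under invented assumptions: the charset tags are Python codec names, and the zone data, the resolve() function and the UnknownCharset error are all hypothetical. The address comes from the 192.0.2.0/24 documentation range. It shows two of the costs listed above: every query needs a decode step, and any charset not on the list compiled into the resolver cannot be looked up.

```python
# Hypothetical resolver that accepts (charset tag, bytes) and converts
# to one canonical form before lookup. Its charset list is fixed when
# the software ships; tags, zone data and names here are invented.
KNOWN_CHARSETS = frozenset({"iso8859_4", "utf_16_be"})

ZONE = {"f\u0173": "192.0.2.1"}  # canonical (decoded) name -> address

class UnknownCharset(Exception):
    pass

def resolve(charset, raw):
    if charset not in KNOWN_CHARSETS:
        # Any charset missing from this resolver's fixed list ends up
        # here until this particular resolver is upgraded.
        raise UnknownCharset(charset)
    canonical = raw.decode(charset)  # extra work on every single query
    return ZONE.get(canonical)

print(resolve("iso8859_4", b"\x66\xf9"))          # 192.0.2.1
print(resolve("utf_16_be", b"\x00\x66\x01\x73"))  # 192.0.2.1
try:
    resolve("iso8859_13", b"\x66\xf8")  # same name, charset not on the list
except UnknownCharset as e:
    print("cannot resolve: unknown charset", e)
```

Adding "iso8859_13" to KNOWN_CHARSETS fixes this one resolver, but not the copies already deployed elsewhere.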

In short, I don't see how a solution that allows more than one character 
set, or even more than one encoding, will work. If others have 
counter-examples, I'm open to hearing them.

--Paul Hoffman, Director
--Internet Mail Consortium