
Re: An argument against multiple character sets

At 10:08 PM 1/23/00 +0100, Harald Tveit Alvestrand wrote:
>Note: This is the UTF-16 (or UCS-2) representation of Unicode.

UTF-16BE, to be exact. Kinda near and dear to my heart right now.

>Your argument indicates that adding character sets to a list after initial 
>implementation is impossible.

That's one argument, yes, but not the only one.

>  It doesn't mean that the initial set needs to be just one, although a 
> server has to be able to compare strings between all the initial 
> character sets - which is clearly a bit simpler if there is just one of them.

I don't think that is even enough. Without labelling the query from the 
user to the resolver with the character set and encoding, how would the 
resolver know whether a request with the octets 0x46 0xF9 was LATIN CAPITAL 
LETTER F followed by LATIN SMALL LETTER U WITH OGONEK (8859-4) or LATIN 
CAPITAL LETTER F followed by HEBREW LETTER SHIN (8859-8)?
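A minimal Python sketch of that ambiguity, using only the standard library's iso8859_4 and iso8859_8 codecs. The same two octets come out as different characters depending on which charset the resolver assumes. The first octet is the same letter in both; only the second diverges.

```python
import unicodedata

# The two octets from the example above, with no charset label attached.
RAW = bytes([0x46, 0xF9])

def describe(raw: bytes, charset: str) -> list[str]:
    """Decode raw octets with the given charset and return the Unicode character names."""
    return [unicodedata.name(ch) for ch in raw.decode(charset)]

if __name__ == "__main__":
    for charset in ("iso8859_4", "iso8859_8"):
        print(charset, describe(RAW, charset))
```

Nothing in the octets themselves says which reading was intended; only an out-of-band label could.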

>However, I think the *requirement* you are trying to state is that when a 
>domain name is represented as text on paper, the user who thinks he has 
>access to suitable input devices for that text should be able to query on 
>that string and have returned information about the domain that the text 
>on paper was intended to represent.

In the absence of a single character set and encoding, yes. It also puts 
much more load on the resolver, which now needs to be able to translate 
from every encoding that might come from a user to every encoding that 
might be used in the domain name.
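A rough sketch of what a labelled query might look like from the resolver's side. It assumes a hypothetical `Query` record that carries a charset label alongside the raw octets, and a hypothetical `ZONE` table whose names are stored as Unicode text. This sketch pivots everything through Unicode instead of translating pairwise between encodings, which is a simplification. Even so, the resolver still needs a decoder for every label a user might send.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Query:
    """Hypothetical query: raw octets plus the charset label the user's system attached."""
    raw: bytes
    charset: str

# Hypothetical zone data, stored once as Unicode text (the pivot form).
ZONE = {"F\u0173.example": "192.0.2.1"}

def resolve(query: Query) -> Optional[str]:
    """Decode the labelled query to Unicode and look it up; None if it cannot be matched."""
    try:
        name = query.raw.decode(query.charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown label, or octets invalid in that charset: the resolver cannot interpret it.
        return None
    return ZONE.get(name)
```

With the label, the lookup succeeds only under the charset the user actually meant. The same octets labelled as 8859-8 decode to a different name and miss. A label the resolver has no decoder for fails outright, which is the per-encoding burden described above.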

--Paul Hoffman, Director
--Internet Mail Consortium