
Re: An argument against multiple character sets



--On 2000-01-23 16.44 -0500, "J. William Semich" <bill@mail.nic.nu> wrote:

> UTF-8 looks pretty "standard"
> already, from the client/user point of view, at least.

I want us to keep things apart for a while. Choosing UNICODE/10646 as the
character set is one thing, and on this level we have to work on the issues
of matching rules, canonicalization, case sensitivity etc.

On a different level we have to talk about how to encode the character set,
and especially for UNICODE I have seen several encoding schemes, sometimes
layered on top of one another.
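
To make the separation between character set and encoding concrete, here is
a minimal sketch in Python. It takes one abstract character and shows the
bytes that three common encoding schemes produce for it. The names CHAR,
encodings and ENCODED are illustrative only, and the choice of schemes is an
assumption, not a proposal for what should go into DNS.

```python
# One abstract character, several byte-level encodings of it.
CHAR = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS


def encodings(text):
    """Return the bytes produced by a few Unicode encoding schemes."""
    return {
        "utf-8": text.encode("utf-8"),
        "utf-16-be": text.encode("utf-16-be"),
        "utf-32-be": text.encode("utf-32-be"),
    }


ENCODED = encodings(CHAR)

if __name__ == "__main__":
    for scheme, raw in ENCODED.items():
        # Same character in every row; only the bytes on the wire differ.
        print(f"{scheme:10} {raw.hex()}")
```

The character is the same in every case; only the byte sequence differs,
which is why the choice of encoding can be discussed separately from the
choice of character set.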

Then, we have to stuff this information into DNS packets.


I have not really seen this discussion agree on even the first layer here,
i.e. the requirements on multiple character sets and the issue of
canonicalization, which is an even more troublesome question than the
handling of case -- but the problems are similar.


What makes me nervous is the earlier discussion of whether it is a
requirement to define things like case sensitivity, canonicalization etc.;
my answer is definitely yes. This is because of the impact on registration:
domain names have to be "unique" in a zone at the time of registration.
Because of that, the rules for equality have to be known at the time of
registration, and IF the rules change, it might have
"interesting" impact on already registered domain names (as we have seen in
the previous examples of whether "ibM.com" equals "IBM.com", extrapolated
to whether "ä" equals "a" followed by a combining diaeresis).
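
The registration problem can be sketched in a few lines of Python. The three
key functions below are hypothetical equality rules chosen for illustration
(exact match, ASCII-only case folding, and NFC normalization followed by
lowercasing); none of them is a rule anyone has agreed on. The example names
under ".example" are made up. The sketch only shows that names which are
distinct under one rule collide under another.

```python
import unicodedata
from collections import defaultdict


def key_exact(name):
    """Names are equal only if their code points match exactly."""
    return name


def key_ascii_case(name):
    """Fold only A-Z to a-z, as traditional DNS matching does."""
    return "".join(c.lower() if "A" <= c <= "Z" else c for c in name)


def key_nfc_case(name):
    """Assumed rule: normalize to NFC, then lowercase."""
    return unicodedata.normalize("NFC", name).lower()


def collisions(names, key):
    """Group names by key and return the groups with more than one member."""
    groups = defaultdict(list)
    for name in names:
        groups[key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


NAMES = [
    "ibM.com",
    "IBM.com",
    "\u00e4.example",   # precomposed a with diaeresis
    "a\u0308.example",  # "a" followed by COMBINING DIAERESIS
]

if __name__ == "__main__":
    for rule in (key_exact, key_ascii_case, key_nfc_case):
        print(rule.__name__, len(collisions(NAMES, rule)))
```

Under exact matching all four names can be registered side by side. Switch
the zone to either of the other rules afterwards and some of those existing
registrations are no longer unique.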


Is there some consensus even on what the requirement is?

   paf