
Re: An argument against multiple character sets



OK, Patrik, now I see your issues.

In the cases you've cited below (no pun intended <smile>), I'll propose the
following:


At 12:18 AM 1/27/00 +0100, Patrik Fältström wrote:

<snip>

>the encoding itself, like UNICODE in UTF-8, include ambiguities
>like the fact that
>
>'Ä' is one position in the Unicode tables.
>'ä' is a different position in the tables.

In this case, why not continue the current practice of downcasing to
lowercase characters only? Since 'Ä' is one position in the UNICODE tables
and 'ä' is a different position (the uppercase and lowercase versions of
"a-double-dot"), it would make sense to case-fold them.

>'A' followed by "Combination 'M'" is a third.
>'a' followed by "Combination 'M'" is a fourth.

OK, so to be consistent, these composite characters should be down-cased as
well, following the UNICODE canonical transformation...

>Should they be treated as equal or not?

Yes, of course. Then you can perform comparisons and lookups only on fully
canonicalized (decomposed and downcased) versions.
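
As a rough sketch of what I mean (not a definitive implementation), here is
the idea in Python. The helper name canonical_key is made up for
illustration, and I am assuming the combining mark in question is COMBINING
DIAERESIS (U+0308) and that canonical decomposition (NFD) followed by simple
lowercasing is an acceptable stand-in for full case folding:

```python
import unicodedata

def canonical_key(text):
    # Hypothetical helper: canonically decompose (NFD), then downcase.
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.lower()

# The four forms from the quoted message (U+0308 is an assumption).
forms = [
    "\u00c4",   # precomposed uppercase a-double-dot
    "\u00e4",   # precomposed lowercase a-double-dot
    "A\u0308",  # 'A' followed by the combining mark
    "a\u0308",  # 'a' followed by the combining mark
]

# Collect the distinct canonical keys; equal forms collapse to one key.
keys = {canonical_key(form) for form in forms}
```

If the approach holds, all four forms end up with the same key, so a
comparison or lookup on the keys treats them as equal.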

<snip>

>Today, 'A' and 'a' are treated equal in DNS, so one can not register the
>domainname "Example.com" if "example.com" is already registered, even
>though different bytes are encoded in the labels. Should "äxample.com" be
>different from "Äxample.com" and in turn be different from "aMxample.com"
>and "AMxample.com"?
>

No - we should maintain the consistency of the current matching system in
the DNS - with downcasing all around. So these should all be treated as the
same.
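
To illustrate how a registry might enforce this, here is a minimal sketch in
Python. Everything named here (registered, register, canonical_name) is
hypothetical, the combining mark is assumed to be U+0308, and a real
registry would of course work per label and on the wire encoding rather
than on whole strings:

```python
import unicodedata

def canonical_name(name):
    # Hypothetical helper: decompose (NFD) and downcase the whole name.
    return unicodedata.normalize("NFD", name).lower()

registered = set()  # canonical forms of names already taken

def register(name):
    # Refuse the name if an equivalent one is already registered.
    key = canonical_name(name)
    if key in registered:
        return False
    registered.add(key)
    return True

results = [
    register("\u00e4xample.com"),   # first registration is accepted
    register("\u00c4xample.com"),   # uppercase precomposed form
    register("a\u0308xample.com"),  # lowercase base plus combining mark
    register("A\u0308xample.com"),  # uppercase base plus combining mark
]
```

Only the first call should succeed; the other three map to the same
canonical form and are rejected, just as "Example.com" is rejected today
when "example.com" exists.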

Bill Semich
.NU Domain