
Re: Matching and comparison



At 09:36 00/01/18 -0800, Paul Hoffman / IMC wrote:
> At 05:26 PM 1/18/00 +0900, Martin J. Duerst wrote:
> >At 16:22 00/01/17 +0800, James Seng wrote:
> > > Paul Hoffman / IMC wrote:
> > > > I think you are trying to mandate case-insensitivity here. That's good
> > > > in theory and bad in practice for international characters. There are
> > > > examples of letters whose case conversions are different for different
> > > > written languages. If we want to require case-insensitivity, we
> > > > have to point to a single conversion table for all characters.
> > >
> > > Precisely why we need to discuss this further. I am sure there are people
> > > who have different views on case sensitivity. It can get very religious.
> >
> >There is no need to have a single conversion table if this is
> >handled e.g. at the client. The Microsoft I-D proposed that.
> 
> Maybe I am misunderstanding your intention here, but this seems like a 
> recipe for breaking the uniqueness rule. If my client converts case 
> differently than your client does, and we both try to resolve a domain 
> name, the resolver will get two different queries and thus will give two 
> different answers.

So let's look at an example (for casing, there are not that many :-):

Turkish has a dotted i and a dotless i, each with its own uppercase
form. For Turkish, converting an I into an i is like converting an o
into a u for English.
Most probably, we cannot change the fact that DNS already treats i
and I as the same. But we want iDNS not to do that for Turkish.
It is okay (well, even desired) to treat the two dotted i's
('i' and its uppercase) as the same, and likewise the two dotless
I's (I and its lowercase). Obviously, the server doesn't know what
language the user speaks. The server could know that a
label is Turkish, but maybe we don't want to go there, so
the best thing to do, among many non-perfect alternatives,
may be to say, as the Microsoft I-D did, that the client/
resolver lowercases things. On a Turkish system, I would be
lowercased to a dotless i, and an I with a dot would be
lowercased to i. It is true that this destroys the uniqueness
of mappings between e.g. print and machines, but the alternatives
I have seen are all worse (throwing dotless and dotted i
into one pot for Turkish, and so on).
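
To make the trade-off concrete, here is a rough sketch in Python of
the two client behaviours. The function names lower_default and
lower_turkish are made up for illustration, and the two-entry mapping
is only my assumption of what a Turkish-aware client might do, not
something taken from the Microsoft I-D.

```python
# Hypothetical sketch: two clients lowercasing the same label differently.

DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def lower_default(label: str) -> str:
    """Lowercase as a non-Turkish client would: I becomes i."""
    return label.lower()


def lower_turkish(label: str) -> str:
    """Lowercase as an (assumed) Turkish client would.

    I becomes dotless i, and I with a dot becomes plain i.
    The two special cases are mapped first, then the rest is lowercased.
    """
    table = {ord("I"): DOTLESS_SMALL_I, ord(DOTTED_CAPITAL_I): "i"}
    return label.translate(table).lower()


label = "ISIK"
# The two clients would send different queries for the same typed label.
print(lower_default(label) == lower_turkish(label))  # prints False
```

The last line is exactly the uniqueness problem Paul describes: the
same label, typed on two differently configured systems, ends up as
two different queries.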

Of course, if you know a better alternative, I'd be glad to
know.


Regards,   Martin.


#-#-#  Martin J. Du"rst, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org