
Re: Matching and comparison





Paul Hoffman / IMC wrote:
> 
> At 05:47 PM 1/20/00 +0900, Martin J. Duerst wrote:
> > > Unless we can show a need for case-insensitivity *in the
> > > internationalized characters*, we shouldn't force it.
> >
> >The largest need, already discussed, is clearly that a lot of people
> >don't want to have to register ibm/ibM/iBm/iBM/Ibm/IbM/IBm/IBM to
> >make sure nobody else registers. And three-letter companies still
> >have an easy job.
> 
> That will always be a problem, regardless of what we do with case
> sensitivity. Using the same logic, the Dürst company would not only have to
> register Dürst.com, it would have to register Dûrst.com, Dúrst.com,
> Dùrst.com, Dûrst.com, and Dùrst.com, not to mention about a dozen more that
> my Eudora MUA didn't want to type for me. And this is just the European
> scripts; I think that Indic and Arabic scripts would have very similar
> problems.

Well, I think that is too strong, but you can make a more realistic example
with Du\:rst and Durst (say for Dutch customers, where the German u\: == u
phonetically and the u == ue). Much the same goes for things like the #,
AE and ae in the Scandinavian languages, or the ij, dz, ts or nj in Eastern
Europe: each is a harmless ligature in one language (easily replaced by
its two components and/or case-folded), whilst in another language
folding it, or replacing it with the two visually similar single-glyph
components, changes the essential meaning.
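
To make the ligature point concrete, here is a minimal sketch using Python's
standard unicodedata module. The fold() helper is hypothetical, not a
proposed matching rule: it only shows that a language-blind fold replaces the
single-glyph ij (U+0133) with its two components, while ae (U+00E6) has no
decomposition and stays a distinct letter.

```python
import unicodedata

def fold(label: str) -> str:
    """Hypothetical matching key: compatibility-normalize, then case-fold."""
    return unicodedata.normalize("NFKC", label).casefold()

# U+0133 is the single-glyph "ij" ligature; NFKC replaces it with "i" + "j",
# so the ligature form and the two-letter form get the same key.
ligature_key = fold("\u0133s")
plain_key = fold("ijs")

# U+00C6 (capital AE) case-folds to U+00E6 but is never split into "a" + "e".
ae_key = fold("\u00c6r")

print(ligature_key == plain_key)
print(ae_key == "aer")
```

Whether either behaviour is right depends on the language of the registrant,
which a fold like this cannot know.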

But then again, this is largely an entry/encoding issue; the Unicode spec
happily normalizes them, in a lot of cases, into something which is visually
the same.
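
As a rough illustration of the entry/encoding side, assuming Python's
standard unicodedata module and canonical composition (NFC) as the
normalization in question: two different encodings of the same name compare
unequal as raw strings, equal after normalization, and still differ from the
unaccented spelling.

```python
import unicodedata

precomposed = "D\u00fcrst"   # u-umlaut entered as one code point (U+00FC)
combining = "Du\u0308rst"    # "u" followed by U+0308 COMBINING DIAERESIS

# The raw code point sequences differ even though they render the same.
same_raw = precomposed == combining

# After NFC both collapse to the precomposed form.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", combining)
same_nfc = nfc_a == nfc_b

# Normalization does not merge the accented and unaccented names.
same_as_durst = nfc_a == "Durst"

print(same_raw, same_nfc, same_as_durst)
```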

Dw.