Re: Matching and comparison



At 05:47 PM 1/20/00 +0900, Martin J. Duerst wrote:
> > Unless we can show a need for case-insensitivity *in the
> > internationalized characters*, we shouldn't force it.
>
>The largest need, already discussed, is clearly that a lot of people
>don't want to have to register ibm/ibM/iBm/iBM/Ibm/IbM/IBm/IBM to
>make sure nobody else registers. And three-letter companies still
>have an easy job.

That will always be a problem, regardless of what we do with case 
sensitivity. Using the same logic, the Dürst company would not only have to 
register Dürst.com, it would have to register Dûrst.com, Dúrst.com, 
Dùrst.com, Dûrst.com, and Dùrst.com, not to mention about a dozen more that 
my Eudora MUA didn't want to type for me. And this is just the European 
scripts; I think that Indic and Arabic scripts would have very similar 
problems.

We shouldn't pretend to fix the "too many similar names" problem by only 
talking about capitalization.
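
To make the point above concrete, here is a rough Python sketch. It assumes Unicode case folding via the standard library as a stand-in for whatever case-insensitive matching we might adopt; the helper names (fold_case, strip_marks) are made up for illustration. Folding collapses the eight "ibm" spellings into one name but leaves the accented variants distinct. Stripping combining marks is shown only to illustrate what a broader equivalence would need; nobody in this thread has proposed it, and it would erase distinctions that matter in many languages.

```python
import unicodedata

def fold_case(name: str) -> str:
    # Hypothetical helper: normalize to NFC, then apply Unicode case folding.
    return unicodedata.normalize("NFC", name).casefold()

def strip_marks(name: str) -> str:
    # Hypothetical helper: decompose, then drop combining marks (accents).
    # Illustration only; this loses linguistically meaningful distinctions.
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

ascii_variants = ["ibm", "ibM", "iBm", "iBM", "Ibm", "IbM", "IBm", "IBM"]
accent_variants = ["Dürst", "Dûrst", "Dúrst", "Dùrst"]

# Case folding alone: all eight ASCII spellings become one name.
folded_ascii = {fold_case(n) for n in ascii_variants}

# Case folding alone: the accented spellings are still four separate names.
folded_accents = {fold_case(n) for n in accent_variants}

# Only after also removing the marks do the accented spellings coincide.
stripped_accents = {strip_marks(fold_case(n)) for n in accent_variants}

print(len(folded_ascii), len(folded_accents), len(stripped_accents))
```

In other words, case folding addresses one source of look-alike names and leaves the others where they were.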

>Telling people that in an URI, domain names are case-insensitive,
>but file names are/may be case-sensitive is already hard. Telling
>them that a name is case-insensitive if it is ASCII only, and case-
>sensitive otherwise would be a really hard job.

Indeed. Telling them about anything having to do with internationalization 
will be.

>I think we can postpone the casing issue if we agree that there
>are no requirements in that area, i.e. if we think that we
>can live with any solution (case-folding or not). But that's
>not what you are saying, and that's not what I'm saying,
>so I suggest that we put the points we came up with
>(would like to be able to have the names in the appropriate
>casing, would prefer not to have a strange break between
>names containing only ASCII and others, would like to avoid
>exponentially growing registrations to cover equivalents).

I think it would be good for us to list some of the known trickiness of 
similar-looking script issues. So far, we have casing and Latin vowels with 
diacritics. Looking through my I believe we also have to list Latin 
consonants with diacritics, bidirectional names, similar-looking 
punctuation marks, Arabic joiners, Devanagari dependent and independent 
vowels (and conjunct formations, and half-forms...), and Tamil vowel 
splitting. I probably missed about a dozen other tricky issues; Martin is 
much more versed in these things than I am.

--Paul Hoffman, Director
--Internet Mail Consortium