
Re: Character equivalence mapping (was: Re: [idn] SLC minutes)



Other scripts do have upper/lowercase correspondences, just like the Latin
script does. Users of those scripts are just as likely to want caseless
matching as users of the Latin script (such as you).

For more information, see http://www.unicode.org/unicode/reports/tr21/.
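As a rough illustration of caseless matching outside Latin, here is a minimal Python sketch. It assumes Python's built-in `str.casefold()` is an acceptable stand-in for Unicode case folding; the helper name `caseless_equal` and the sample letters are chosen only for this example and do not come from TR21.

```python
import unicodedata

def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after Unicode case folding (illustrative helper)."""
    return a.casefold() == b.casefold()

# One (upper, lower) sample pair from each of four scripts that have case.
PAIRS = {
    "Latin": ("W", "w"),
    "Greek": ("\u03a9", "\u03c9"),     # capital / small omega
    "Cyrillic": ("\u0414", "\u0434"),  # capital / small de
    "Armenian": ("\u0531", "\u0561"),  # capital / small ayb
}

for script, (upper, lower) in PAIRS.items():
    print(script, "|", unicodedata.name(upper), "~", unicodedata.name(lower),
          "|", caseless_equal(upper, lower))
```

Each pair compares equal after folding, while letters from different scripts (for example small omega and Latin w) remain distinct.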

Mark
—————

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "tedd" <tedd@sperling.com>
To: <idn@ops.ietf.org>
Sent: Thursday, January 03, 2002 09:35
Subject: Re: Character equivalence mapping (was: Re: [idn] SLC minutes)


Mark, John, Edmon:

>1. This issue was debated at length some time ago. I suggest that the
>people arguing for visual confusability as a criterion for matching look
>at that discussion in detail before proceeding.

I'm not arguing (in this debate) the "look-alike" position. In other
words, it makes no difference to me if certain glyphs look identical
in numerous char sets. I am arguing the opposite position -- the
characters in my example don't look alike.

I am arguing the point that the decision "has been made" to map upper
case Greek letters to lower case letters. For proof, look at the
current version of nameprep ( http://www.imc.org/nameprep/ ) and try
running code point 2126 (the ohm sign, which looks like upper case
omega) through it. You will find that it IS mapped to code point 03C9
(lower case omega).
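For anyone who wants to check the omega code points without running nameprep itself, the following is a minimal sketch. It only approximates the mapping step by applying Python's `str.casefold()` followed by NFKC normalization; the function names `describe` and `fold_then_normalize` are made up for this example, and results may differ from nameprep for other characters. For reference, Unicode names U+2126 OHM SIGN, U+03A9 GREEK CAPITAL LETTER OMEGA, and U+03C9 GREEK SMALL LETTER OMEGA.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return a code point in 'U+XXXX NAME' form."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def fold_then_normalize(text: str) -> str:
    """Approximate a nameprep-style mapping: case fold, then NFKC.

    This is an assumption-laden stand-in, not the nameprep algorithm.
    """
    return unicodedata.normalize("NFKC", text.casefold())

# Trace the three omega-like code points through the approximate mapping.
for cp in (0x2126, 0x03A9, 0x03C9):
    ch = chr(cp)
    print(describe(ch), "->", describe(fold_then_normalize(ch)))
```

Under these assumptions all three inputs end up as U+03C9, the small omega.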

My question is "Why?" What's the foundation for this determination?
What good reason is there to conclude that the upper case Omega
should be mapped to a lower case omega?

I see no "A.com" to "a.com" argument/problem here. Clearly, if
someone registered Ω.com and someone else registered ω.com there is
a significant difference in identification between the two names. Those
two domain names can be completely unique domain names with no
significant resultant problems. Whereas, in the Latin char set, I can
see the reason for making W.com and w.com identical (i.e., mapping W
to w) because there is a UC/LC distinction in the language. But,
that's not a problem in the Greek char set -- is it... really?

>(i) From observation, when scripts have two cases, the
>upper-case form is more likely to be highly stylized, and hence
>differentiated from characters in other scripts, than the
>lower-case one.  Hence, if one is going to adopt
>stylization-based (glyph-distinction, if you prefer)
>canonicalization rules, one is better off treating upper case as
>the normal form, rather than lower case.

It looks to me as if someone has already made the determination to
map other languages based upon the Latin char set UC/LC problem
without concern that other languages may not have the UC/LC
distinction and thus may be free of the UC/LC problem. I think the
Greek example I gave above sufficiently demonstrates my observation.

tedd

--
http://sperling.com