
Re: Character equivalence mapping (was: Re: [idn] SLC minutes)



Mark:

Alright, if the Greek script is forced into case-less matching, then
why does it have to be mapped from Upper to Lower Case? I can see
that mapping the upper case Alpha to the lower case alpha makes sense
in terms of not confusing the Latin "A" vs Greek "Alpha" issue. But,
forcing Omega to be mapped to omega only compounds the Latin "w" vs
Greek "w" issue. Are there any considerations for these types of
mapping issues, or is it summarily determined to be UC -> LC in all
cases regardless?
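
For anyone who wants to check the mapping in question, here is a minimal sketch in Python. It assumes that the standard library's Unicode case folding and its encodings.idna.nameprep helper are a fair stand-in for the nameprep draft under discussion; fold_label is a hypothetical helper name, not anything defined in the draft.

```python
import unicodedata
from encodings.idna import nameprep  # Python's stringprep-based nameprep profile


def fold_label(label: str) -> str:
    """Approximate caseless matching: NFKC, then case folding, then NFKC again."""
    folded = unicodedata.normalize("NFKC", label).casefold()
    return unicodedata.normalize("NFKC", folded)


# Greek capital Omega, the ohm sign, and Latin W/w for comparison.
for ch in ("\u03a9", "\u2126", "W", "w"):
    out = fold_label(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
          f"U+{ord(out):04X} {unicodedata.name(out)}")

# The library's own nameprep gives the same answer for the ohm sign.
print(f"nameprep(U+2126) -> U+{ord(nameprep(chr(0x2126))):04X}")
```

Under these assumptions both U+03A9 and U+2126 fold to U+03C9 (lower case omega), while Latin "w" stays a separate code point, so the two are never matched to each other by folding.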

tedd


>Other scripts do have upper/lowercase correspondences, just like the Latin
>script does. Users of those scripts are just as likely to want caseless
>matching as users of the Latin script (such as you).
>
>For more information, see http://www.unicode.org/unicode/reports/tr21/.
>
>Mark
>-----
>
>Πόλλ' ἠπίστατο ἔργα, κακῶς δ' ἠπίστατο πάντα - Ὁμήρου Μαργίτῃ
>[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
>
>http://www.macchiato.com
>
>----- Original Message -----
>From: "tedd" <tedd@sperling.com>
>To: <idn@ops.ietf.org>
>Sent: Thursday, January 03, 2002 09:35
>Subject: Re: Character equivalence mapping (was: Re: [idn] SLC minutes)
>
>
>Mark, John, Edmon:
>
>>1. This issue was debated at length some time ago. I suggest that the
>>people arguing for visual confusability as a criterion for matching
>>look at that discussion in detail before proceeding.
>
>I'm not arguing (in this debate) the "look-alike" position. In other
>words, it makes no difference to me if certain glyphs look identical
>in numerous char sets. I am arguing the opposite position -- the
>characters in my example don't look alike.
>
>I am arguing the point that the decision "has been made" to map upper
>case Greek letters to lower case letters. For proof, look at the
>current version of nameprep ( http://www.imc.org/nameprep/  ) and try
>running code point 2126 (the ohm sign, equivalent to upper case Omega,
>03A9) through it. You will find that it IS mapped to code point 03C9
>(lower case omega).
>
>My question is "Why?" What's the foundation for this determination?
>What good reason is there to conclude that the upper case Omega
>should be mapped to a lower case omega?
>
>I see no "A.com" to  "a.com" argument/problem here. Clearly, if
>someone registered Ω.com and someone else registered w.com there is
>significant difference in identification between the two names. Those
>two domain names can be completely unique domain names with no
>significant resultant problems. Whereas, in the Latin char set, I can
>see the reason for making W.com and w.com identical (i.e., mapping W
>to w) because there is an UC/LC consideration/distinction in the
>language. But, that's not a problem in the Greek char set -- is it...
>really?
>
>>(i) From observation, when scripts have two cases, the
>>upper-case form is more likely to be highly stylized, and hence
>>differentiated from characters in other scripts, than the
>>lower-case one.  Hence, if one is going to adopt
>>stylization-based (glyph-distinction, if you prefer)
>>canonicalization rules, one is better off treating upper case as
>>the normal form, rather than lower case.
>
>It looks to me as if someone has already made the determination to
>map other languages based upon the Latin char set UC/LC problem
>without concern that other languages may not have the UC/LC
>distinction and thus be free of the UC/LC problem. I think the
>Greek example I gave above sufficiently demonstrates my observation.
>
>tedd
>
>--
>http://sperling.com


--
http://sperling.com