Re: Character equivalence mapping (was: Re: [idn] SLC minutes)



To whomever:

Some time ago, I argued (see below) that the current version (at that time) of nameprep mapped upper case Greek letters to lower case.

My comments were basically dismissed as "That's the way we do it, get used to it!"

Now, I find that the current version of Punycode does exactly the opposite of what was claimed. For example, try entering code point U+2126 (OHM SIGN, canonically equivalent to upper case Omega) through --

http://www.imc.org/idna/do-idna.cgi

-- and see what happens. The end result is uppercase and NOT lowercase.
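For anyone who wants to repeat the experiment offline, here is a minimal sketch using Python 3's standard library, which ships an IDNA 2003 codec (nameprep per RFC 3491, then Punycode). It assumes CPython, where `encodings.idna.nameprep` is a module-level helper rather than a documented public API. It also uses the final RFC tables, not whatever draft the do-idna.cgi page was running, so it cannot say which behaviour that page had.

```python
import unicodedata
from encodings import idna  # CPython's IDNA 2003 codec; nameprep() is an internal helper

OHM = "\u2126"  # OHM SIGN

def describe(s: str) -> str:
    """Return 'U+XXXX NAME' for each code point in s."""
    return ", ".join(f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s)

# Compatibility normalization alone (NFKC): Ohm sign -> Greek capital omega.
nfkc_only = unicodedata.normalize("NFKC", OHM)

# Nameprep case-maps with stringprep table B.2 *before* NFKC,
# so the small letter comes out instead.
nameprepped = idna.nameprep(OHM)

# ToASCII: nameprep, then Punycode behind the "xn--" ACE prefix.
ace = OHM.encode("idna")

print("input:    ", describe(OHM))
print("NFKC only:", describe(nfkc_only))
print("nameprep: ", describe(nameprepped))
print("ToASCII:  ", ace.decode("ascii"))
```

With the final tables the Ohm sign comes out as small omega (U+03C9) after nameprep, while NFKC by itself yields capital omega (U+03A9); an upper case result therefore suggests normalization was applied without the case-mapping step.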

Is this the "new way" to resolve the old issue discussed? Have the "powers that be" reversed themselves, or did I find an error in Punycode?

Many thanks for any replies.

tedd

--- as previously stated on this list in January 2002 ---

Mark, john, Edmon:

1. This issue was debated at length some time ago. I suggest that the people
arguing for visual confusability as a criterion for matching look at that
discussion in detail before proceeding.

I'm not arguing (in this debate) the "look-alike" position. In other words, it makes no difference to me if certain glyphs look identical in numerous char sets. I am arguing the opposite position: the characters in my example don't look alike.


I am arguing the point that the decision "has been made" to map upper case Greek letters to lower case letters. For proof, look at the current version of nameprep ( http://www.imc.org/nameprep/ ) and try running code point U+2126 (OHM SIGN, canonically equivalent to upper case Omega) through it. You will find that it IS mapped to code point U+03C9 (lower case omega).

My question is "Why?" What's the foundation for this determination? What good reason is there to conclude that the upper case Omega should be mapped to a lower case omega?

I see no "A.com" to "a.com" argument/problem here. Clearly, if someone registered Ω.com and someone else registered w.com, there is a significant difference in identification between the two names. Those two could be entirely distinct domain names with no significant resulting problems. In the Latin char set, by contrast, I can see the reason for making W.com and w.com identical (i.e., mapping W to w) because there is a UC/LC consideration/distinction in the language. But that's not a problem in the Greek char set -- is it... really?

(i) From observation, when scripts have two cases, the
upper-case form is more likely to be highly stylized, and hence
differentiated from characters in other scripts, than the
lower-case one.  Hence, if one is going to adopt
stylization-based (glyph-distinction, if you prefer)
canonicalization rules, one is better off treating upper case as
the normal form, rather than lower case.

It looks to me as if someone has already decided to map other languages based upon the Latin char set UC/LC problem, without considering that other languages may not have the UC/LC distinction and thus may not have the UC/LC problem. I think the Greek example I gave above sufficiently demonstrates my point.


tedd

--
http://sperling.com