Re: [idn] Determining equivalence in Unicode DNS names



In a message dated 2002-01-16 19:48:54 Pacific Standard Time, 
idn.amc+0@nicemice.net.RemoveThisWord ("Adam" for short) writes:

>> The problem as I see it, right now, is that if a client asks for the
>> address record for "www.pépsi.com." (with an accent), and it gets back
>> a DNS reply with an answer giving the address for "www.pepsi.com."
>> (without an accent), then the client will ignore the answer.
>
> Indeed, because pépsi and pepsi are two distinct labels.  This is like
> today, if a client asks for colour.com, then it will ignore a response
> telling the address of color.com.  The server needs to answer the
> question that was asked, not some other question that it considers
> "close enough".

For some people, this may not be all that obvious.  We have had discussions 
and read proposals in which it is stated that the relationship between 
"lookalike" characters, like U+0041 and U+0391 and U+0410, or between 
Simplified and Traditional Chinese characters with the same language-specific 
meaning, is no different from the relationship between Latin "E" and "e".

Presumably there are those who would also see the relationship between "e" 
and "é" in the same way, and would therefore expect "pepsi.com" to be matched 
not only by "PEPSI.COM" but also by "pépsi.com".
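
As a rough illustration of the distinctions involved, here is a small Python 
sketch. It assumes only Python's built-in Unicode tables; the variable names 
are mine and purely illustrative, and nothing here is meant as a proposal for 
how DNS matching should actually be done.

```python
import unicodedata

# The three "lookalike" capitals mentioned above.
lookalikes = ["\u0041", "\u0391", "\u0410"]
names = [unicodedata.name(ch) for ch in lookalikes]
for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X}  {name}")

# Case folding pairs Latin "E" with "e" ...
case_match = "PEPSI.COM".casefold() == "pepsi.com".casefold()

# ... but it does not remove the accent (U+00E9 is e with acute) ...
accent_match = "p\u00e9psi.com".casefold() == "pepsi.com".casefold()

# ... and it keeps the Latin, Greek and Cyrillic letters apart.
distinct_after_folding = len({ch.casefold() for ch in lookalikes})

print(case_match, accent_match, distinct_after_folding)
```

Under the default Unicode case mappings, then, only the "E"/"e" kind of 
relationship is treated as equivalence; the other two kinds would each need 
a separate rule.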

There are good and valid reasons for not treating certain classes of 
characters the way Latin case pairs are treated, but ultimately there is 
bound to be a measure of arbitrariness in the system, and people will just 
have to accept the way things are defined.  After all, even traditional Latin 
case pairing gets complicated for languages such as German (where "ß" 
uppercases to "SS") and Turkish (where "I" and "i" are not a pair).
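
The German and Turkish cases can be seen with a few lines of Python. This is 
only a sketch: it assumes Python's built-in string methods, which apply the 
default, locale-independent Unicode mappings, and the variable names are 
illustrative.

```python
# German: sharp s (U+00DF) uppercases to two letters and does not round-trip.
german_upper = "\u00df".upper()
german_round_trip = german_upper.lower()

# Turkish: the default mapping pairs "I" with "i", whereas Turkish pairs
# "I" with dotless i (U+0131) and "i" with dotted capital I (U+0130).
default_lower_of_I = "I".lower()
dotless_upper = "\u0131".upper()
# The default lowercase of U+0130 is "i" followed by a combining dot above.
dotted_capital_lower = "\u0130".lower()

print(german_upper, german_round_trip, default_lower_of_I, dotless_upper)
print([f"U+{ord(c):04X}" for c in dotted_capital_lower])
```

Getting the Turkish pairs right requires language-specific tailoring, which 
the default mappings used here do not apply.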

-Doug Ewell
 Fullerton, California