Re: [idn] Determining equivalence in Unicode DNS names
In a message dated 2002-01-16 19:48:54 Pacific Standard Time,
email@example.com.RemoveThisWord ("Adam" for short) writes:
>> The problem as I see it, right now, is that if a client asks for the
>> address record for "www.pépsi.com." (with an accent), and it gets back
>> a DNS reply with an answer giving the address for "www.pepsi.com."
>> (without an accent), then the client will ignore the answer.
> Indeed, because pépsi and pepsi are two distinct labels. This is like
> today, if a client asks for colour.com, then it will ignore a response
> telling the address of color.com. The server needs to answer the
> question that was asked, not some other question that it considers
> "close enough".
For some people, this may not be all that obvious. We have had discussions
and read proposals in which it is stated that the relationship between
"lookalike" characters, like U+0041 and U+0391 and U+0410, or between
Simplified and Traditional Chinese characters with the same language-specific
meaning, is no different from the relationship between Latin "E" and "e".
Presumably there are those who would also see the relationship between "e"
and "é" in the same way, and would therefore expect "pepsi.com" to be matched
not only by "PEPSI.COM" but also by "pépsi.com".
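
To make the distinction concrete, here is a rough Python sketch. It assumes the traditional DNS rule that only ASCII A-Z and a-z are treated as case pairs; the helper name ascii_case_equal is mine, made up for illustration, and is not taken from any draft or resolver.

```python
import unicodedata

def ascii_case_equal(a: str, b: str) -> bool:
    """Compare two names, folding only ASCII A-Z to a-z (hypothetical helper)."""
    fold = {c: c + 32 for c in range(ord("A"), ord("Z") + 1)}
    return a.translate(fold) == b.translate(fold)

# Latin case pairs match under the traditional rule.
same_case_pair = ascii_case_equal("pepsi.com", "PEPSI.COM")

# The accented label is a different label: U+00E9 is not U+0065.
same_with_accent = ascii_case_equal("pepsi.com", "p\u00e9psi.com")

# The three "lookalike" capitals are distinct code points with distinct names.
lookalike_names = [unicodedata.name(ch) for ch in ("\u0041", "\u0391", "\u0410")]

print(same_case_pair, same_with_accent)
for name in lookalike_names:
    print(name)
```

Under this rule, anyone who expects "pépsi.com" or a Greek or Cyrillic lookalike to match would need an additional equivalence step layered on top, which is exactly the point under discussion.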
There are good and valid reasons for not treating certain classes of
characters the way Latin case pairs are treated, but ultimately there is
bound to be a measure of arbitrariness in the system, and people will just
have to accept the way things are defined. After all, even traditional Latin
case pairing gets complicated for languages such as German ("ß" -> "SS") and
Turkish (where "I" and "i" are not a pair).
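
For anyone who wants to see the German and Turkish wrinkles first-hand, here is a small sketch using Python's built-in, locale-independent case mappings. The variable names are just for illustration; a Turkish-aware library would pair the letters differently from what str.lower() and str.upper() do here.

```python
# German sharp s: uppercasing expands to two letters, and lowercasing
# the result does not bring the original character back.
sharp_s_upper = "\u00df".upper()        # "SS"
round_trip = sharp_s_upper.lower()      # "ss", not U+00DF

# Turkish has four letters here: I (U+0049), i (U+0069),
# dotted capital I (U+0130) and dotless small i (U+0131).
# In Turkish, "I" pairs with U+0131 and U+0130 pairs with "i",
# but the default mappings below follow the non-Turkish convention.
default_lower_of_I = "I".lower()            # "i"
dotless_upper = "\u0131".upper()            # "I"
dotted_capital_lower = "\u0130".lower()     # "i" followed by U+0307

print(sharp_s_upper, round_trip)
print(default_lower_of_I, dotless_upper, len(dotted_capital_lower))
```

So even before accents or lookalikes enter the picture, "case-insensitive" has no single language-neutral definition.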