[idn] Determining equivalence in Unicode DNS names



I am new to this, so please forgive me if the suggestion I am about to
make has been discussed before.

Some introduction: I work on networking at Apple. Apple cares a lot about
international text, and consequently international domain names, and
consequently my management looks to me to give them input about what is
going on in this area. Hence my need to educate myself about the current
state of the world in IDN.

It seems to me that one of the great problems of IDN is one that is
fundamentally unsolvable: an attempt to determine, once and for all time,
a single global set of rules for deciding if two strings are "equal" or
"equivalent".

For US ASCII, equivalence is easy: For good or bad, we have decided that
upper case and lower case letters are equivalent, and that is the end of
the debate. Resolvers know that upper case and lower case are equivalent,
servers know it, and caches know it, so they all agree on whether any two
given names are equivalent. If a resolver asks for "APPLE.COM" and the
server gives an answer for "apple.com", then the client (and any cache)
understands that this is an acceptable answer to the question.
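
As a minimal sketch of that ASCII rule (in Python; the function names are
my own invention for illustration, and the trailing dot is treated as
optional purely for simplicity), the comparison that resolvers, servers
and caches all share looks roughly like this:

```python
def fold_ascii_case(data: bytes) -> bytes:
    """Lower-case only the bytes A-Z; every other byte is left untouched."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in data)


def dns_names_equal(a: str, b: str) -> bool:
    """Compare two names label by label, ignoring ASCII case only."""
    # Assumption for this sketch: names are handled as UTF-8 bytes and a
    # trailing dot is optional.
    labels_a = fold_ascii_case(a.encode("utf-8")).rstrip(b".").split(b".")
    labels_b = fold_ascii_case(b.encode("utf-8")).rstrip(b".").split(b".")
    return labels_a == labels_b
```

Under this rule dns_names_equal("APPLE.COM", "apple.com") is true, while
any name containing a non-ASCII character matches only a byte-identical
spelling of itself.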

The problem as I see it, right now, is that if a client asks for the
address record for "www.pépsi.com." (with an accent), and it gets back a
DNS reply with an answer giving the address for "www.pepsi.com." (without
an accent), then the client will ignore the answer. Even if the client
does know that "pépsi" is equivalent to "pepsi", a caching resolver in
between the client and the server may not, and may ignore the answer.

There are two problems here. The first is that I don't think we will ever
be able to agree on a single set of equivalence rules for the whole world.
The second is that even if we did, we'd have to have a global flag day
where every client, server and caching resolver was simultaneously
upgraded to know the new rules.

It seems to me that the solution is to give up on the idea of a single
global set of rules, and instead let each name server be authoritative
for the equivalence rules for the zones for which it is authoritative. If
a client tries to look up "www.pépsi.com.", and the "com" name servers
have been configured to treat "pépsi" as equivalent to "pepsi", then they
return the answer for "pepsi.com.", and in the reply they also include a
(programmatically generated) DNAME record which *tells* the client and any
intervening caching resolvers that these two names are equivalent:

pépsi.com.              IN      DNAME   pepsi.com.
pepsi.com.              IN      NS      dnsauth1.sys.gtei.net.
dnsauth1.sys.gtei.net.  IN      A       4.2.49.2
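
To make the server side concrete, here is a rough Python sketch of how a
"com." server might generate that reply. Everything in it is an
assumption for illustration: the respond() and strip_accents() names, the
DELEGATIONS table, and above all the equivalence rule itself (dropping
combining accents), which is just one rule a zone might choose, not a
proposed standard.

```python
import string
import unicodedata

# Fold only A-Z to a-z, as existing DNS software already does.
ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Hypothetical delegation data held by a server for "com."
# (name server and address taken from the records above).
DELEGATIONS = {"pepsi.com.": ("dnsauth1.sys.gtei.net.", "4.2.49.2")}


def strip_accents(label: str) -> str:
    """Illustrative equivalence rule only: drop combining marks."""
    decomposed = unicodedata.normalize("NFD", label)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def respond(qname: str, zone: str = "com.") -> list:
    """Return (owner, type, data) tuples the zone's server might send."""
    if not qname.endswith("." + zone):
        return []
    # The label directly below the zone apex selects the delegation.
    child_label = qname[: -len(zone)].rstrip(".").split(".")[-1]
    asked = f"{child_label}.{zone}"
    canonical = f"{strip_accents(child_label).lower()}.{zone}"
    if canonical not in DELEGATIONS:
        return []
    records = []
    # Emit a DNAME only when plain ASCII case folding would not match.
    if asked.translate(ASCII_FOLD) != canonical:
        records.append((asked, "DNAME", canonical))
    ns_name, ns_addr = DELEGATIONS[canonical]
    records.append((canonical, "NS", ns_name))
    records.append((ns_name, "A", ns_addr))
    return records
```

Calling respond("www.p\u00e9psi.com.") yields the DNAME, NS and A records
in the order listed above; a query that differs from the stored name only
in ASCII case gets the NS and A records with no DNAME.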

This way we don't need to have a global flag day, because as servers are
updated to support international domain names, they can "educate" old
clients as they go, using DNAME. If we find that text equivalence rules
evolve over time to meet changing needs, that's fine too, because the
servers can be upgraded one at a time as their users' needs demand.
Clients and caches don't need to know any of the equivalence rules at
all; they just need to obey the DNAME mappings that they are told.
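
The client and cache side can be sketched just as briefly. This Python
fragment is an illustration under my own assumptions (the apply_dname()
and accepts() names are hypothetical, and real resolvers work on wire
format rather than strings), but it shows that the only comparison needed
is the existing ASCII one:

```python
import string

# Fold only A-Z to a-z; no Unicode equivalence knowledge is needed here.
ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def apply_dname(qname: str, owner: str, target: str):
    """Rewrite qname if it lies below the DNAME owner, else return None.

    DNAME redirects names *below* its owner, not the owner name itself.
    """
    suffix = "." + owner
    if not qname.endswith(suffix):
        return None
    return qname[: -len(suffix)] + "." + target


def accepts(qname: str, answer_owner: str, dnames: list) -> bool:
    """Decide whether an answer's owner name satisfies the question.

    dnames is a list of (owner, target) pairs taken from the reply.
    """
    name = qname.translate(ASCII_FOLD)
    for owner, target in dnames:
        rewritten = apply_dname(
            name, owner.translate(ASCII_FOLD), target.translate(ASCII_FOLD)
        )
        if rewritten is not None:
            name = rewritten
    return name == answer_owner.translate(ASCII_FOLD)
```

With the mapping ("p\u00e9psi.com.", "pepsi.com.") supplied, a question
for "www.p\u00e9psi.com." accepts an answer for "www.pepsi.com."; without
it, the same answer is rejected, which is the situation described earlier.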

Stuart Cheshire <cheshire@apple.com>
 * Wizard Without Portfolio, Apple Computer
 * Chairman, IETF ZEROCONF
 * www.stuartcheshire.org