
Re: [idn] Requirements I-D



--On Tuesday, May 16, 2000 1:44 PM -0600 mark.davis@us.ibm.com
wrote:

> There are some significant problems with the locale approach.
> 
> 1. It's more complicated
> 2. It requires that locale information always be sent (and
> that info is not always available. Remember; a lot of URLs are
>...
> (where * represents a dotless i, and & a dotted capital I); I
> can only register one of them. Since there is only one name
> registered, if a user types any of:
> 
> turkish_bath.com
> turk*sh_bath.com
> Turkish_Bath.com
> Turk&sh_Bath.com
> TURKISH_BATH.COM
> TURK&SH_BATH.COM
> 
> or any other case variation of these, s/he will get to the
> same place. However, when I put "turk*sh_bath.com" on a
> billboard or in a banner ad, or wherever, I can, of course,
> spell it correctly. When users type it in, they go to the
> right place.

Hmm.  Please review how the DNS protocols work, unless you are
planning to change them.  For this case, it would be correct to
say that, whatever the user types, she gets either the correct
answer or a "nodomain" condition.  That is a considerable
improvement over some other schemes, which might yield false
positives and other nasty problems.

However, unless the DNS _code_ knows about these conventions and
mappings for each country and subset of characters, the only way
to get the behavior you are looking for involves the user agent
trying each possibility in turn.  The combinations for that get
nasty in a hurry, at least unless you impose some
really draconian rules (e.g., no non-ASCII names except in the
terminal (left-most) name component).  Consider, for example,
the potential name (using your notation):

   www.f&ish.turk*sh_bath.com

The resolver has to try at least nine combinations, with all
sorts of interesting issues involving finding the name servers
at the second level (zero, one, two, or three of the names might
exist, and any of them might have a bunch of servers, some of
which might not be accessible).  And then that would need to be
repeated at the third level.
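
To make the count concrete, here is a minimal sketch of the
enumeration a user agent would face.  It assumes (my assumption,
not anything specified in this thread) that only the positions
written with * or & are ambiguous, and that each can be spelled
three ways: ASCII i, dotless i (*), or dotted capital I (&).  The
table and function names are hypothetical.

```python
from itertools import product

# Hypothetical variant table in the thread's notation:
# '*' stands for dotless i, '&' for dotted capital I.
VARIANTS = {"*": ("i", "*", "&"), "&": ("i", "*", "&")}


def label_variants(label):
    """Return every spelling of one label under the assumed table."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return ["".join(combo) for combo in product(*choices)]


def name_candidates(name):
    """Return every full name that would have to be tried."""
    per_label = [label_variants(label) for label in name.split(".")]
    return [".".join(combo) for combo in product(*per_label)]


candidates = name_candidates("www.f&ish.turk*sh_bath.com")
print(len(candidates))  # 3 spellings x 3 spellings = 9
```

Treating every ASCII "i" as ambiguous too would only make the
count larger, which is why nine is a lower bound.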

But that isn't the worst case.  The rule you impose may be
plausibly enforced at the second level, but becomes nearly
impossible at the third (these level numbers go up or down
depending on the TLD, but let's stick with this example).
Suppose one has www.f&ish.turk*sh_bath.foobar.com.  Think you
can guarantee that foobar.com and its NS delegations follow all
the conventions?  And keep in mind that each time you have to
probe the DNS and get a "nodomain" answer, it can cost you two
seconds per authoritative server (if they are hard to reach)
plus some time in your own cache.
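
A back-of-the-envelope sketch of how that adds up.  The two
seconds per server is the figure above; the number of failed
probes and of servers per zone are hypothetical inputs chosen
only for illustration, and cache time is ignored.

```python
def worst_case_wait(failed_probes, servers_per_zone, seconds_per_server=2.0):
    """Upper bound on time spent waiting for negative answers, assuming
    every authoritative server is tried and each is slow to respond."""
    return failed_probes * servers_per_zone * seconds_per_server


# Hypothetical inputs: eight wrong guesses before the ninth succeeds,
# three hard-to-reach servers per zone.
print(worst_case_wait(8, 3))
```

With those inputs the bound is 48 seconds, before the third
level is even considered.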

Now, one could make this better by adopting a different
convention: instead of requiring that only one of these names be
registered and leaving the resolution process to sort them out
--which is what I think you have proposed-- one could try to
insist that _all_ of the variations be registered and have the
same definitions and targets.  That would cause the lookups to
succeed the first time if they were going to, would not require
special DNS code or protocol changes, and so on.  Given a tool
that could take a set of these names and build a zone file (or
insert for one) with all of the permutations into it, it
wouldn't even be a huge effort.   It would, of course, make zone
files pretty big: if the typical label contained even four
non-ASCII characters with peculiar wrapping or mapping
characteristics, and each one could map into three variations,
the number of RRs goes up by an order of magnitude.
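
A rough sketch of the kind of tool I have in mind, mostly to
check the arithmetic.  It assumes three spellings per special
character (as in the Turkish i example), uses a made-up label
with four such characters, and pairs each spelling with one
hypothetical target; it does not emit real zone-file syntax.

```python
from itertools import product

# Assumption: each special character has three registrable spellings.
SPELLINGS = ("i", "*", "&")


def all_owner_names(label):
    """Expand every '*' or '&' in the label into all three spellings."""
    choices = [SPELLINGS if ch in "*&" else (ch,) for ch in label]
    return ["".join(combo) for combo in product(*choices)]


def build_records(label, target):
    """Pair every spelling of the label with the same target."""
    return [(owner, target) for owner in all_owner_names(label)]


# Hypothetical label with four special characters.
records = build_records("*st&nb*l_&", "host.example.")
print(len(records))  # 3 ** 4 = 81
```

So one registered label becomes 81 entries under these
assumptions, which is at least the order of magnitude mentioned
above.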

> While this approach is not perfect, it is also not terrible:

It is, IMO, at least a little terrible.

    john