Re: [idn] IURL vs URL, IDNS name vs DNS name



--On Thursday, 10 February, 2000 14:36 -0800 Larry Masinter 
<LM@att.com> wrote:

> In draft-masinter-url-i18n-04.txt, we took the tack of
> defining a _new_ protocol element, an "IURL"
> (Internationalized URL) which allowed 8-bit UTF8 sequences. We
> left "URL" alone, but noted that there might be some
> situations, protocols and contexts that could be upgraded to
> use IURLs instead of URLs. This got us out of the quandary of
> wanting to upgrade technology but dealing with older software
> that couldn't deal with the new representation.
>
> A similar approach could work for "DNS names": define a new
> protocol element (IDNS name), note that existing compliant DNS
> servers _could_ handle IDNS names as well as DNS names, and
> then allow some way of encoding IDNS names in DNS names.
>...
> This is a migration strategy. If you're going to migrate from
> "everyone assumes DNS names are ASCII" to "DNS names allow
> UTF8", you have to allow for an interim state where there are
> some contexts in which DNS names are only ASCII and other
> contexts where they're allowed to have UTF8. You can't even
> talk about this if you say "DNS" for both contexts, so you
> have to make up a new name. So call the context of "DNS names
> that are allowed to have UTF8" the "IDNS" context.

Larry,

Keep in mind that URLs are pretty simple, in the sense that they 
more or less reference objects (specifically, things that are 
not recursively URLs).  And there isn't an object->URL mapping 
inherent in anything (sometimes, more the pity, but that is a 
separate conversation).  In general, the sort of system you 
suggest will work when those conditions are met.   But, with the 
DNS, you'd be talking about some fairly complex situations and 
combinations of situations.  For example:

* If you put these more or less into the existing DNS, would you 
contemplate an "IPTR" record whose RHS is a potentially 
non-ASCII name?

* Given, especially, that CNAMEs can point to records in another 
domain entirely, would you contemplate
      CNAME   (label and RHS in ASCII)
      ICNAME  (label in something else, RHS in ASCII)
      CNAMEI  (label in ASCII, RHS in something else)
      ICNAMEI (both in non-ASCII)

It is pretty easy to poke holes in this example, but you get the 
picture.

There are, however, at least two variations on this theme.  Some 
of us suspect that the WG may be driven to one of them once the 
interoperability problems with the existing deployed base and 
the impossibility of doing a "flag day" (or even "flag year" or 
"flag decade") changeover are understood.   Note that the two 
examples below are just examples -- neither would work without a 
lot of details, some quite subtle, being worked out and filled 
in.

(1) The place where that "I" symbol goes is not in the record 
type, but in the Class, possibly with some very fancy 
"additional information" or reinterpretation rules and 
recommendations.   The new Class would shadow the old "IN" one, 
with all of the same record types but different rules for 
forming and interpreting strings in labels and predicates.  An 
I18N-capable resolver might then do a lookup in Class "INN", 
rather than Class IN. Servers might be trained, if no Class INN 
records were found, to try to do a lookup in Class IN and return 
the right stuff. (Of course, that second lookup would fail, and 
wouldn't be worth trying, if the query's content was really a 
non-ASCII string, but there would be cases in which it would 
be.)  One of the worrisome cases is that a reverse ("PTR") 
lookup in Class IN would fail if the address was registered only 
in the Class INN space -- maybe that is ok, maybe we'd need a 
kludge.
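
The lookup order in (1) can be sketched roughly as follows. This is
only an illustration under stated assumptions: "INN" is the made-up
class name used above, not a registered DNS class, and the in-memory
zone table and the function names (lookup, legacy_lookup) are
hypothetical stand-ins for real server and resolver behavior.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical zone data keyed by (class, owner name, record type).
# "INN" is the illustrative class from the text, not a real DNS class.
Zone = Dict[Tuple[str, str, str], List[str]]


def is_ascii(name: str) -> bool:
    """True if every character of the name is in the ASCII range."""
    return all(ord(ch) < 128 for ch in name)


def lookup(zone: Zone, name: str, rtype: str) -> Optional[Tuple[str, List[str]]]:
    """I18N-capable lookup: try Class INN first, then fall back to IN."""
    answer = zone.get(("INN", name, rtype))
    if answer is not None:
        return ("INN", answer)
    # A non-ASCII name cannot be registered in Class IN, so the
    # second lookup is not worth trying.
    if not is_ascii(name):
        return None
    answer = zone.get(("IN", name, rtype))
    if answer is not None:
        return ("IN", answer)
    return None


def legacy_lookup(zone: Zone, name: str, rtype: str) -> Optional[List[str]]:
    """An old resolver only ever asks in Class IN."""
    return zone.get(("IN", name, rtype))
```

With an address whose PTR record exists only under ("INN", ...),
legacy_lookup returns None while lookup finds it, which is the
worrisome reverse-lookup case mentioned above.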

(2) But that model isn't very different from a directory overlay 
on the DNS, where the directory is entirely internationalized 
and the contents of the DNS itself are eventually viewed as just 
a collection of octets in particular ranges (that happen to 
correspond to ASCII A-Z, a-z, 0-9, and "-"), i.e., as protocol 
elements, not names that are expected to have any human 
significance.  Use of a directory overlay for this purpose would 
have some advantages over using the DNS, e.g., one could do 
smarter lookups if one assumed one was dealing with names in a 
known natural language than is sensible for the labels of the 
current DNS and, should the character set wars break out again, 
one might use different systems in different environments.  The 
reverse mapping problems wouldn't be easy, but might not be 
significantly more difficult than would exist if the new names 
were embedded in the DNS on a "no flag day and you can't wreck 
old servers, resolvers, or applications" basis.
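
A minimal sketch of the overlay idea in (2), assuming a directory
that is nothing more than a table from internationalized names to
DNS names. The table, the function names (is_ldh_label,
resolve_via_directory) and the sample entry in the note below are
hypothetical; a real directory would do far smarter matching than an
exact-key lookup.

```python
from typing import Dict, Optional

# The octets the DNS side is restricted to: A-Z, a-z, 0-9 and "-".
LDH_OCTETS = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-"
)


def is_ldh_label(label: bytes) -> bool:
    """True if the label is non-empty and uses only the octets above."""
    return len(label) > 0 and all(octet in LDH_OCTETS for octet in label)


def resolve_via_directory(directory: Dict[str, str], human_name: str) -> Optional[str]:
    """Map a human-visible name to the DNS name registered for it.

    The DNS name is treated purely as a protocol element; it only has
    to be well-formed, not meaningful to a human reader.
    """
    dns_name = directory.get(human_name)
    if dns_name is None:
        return None
    for label in dns_name.encode("utf-8").split(b"."):
        if not is_ldh_label(label):
            raise ValueError("directory entry is not a plain DNS name")
    return dns_name
```

For example, a directory entry keyed by "Bücher Verlag" might point
at "buecher-verlag.example"; the application would show the former
and hand only the latter to the resolver.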

     john