
Re: [idn] IDNA: is the specification proper, adequate, and complete? (was: Re: I-D ACTION:draft-ietf-idn-idna-08.txt)



> Because we have tried that approach, mostly inadvertently, many
> times before, and it has always gotten us into trouble.  "You
> are not permitted to do this" leaves us in a position where we
> can later make it permissible and give it an interpretation at
> the same time.  "It is undefined" seems to invariably create a
> community of people who are sure what "undefined" means --i.e.,
> who assign a meaning to it-- and who then become a serious,
> installed-base, impediment to any other interpretation.  And
> "this applies globally" will be taken as the basis for a claim
> that we are making a seriously incompatible change if we try to
> do something else in some other name space.

John,

I'm not saying that leaving things undefined is the way forward.
But you seem to be saying that, in addition to wanting things to be well
defined, we must either try to predict the future or apply IDNA only in
the smallest possible context.
I think there is a third alternative: apply it everywhere unless it is
explicitly overridden elsewhere in the future. As far as I can tell this can
be perfectly well-defined, and it means that any later changes can be made
with the well-defined basis as a foundation.

> 	* RFC821 specified Internet email transport in terms of
> 	ASCII.  It didn't explicitly (enough) prohibit sending
> 	non-ASCII ("8bit") characters.  [...]

> 	* RFC 1034 and 1035 define labels in terms of ASCII
> 	characters, but contains more or less vague language
> 	about how labels containing octet values that cannot be
> 	ASCII are to be interpreted.  [...]

The above two examples seem to me to be cases of loosely defined
specifications. I agree that we want to avoid that in the IDN space to the
extent possible.

> 	As just one example of this, suppose we come along later
> 	and want to put Unicode more directly into the DNS.  We
> 	know that UTF-8 isn't a particularly efficient encoding
> 	-- its design is strongly influenced by the need to be
> 	compatible with octet-based systems.    We also know
> 	that UCS-4 (UTF-32) isn't very efficient either.  But we
> 	know a good deal about compression, and could do some
> 	elegant things about compressing a string written in
> 	Unicode characters.  But, to do that, we need _binary_
> 	labels (not just binary octets), i.e., a length-encoded
> 	bit string of up to 63 octets in length and with no
> 	variant interpretations based on whether the stored
> 	value of a given octet falls in the 0x00-0xFF range.
> 	That is pretty easy to do in the DNS structure, but it
> 	requires per-RR or per-Class definitions of how the
> 	label string is interpreted/ matched.  If we read
> 	1034/1035 as "it is all characters" then either this is
> 	hopeless or we will need to invent another kludge
> 	(fortunately, the obvious one is still more efficient
> 	than UTF-8).  If we read 1034/1035 as "only the existing
> 	RRs and Classes are specified, new ones get to specify
> 	their own rules"  then the range of options remains open.

I think the place where this discussion belongs is the now expired
draft-ietf-dnsext-unknown-rrs-02.txt. But I note that it is lacking
as well, in that it seems to silently assume that all owner names
are ASCII text by saying:
   The owner name is
   still set to lower case.

I agree it is highly desirable to be able to define new RR types and/or classes
without the ASCII case-insensitive comparison of the owner names. But the
feasibility of this depends on what existing DNS implementations do with
unknown types and classes. If it turns out that this isn't possible, one can
still envision an efficient encoding of Unicode owner names, e.g. by applying
the Bootstring algorithm (used by Punycode) but with an output code set of e.g.
0-0x40,0x5b-0xff, i.e. avoiding the bytes which an ASCII case-insensitive
comparison would trip on.
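
To make the property of that code set concrete, here is a rough sketch in
Python. It is not Bootstring; it only writes an integer as digits drawn from
the 0-0x40,0x5b-0xff set, to show that ASCII case folding leaves such labels
untouched. The names SAFE_BYTES, ascii_lower, encode_digits and decode_digits
are made up for this illustration.

```python
# Octets 0x00-0x40 and 0x5b-0xff: every value except ASCII 'A'-'Z'.
SAFE_BYTES = bytes(list(range(0x00, 0x41)) + list(range(0x5B, 0x100)))


def ascii_lower(label: bytes) -> bytes:
    """Fold ASCII 'A'-'Z' to 'a'-'z' and leave every other octet alone."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in label)


def encode_digits(value: int) -> bytes:
    """Write a non-negative integer in base len(SAFE_BYTES), most significant digit first."""
    base = len(SAFE_BYTES)
    digits = [SAFE_BYTES[value % base]]
    value //= base
    while value:
        digits.append(SAFE_BYTES[value % base])
        value //= base
    return bytes(reversed(digits))


def decode_digits(label: bytes) -> int:
    """Inverse of encode_digits; raises ValueError on an octet outside SAFE_BYTES."""
    base = len(SAFE_BYTES)
    value = 0
    for b in label:
        value = value * base + SAFE_BYTES.index(b)
    return value
```

Since no octet in the set is an upper-case ASCII letter, a server that
lower-cases owner names before comparing them returns the label unchanged,
so two different encoded labels can never be made to match by the folding.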


The alternative seems to be to embark on discussions of whether RR type X
(for different values of X, including e.g. SRV and NAPTR) benefits
enough from IDNA for that RR type to be covered by the IDNA specification.
I fail to see how this WG can have the expertise to determine the usage of
all defined RR types. Even if the WG had that expertise, I still wouldn't be
surprised if there were radically opposing judgements about "benefit enough
from IDNA" for some types.

  Erik