
Re: [idn] IDNA: is the specification proper, adequate, and complete? (was: Re: I-D ACTION:draft-ietf-idn-idna-08.txt)



--On Thursday, 13 June, 2002 12:02 +0200 Erik Nordmark
<Erik.Nordmark@sun.com> wrote:

>> I'd like to give those future designers the
>> opportunity to make choices about whether to use IDNA (or ACE
>> generally) to handle internationalization, the ability to
>> think through whether more of the normalization or matching
>> processes should be moved to the server, rather than being
>> handled exclusively in the client, etc.  I don't want (or
>> need) to predict what they will conclude, but I think it is
>> unnecessary and dangerous for us, at this stage, to write
>> what seems to be an "all internationalization uses IDNA, now
>> and forever, including in RRs and Classes and uses of the DNS
>> we have no way to anticipate" rule.
> 
> John,
> 
> I'm trying to understand why, if e.g. the community goes down
> the path of doing a new class, why such a specification can't
> update the IDNA RFC to e.g. say that IDNA now only applies to
> class=IN (or whatever is appropriate). 

Because we have tried that approach, mostly inadvertently, many
times before, and it has always gotten us into trouble.  "You
are not permitted to do this" leaves us in a position where we
can later make it permissible and give it an interpretation at
the same time.  "It is undefined" seems to invariably create a
community of people who are sure what "undefined" means --i.e.,
who assign a meaning to it-- and who then become a serious,
installed-base, impediment to any other interpretation.  And
"this applies globally" will be taken as the basis for a claim
that we are making a seriously incompatible change if we try to
do something else in some other name space.

Some examples, in the event that this still isn't clear:

	* RFC 821 specified Internet email transport in terms of
	ASCII.  It didn't explicitly (enough) prohibit sending
	non-ASCII ("8bit") characters.  So, when the desire to
	send, e.g., ISO 8859-1 characters, arose, one community
	sprang up that argued that, since the specification left
	8bit characters undefined, they were free to define them
	as 8859-1.  Another community sprang up that said that,
	since they were undefined, they were prohibited, and
	zeroed the high-order bits (or rejected the messages) in
	MTAs.  The combination led to the "just send 8" movement
	and huge problems in getting MIME, content character set
	tagging, and the SMTP extension model agreed to and
	deployed.  The aftereffects of those problems persist to
	this day.
	
	* RFC 1034 and 1035 define labels in terms of ASCII
	characters, but contain more or less vague language
	about how labels containing octet values that cannot be
	ASCII are to be interpreted.  That has touched off a
	debate within this WG (and I wouldn't expect the debate
	to be much different within namedroppers, but would
	encourage you to try that experiment) as to whether this
	means "octets above 0x7F are undefined and their use is
	prohibited until they are defined" or "octets above 0x7F
	are to be compared exactly, while _all_ octets of 0x7F
	and below are to be compared case-insensitively
	(assuming ASCII interpretation of those bytes)".  We
	have either got to be sure that we can predict the
	future, or we need to explicitly bar the cases we don't
	(and don't need to) understand.  The other cases just
	lead to confusion and non-interoperability... and our
	track record for future-predicting is not good.
	
	* There is actually a substantive (in addition to
	philosophical) reason why I have been pushing back on
	the "you can use a binary label anywhere" interpretation
	of 1035 (as represented in 2181 and elsewhere).  While
	I'm unhappy about treating characters from
	case-differentiating scripts differently depending on
	whether the octets of their codings happen to fall in
	the 0x00-0x7F or 0x80-0xFF ranges for existing
	character-based RR types, it isn't my main concern.   My
	concern is that this interpretation requires that all
	labels for all RRs, in all Classes, present and future,
	be treated as character strings using those rules.
	
	As just one example of this, suppose we come along later
	and want to put Unicode more directly into the DNS.  We
	know that UTF-8 isn't a particularly efficient encoding
	-- its design is strongly influenced by the need to be
	compatible with octet-based systems.    We also know
	that UCS-4 (UTF-32) isn't very efficient either.  But we
	know a good deal about compression, and could do some
	elegant things about compressing a string written in
	Unicode characters.  But, to do that, we need _binary_
	labels (not just binary octets), i.e., a length-encoded
	bit string of up to 63 octets in length and with no
	variant interpretations based on whether the stored
	value of a given octet falls in the 0x00-0x7F range.
	That is pretty easy to do in the DNS structure, but it
	requires per-RR or per-Class definitions of how the
	label string is interpreted or matched.  If we read
	1034/1035 as "it is all characters," then either this is
	hopeless or we will need to invent another kludge
	(fortunately, the obvious one is still more efficient
	than UTF-8).  If we read 1034/1035 as "only the existing
	RRs and Classes are specified; new ones get to specify
	their own rules," then the range of options remains open.
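A minimal Python sketch of what per-Class label-matching rules could look like, assuming a simple lookup table. `match_ascii_ci` implements the second reading described above (octets above 0x7F compared exactly, ASCII letters compared case-insensitively); `match_binary` is the kind of exact, uninterpreted comparison a binary label would need. The class name `BIN`, the `MATCHERS` table, and all function names are hypothetical, invented for illustration.

```python
from typing import Callable, Dict

MAX_LABEL_OCTETS = 63  # RFC 1035 limit on the length of one label

Matcher = Callable[[bytes, bytes], bool]


def ascii_fold(octet: int) -> int:
    # Only 'A'-'Z' (0x41-0x5A) are folded; every other octet,
    # including anything above 0x7F, is left unchanged.
    return octet + 0x20 if 0x41 <= octet <= 0x5A else octet


def match_ascii_ci(a: bytes, b: bytes) -> bool:
    """Character reading: ASCII letters case-insensitive, all else exact."""
    return len(a) == len(b) and all(
        ascii_fold(x) == ascii_fold(y) for x, y in zip(a, b)
    )


def match_binary(a: bytes, b: bytes) -> bool:
    """Hypothetical binary label: opaque octets, compared exactly."""
    return a == b


# Hypothetical per-Class rule table; "BIN" is an invented class name.
MATCHERS: Dict[str, Matcher] = {"IN": match_ascii_ci, "BIN": match_binary}


def labels_match(dns_class: str, a: bytes, b: bytes) -> bool:
    for label in (a, b):
        if not 1 <= len(label) <= MAX_LABEL_OCTETS:
            raise ValueError("label must be 1 to 63 octets")
    matcher = MATCHERS.get(dns_class)
    if matcher is None:
        # No rule is inherited by default: unlisted classes are refused.
        raise ValueError(f"no label-matching rule defined for class {dns_class}")
    return matcher(a, b)


def encoded_size(text: str) -> Dict[str, int]:
    """Octets needed for text in UTF-8 and in UTF-32 (no byte-order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-32": len(text.encode("utf-32-be")),
    }


print(labels_match("IN", b"Example", b"eXAMPLE"))    # True
print(labels_match("BIN", b"Example", b"eXAMPLE"))   # False
print(labels_match("IN", b"\xc3\x9c", b"\xc3\xbc"))  # False: UTF-8 "Ü" vs "ü"
print(encoded_size("日本語"))                         # {'utf-8': 9, 'utf-32': 12}
```

At three octets per CJK character in UTF-8, a 63-octet label holds at most 21 such characters; at four in UTF-32, only 15. Any compressed encoding will also emit octets in the 0x41-0x5A range, which `match_ascii_ci` would fold, so only an exact-match rule keeps such labels distinct.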

There is even a question about today's installed base.  Has
anyone who is competent to do so taken a careful look at what
the implications would be if IDNA were applied to Hesiod-Class
data?  I'm not very worried about Chaosnet, but we usually try
to avoid blowing away known installed applications without
careful consideration.  Now, one could argue that this case would be
covered by the language that says use of IDNA is an
application choice, but that would call for a piece of a
security considerations section (or some other section) that explains
the issues an application should consider before deciding to use
IDNA.  And that section isn't present.  It would also, I think,
be hard to write and get right.  Why look for trouble by making
sweeping statements about where IDNA applies when they don't
appear necessary?

> That specification will
> need to also specify how QCLASS=ANY is handled, etc, etc.

Maybe.  We have successfully dodged this issue with the three
classes that are in use so far.  And I would argue that we are
more likely to be forced to define it if we make IDNA apply
across everything than if we say "IDNA applies here, to cases we
can list and enumerate, and applying it elsewhere requires
specific action".  

> If we declare that IDNA needs to restrict itself (e.g. to
> class=IN) then it seems like we need to solve the QCLASS=ANY
> issue now. This seems suboptimal both because the work might
> be wasted (if a new class protocol isn't pursued) but worse,
> the solution might be broken  since we have no way of
> predicting what the new class proposal might need with respect
> to QCLASS=ANY.

See above.  It is my belief that we need to address the
QCLASS=ANY question only when a class comes along that requires
it.  We have gotten away without it for a long time.  My own
hunch is that it has no solution that does not impose the
"look everywhere in case someone has data" property of X.500 DSP
on the DNS, which would be a really bad idea.  Instead, if
someone really needs a "look in this Class and then in that
one" capability, they are going to have to invent, e.g., a
"QTYPE=INorUC" arrangement and try to make it work.  I wish them
luck.
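The contrast between a "look everywhere" QCLASS=ANY and an explicit "this Class, then that one" lookup can be sketched with a toy in-memory model. Everything here is hypothetical: the `Zones` layout, the function names, and the class `UC` (borrowed from the "INorUC" example above) are illustrations, not a resolver API.

```python
from typing import Dict, List, Optional, Tuple

# Toy model: class -> owner name -> list of record strings.
# "UC" is a hypothetical class; nothing here is a real resolver API.
Zones = Dict[str, Dict[str, List[str]]]


def lookup_any(zones: Zones, name: str) -> List[Tuple[str, str]]:
    """'Look everywhere': every class must be consulted, X.500-style."""
    return [(cls, rr) for cls, data in zones.items() for rr in data.get(name, [])]


def lookup_ordered(
    zones: Zones, name: str, classes: List[str]
) -> Optional[Tuple[str, List[str]]]:
    """Explicit 'this Class, then that one' rule, stopping at the first hit."""
    for cls in classes:
        records = zones.get(cls, {}).get(name)
        if records:
            return cls, records
    return None


zones: Zones = {
    "IN": {"a.example": ["A 192.0.2.1"]},
    "UC": {"a.example": ["TXT unicode-data"], "b.example": ["TXT only-uc"]},
}
print(lookup_any(zones, "a.example"))
# [('IN', 'A 192.0.2.1'), ('UC', 'TXT unicode-data')]
print(lookup_ordered(zones, "a.example", ["IN", "UC"]))
# ('IN', ['A 192.0.2.1'])
```

The first form cannot answer without visiting every class; the second makes the search order an explicit choice made by whoever defines the query.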

Aside: at a procedural level, I think a persuasive argument can
be made that the majority of the IDN WG participants don't
understand the details of this discussion or why it is
important, even after it is called to their attention.    If
that is true, a document that makes "applies everywhere"
statements about DNS use is inappropriate to be treated as a
product of this WG.

> I think there is a separate issue whether the rules for QNAME
> and RDATA domain name slots should be the same, or e.g.
> whether the RDATA slots in SOA, CNAME(?), etc should use
> different nameprep (e.g. to allow different case, or to allow
> any codepoints). I think that issue can be decided without
> having to be able to predict the future.

Absolutely.  My only concern is that it should be decided
explicitly and not by either handwaving (which the current
document appears to me to do) or the hope that the definitions
will appear in some future document which we have no current
plan to get written (which I think at least some of the authors
might argue was the solution).   If we can't decide, and can't
get that document, then I believe that the uses need to be
prohibited until we have clear definitions.  Otherwise, people
will make up their own answers while pointing to the IDNA
standard, and we will have non-interoperability chaos.

>> I am suggesting that (i) it should still be possible to use
>> IDNA, but with a different profile and (ii) we should not, as
>> a side effect of the way we specify IDNA today, prevent that
>> RR from being defined, regardless of whether it can use
>> IDNA+different profile or whether it needs to use an entirely
>> different protocol to set up, compare, and map names.
> 
> I don't think there is anything that prevents this, but it
> might be a bit cumbersome. For instance, a new definition could
> say "apply ToAscii with step 2 replaced by foo" where "foo" is
> something different than nameprep. (And similarly for
> ToUnicode).
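Erik's "apply ToAscii with step 2 replaced by foo" can be sketched as a ToASCII whose preparation step is a parameter. This is a minimal sketch that follows the step order of ToASCII as later published in RFC 3490, not the text of this draft. It assumes CPython, whose `encodings.idna` module (the helper behind the built-in "idna" codec) provides a `nameprep` function, and whose "punycode" codec implements RFC 3492. The `prep` parameter stands in for "foo"; `to_ascii` itself is a hypothetical name.

```python
import encodings.idna  # CPython helper behind the "idna" codec; provides nameprep()
from typing import Callable

ACE_PREFIX = "xn--"


def to_ascii(
    label: str,
    prep: Callable[[str], str] = encodings.idna.nameprep,
    use_std3_ascii_rules: bool = True,
) -> str:
    """ToASCII (RFC 3490 step order) with a pluggable step 2 profile."""
    # Step 1: an all-ASCII label skips step 2, so it is never case-folded.
    if not label.isascii():
        # Step 2: nameprep by default; a different profile ("foo") can be passed in.
        label = prep(label)
    # Step 3: STD3 rules forbid non-LDH ASCII and leading/trailing hyphens.
    if use_std3_ascii_rules:
        if any(c.isascii() and not (c.isalnum() or c == "-") for c in label):
            raise ValueError("non-LDH ASCII code point")
        if label.startswith("-") or label.endswith("-"):
            raise ValueError("leading or trailing hyphen")
    # Steps 4-7: only labels that are still non-ASCII get the ACE form.
    if not label.isascii():
        if label.lower().startswith(ACE_PREFIX):
            raise ValueError("label already starts with the ACE prefix")
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    # Step 8: the result must be 1 to 63 code points long.
    if not 1 <= len(label) <= 63:
        raise ValueError("label length out of range")
    return label


print(to_ascii("Bücher"))                     # xn--bcher-kva
print(to_ascii("Example"))                    # Example (case preserved)
print(to_ascii("Bücher", prep=lambda s: s))   # xn--Bcher-kva (no case folding)
```

Because step 1 runs before the profile, swapping nameprep for another profile changes only how non-ASCII labels are prepared; all-ASCII labels keep their case either way.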

In practical terms, if anyone actually decides to do this in an
implementation, the burden of demonstrating how to do it in a
non-cumbersome way lies on them.  I think Erik and I have
described how to _define_ it fairly easily.  I note that, from
a definitional standpoint, the whole notion of Classes and RRs
in the DNS ought to be a matter of tables, but the predominant
implementations have ended up requiring code changes, much of
that code special-purpose, each time one is added.
The implementer's life is hard sometimes.
But, while I firmly believe in making it easy when possible,
hardness (or cumbersomeness) shouldn't be a bar to our doing the
right thing.

And, again, where Erik and I _may_ differ is that, while I'd
rather see this opened up if we can figure out how to do it (in
the interest of modularity and increased likelihood of future
interoperability), my bottom line is simply that we define IDNA
as applying in areas that we have studied and where we understand
that it applies reasonably, that we ban it elsewhere, and that we
leave those other cases for the future.
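The "define it where studied, ban it elsewhere" position can be expressed as an explicit allow-list rather than a global rule. This is a minimal sketch, assuming CPython's built-in "idna" codec (which applies the RFC 3490 ToASCII operation label by label) and a hypothetical `IDNA_CLASSES` set naming the only cases IDNA is defined for. Non-ASCII names in any other class are refused rather than given an implied meaning.

```python
# Hypothetical allow-list: the cases studied and understood. Only IN here.
IDNA_CLASSES = frozenset({"IN"})


def encode_owner_name(name: str, dns_class: str) -> bytes:
    """Apply IDNA only where it is defined; refuse non-ASCII elsewhere."""
    if dns_class in IDNA_CLASSES:
        # CPython's "idna" codec applies ToASCII to each label.
        return name.encode("idna")
    if name.isascii():
        # Plain ASCII names are unaffected by the restriction.
        return name.encode("ascii")
    # "Not permitted" rather than "undefined": no meaning is silently assigned.
    raise ValueError(f"IDNA is not defined for class {dns_class}")


print(encode_owner_name("bücher.example", "IN"))  # b'xn--bcher-kva.example'
print(encode_owner_name("host.example", "HS"))    # b'host.example'
```

Extending IDNA to another class then becomes a deliberate change to the allow-list, not something implied by a global statement.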

> As part of my review I explored if there could be a better
> separation between nameprep and the other parts of IDNA, but
> that seems to result in different handling of case for
> all-ASCII labels.
> If nameprep was separable from ToASCII it would need to be the
> first step. This would mean that all-ASCII labels would be
> case-folded, which they are not as ToASCII is specified. Hence
> an application using IDNA would potentially display different
> case for non-IDNs to users.

Or nameprep would have to special-case those characters and push
them down a code branch it doesn't now have.  I admit that isn't
attractive, but...
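Erik's point about all-ASCII labels can be checked against CPython's `encodings.idna` module, whose `ToASCII` skips nameprep when the label is all ASCII. The `nameprep_first` function below is a hypothetical reordering that always runs nameprep first; it shows the case difference such a separation would introduce.

```python
import encodings.idna  # CPython's RFC 3490 helpers: nameprep() and ToASCII()


def nameprep_first(label: str) -> bytes:
    """Hypothetical ordering: nameprep always runs, even on all-ASCII labels."""
    return encodings.idna.ToASCII(encodings.idna.nameprep(label))


label = "Example"
print(encodings.idna.ToASCII(label))  # b'Example': all-ASCII, nameprep skipped
print(nameprep_first(label))          # b'example': nameprep case-folded it
```

A nameprep that special-cased ASCII would have to return "Example" unchanged here, which is the extra code branch mentioned in the reply.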

        john