Re: [idn] IDNA interoperability failures, once again



Adam M. Costello writes:
> "D. J. Bernstein" <djb@cr.yp.to> wrote:
> > Yes or no:  Should ``dig'' convert its results from your 7-bit
> > encoding to the local character set? (Assume the LANG=en_US.UTF-8
> > locale, so that the conversion is at least theoretically doable.)
> Yes.

That will cause interoperability failures. It will break other programs,
even though those programs are fully compliant with today's standards.
Mail will bounce!

Costello persists in claiming that IDNA will allow IDNs to be deployed
right now _without_ any failures other than incorrect displays. That
claim is simply not true. Mail will bounce. Web links will fail.

If IDNA's special-purpose 7-bit encoding is supported by _no_ networking
programs, or (in a fully Unicode world) by _all_ networking programs,
then mail won't bounce. In the intermediate stage, when the encoding is
supported by _some_ networking programs, mail will bounce.
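The intermediate-stage failure can be modeled in a few lines. The sketch below is a minimal, hypothetical illustration built on three assumptions. First, the 7-bit encoding is taken to be Punycode with the ``xn--'' prefix, the form handled by Python's built-in ``idna'' codec. Second, a dictionary stands in for the DNS. Third, the program names and the address are made up. Only the 7-bit form of the name is registered. When one program converts the name for display and hands the result to an unmodified program, the lookup fails, and so does any delivery that depends on it.

```python
from typing import Optional

# Hypothetical stand-in for the DNS: only the 7-bit (xn--) form is registered.
# Assumption: the 7-bit encoding is Punycode as handled by Python's "idna" codec.
ZONE = {"xn--bcher-kva.example": "192.0.2.25"}


def legacy_lookup(name: str) -> Optional[str]:
    """An unmodified program: looks the name up exactly as it received it."""
    return ZONE.get(name)


def old_lookup_tool(name: str) -> str:
    """A lookup tool that prints names as stored (7-bit)."""
    return name


def upgraded_lookup_tool(name: str) -> str:
    """A lookup tool that converts 7-bit names to Unicode for display."""
    return name.encode("ascii").decode("idna")


stored = "xn--bcher-kva.example"

# Before either program changes: the pipeline works.
print(legacy_lookup(old_lookup_tool(stored)))       # 192.0.2.25
# Only the first program has been upgraded: the lookup fails.
print(upgraded_lookup_tool(stored))                 # bücher.example
print(legacy_lookup(upgraded_lookup_tool(stored)))  # None
```

Neither program is wrong on its own terms. The failure comes from combining an upgraded sender with an unmodified receiver.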

> Any user clever enough to use dig and to use programs that
> parse its output is probably also clever enough to prepend "env
> LANG=en_US.US-ASCII" to the command line if necessary,

Costello is weaseling.

He is saying that mail won't bounce _if_ the program invoking dig (in
this case, a shell script) is changed _before_ the dig program is
changed. But the IDNA specification does not impose any such ordering.
Costello wants the dig authors to change their code right now.

More importantly, there _does not exist_ an ordering compatible with all
of today's cross-program data transfers. Addresses are copied from mail
programs to browsers; addresses are also copied from browsers to mail
programs.
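The ordering argument can be checked mechanically on a toy model. In this hypothetical sketch, each program is in one of two states. An old program reads and writes only the 7-bit form. An upgraded program writes converted names and reads either form. A copy breaks whenever an upgraded program sends to an old one. The model and the program names are assumptions for illustration, not part of any specification. With one-way data flow, upgrading the receiver first avoids every failure. Once addresses flow in both directions, no upgrade order works.

```python
from itertools import permutations
from typing import List, Set, Tuple

Transfer = Tuple[str, str]  # (sender, receiver)


def failing_transfers(upgraded: Set[str], transfers: List[Transfer]) -> List[Transfer]:
    """Copies that break: the sender emits converted (non-7-bit) names,
    but the receiver has not been upgraded to accept them."""
    return [(src, dst) for src, dst in transfers
            if src in upgraded and dst not in upgraded]


def safe_order_exists(programs: List[str], transfers: List[Transfer]) -> bool:
    """True if some upgrade order avoids failures at every intermediate stage."""
    for order in permutations(programs):
        upgraded: Set[str] = set()
        ok = True
        # Check every stage before the last program is upgraded;
        # the final, fully upgraded stage cannot fail in this model.
        for program in order[:-1]:
            upgraded.add(program)
            if failing_transfers(upgraded, transfers):
                ok = False
                break
        if ok:
            return True
    return False


programs = ["mail", "browser"]

# One-way flow: upgrading the browser (the receiver) first is safe.
print(safe_order_exists(programs, [("mail", "browser")]))  # True

# Two-way flow: every order breaks some copy along the way.
print(safe_order_exists(programs, [("mail", "browser"), ("browser", "mail")]))  # False
```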

> the man page does not specify the output format of dig

Actually, the dig manual page clearly states that dig prints answers
from name servers. There is only one printed DNS record format in BIND,
namely BIND's zone-file format, which is documented in detail and meant
to be machine-parsed.

The format is very badly engineered---it requires far too much effort to
parse completely---but that's a side issue. The point is that programs
_do_ read it.

In fact, the current dig man page explicitly mentions at one point that,
to simplify machine parsing, dig avoids one feature of the format by
default. Costello is simply wrong when he claims that the dig output is
meant only for display and that programs shouldn't be looking at it.
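A program that reads dig's output can be very short. The hypothetical filter below handles only a simple case. It expects one record per line and whitespace-separated fields (owner, TTL, class, type, data), and it skips ``;'' comment lines. The full zone-file format has features that this sketch does not parse. The sample data is made up.

```python
from typing import List, Tuple

# Made-up answer section in a one-record-per-line layout.
SAMPLE = """\
;; ANSWER SECTION:
example.com.\t\t3600\tIN\tMX\t10 mail.example.com.
example.com.\t\t3600\tIN\tMX\t20 xn--bcher-kva.example.
"""

Record = Tuple[str, int, str, str, List[str]]


def parse_records(text: str) -> List[Record]:
    """Split simple single-line records into (owner, ttl, class, type, rdata)."""
    records: List[Record] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        owner, ttl, rclass, rtype, *rdata = line.split()
        records.append((owner, int(ttl), rclass, rtype, rdata))
    return records


def mx_hosts(text: str) -> List[str]:
    """Return the mail-exchanger host names, exactly as printed."""
    return [rdata[1] for _, _, _, rtype, rdata in parse_records(text)
            if rtype == "MX"]


print(mx_hosts(SAMPLE))  # ['mail.example.com.', 'xn--bcher-kva.example.']
```

Whatever form dig uses to print a name is the form this filter passes on to the next program. If dig changed the way it prints names, the programs further down the pipeline would receive something different.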

> Some programs are intended to be
> redirected and have human-unfriendly machine-friendly output.  Some are
> intended to be viewed and have human-friendly machine-unfriendly output.
> Some are somewhere in between, which motivates the question here.

The line that Costello is attempting to draw is directly contrary to the
UNIX philosophy. UNIX programs are _designed_ to be ``in between.''

This is explained in detail in Gancarz's book _The UNIX Philosophy_. Here,
for example, is an excerpt from the ``Make Every Program A Filter'' section:

   When you assume that the receptacle of a program's data flow might be
   another program instead of a human being, you eliminate those biases
   we all have in trying to make an application user friendly. You stop
   thinking in terms of menu choices and start looking at the possible
   places your data may eventually wind up. Try not to focus inward on
   what your program can do. Look instead at where your program may go.
   You'll then begin to see the much larger picture of which your
   program is a part.

These ideas are most fully developed in UNIX, but they can also be seen
in other operating systems, in tools ranging from copy-and-paste to
object-linking frameworks. Some of these tools have a side channel for
character-set information, but most of them are designed for a world
with a unified character encoding---ASCII yesterday, UTF-8 tomorrow.

The bottom line is that the ``dig'' problem is shared by _thousands_ of
UNIX programs. They deliberately provide output in a format that can be
viewed by the user, or sent to another program, or both.

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago