
RE: [idn] Re: An idn protocol for consideration in making the requirements



> At 12:34 00/02/10 +0100, Karlsson Kent - keka wrote:

> > ----------back to Ned's text-----------------------

> > > The issue isn't the DNS, it is applications that use the DNS.  And nobody
> > > claimed such applications will die either (although it is known that some
> > very
> > > old ones will). What is claimed is that applications will trash domain
> > names in
> > > UTF-8.

> Ned, what exactly happened? It would be interesting to know
> how much of that is:

> - Security leaks
> - System blowups
> - Other screw-ups of whatever kind
> - Simple 'domain not found' errors

In the case of most email applications, none of the above. Domains in
email are restricted by the standards to be 7bit only. It has always been
this way. As such, MTAs, message stores, and user agents often either:

(1) Apply the "liberal in what you accept, conservative in what you emit"
    principle and accept 8bit domains but delete, encode, or otherwise
    downgrade them so as not to emit anything that's noncompliant.

(2) Simply reject anything that contains an 8bit domain name.
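A minimal sketch of the check behind both options, in Python (the function
name is mine, not from any standard):

```python
def is_7bit_domain(domain: str) -> bool:
    """Return True only if every character of the domain fits in 7 bits,
    i.e. the domain is safe to emit under the current standards."""
    return all(ord(ch) < 0x80 for ch in domain)

# A type (2) implementation rejects when this returns False;
# a type (1) implementation downgrades (deletes or encodes) instead.
```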

As I said before, such implementations are currently 100% standards-compliant,
whereas an implementation that blithely passes on 8bit material is not. So
calling this a "screwup" is totally bogus. What you want to do amounts to
standing the world on its head: claiming that what was compliant before is now
noncompliant, and what was noncompliant before now is.

Again, I'm not saying you cannot change the rules and get implementations to
change. You can. We did exactly this with 8bit content in message bodies, for
example. But you have to protect existing implementations in the process. This
can be done in lots of ways, but it has to be done somehow. Try to avoid it
and you end up with something that either won't make it through the standards
process or won't deploy once it does. And this absolutely does change
the comparative merits of using 8bit versus 7bit.

Also note that there are differences between using 8bit and using an expanded
7bit repertoire that have to do with what various standards say is allowed in
domain names in various different protocols.

> Also, one potential failure from the old days is that the eighth bit
> is lost. I would like to do some checks on how many of the currently
> registered domain names could be interpreted as legal UTF-8 names
> that had their 8th bit taken off (other than the trivial identity
> case, which is of course UTF-8). If somebody can point me to some
> data, or tell me how to get at it, or otherwise collaborate on this,
> please tell me.

I don't think simple bit stripping is what you're going to see happen. See
above.
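For what it's worth, the check Kent describes can be sketched by brute force:
set the 8th bit on every subset of byte positions and see whether the result
decodes as valid UTF-8. This is exponential in the label length, so it is only
practical for short names; all names below are mine:

```python
from itertools import combinations

def utf8_reinterpretations(name: str):
    """Yield strings that decode as valid UTF-8 and would reduce to
    `name` if every byte had its 8th bit stripped.  Skips the trivial
    identity case (no bits set), as Kent suggests."""
    data = name.encode("ascii")
    n = len(data)
    for k in range(1, n + 1):
        for positions in combinations(range(n), k):
            candidate = bytearray(data)
            for i in positions:
                candidate[i] |= 0x80   # restore a hypothetical lost 8th bit
            try:
                yield bytes(candidate).decode("utf-8")
            except UnicodeDecodeError:
                continue
```

For example, the ASCII name "C0" has one non-trivial reading: setting both
high bits yields the bytes C3 B0, which decode as U+00F0.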

> Reencoding non-ASCII into ASCII is a trashing in itself.  The
> experience with QP and BASE64 for text is frightening enough
> not to accept anything in that vein again.

This is simply an opinion. I don't agree that it is anywhere near
this clear-cut.

> >  And for e-mail
> > those things were really temporary measures anyway, given
> > ESMTP and 8bit.  QP and BASE64 for 'plain' text are still used
> > (even emitted by the e-mail system I use; which I cannot
> > control in this respect).

Not really. The use of encodings for text in message bodies can be viewed as
temporary. The use of encodings in headers may or may not be -- there is no
proposal on the table for doing anything else. Until and unless there is, I
don't see anything temporary about it. (And if it is such a major problem, why
hasn't such a proposal materialized?)
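For the record, the header encoding in question is the MIME "encoded word"
mechanism (RFC 2047). A quick round trip with Python's standard library shows
what actually travels on the wire (the sample string is mine):

```python
from email.header import Header, decode_header

# Non-ASCII header text is downgraded to a 7bit "encoded word".
wire = Header("Grüße", charset="utf-8").encode()

# Any MIME-aware reader reverses the encoding at display time.
raw, charset = decode_header(wire)[0]
assert raw.decode(charset) == "Grüße"
```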

> > I'd much rather have some temporary problems with UTF-8 than
> > have permanent problems with something that reencodes non-ASCII
> > into ASCII.

> I think there are some very strong points here. One thing
> that ASCII-based protocols are good at is ease of debugging.
> UTF-8 would give you exactly that ease, you just have to
> get the right terminal emulator. UTF-5 or CIDNUC don't give
> you that at all.

I'm sorry, but I completely disagree with this assessment, and I deal with this
stuff on a daily basis. UTF-5 or similar schemes are in fact easier to handle
from a debugging perspective. A person not familiar with the "foo" set of
characters for any set "foo" is going to be a lot more comfortable with the
ASCII encoding.
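To make the comparison concrete, here is a sketch of a UTF-5-style encoder as
I understand the draft: each code point is written as base-16 digits, the
first digit drawn from 'G'..'V' to mark a character boundary and the rest
from '0'..'9','A'..'F'. Treat the details as my reading, not the spec:

```python
def utf5_encode(text: str) -> str:
    """Sketch of a UTF-5-style encoding; every output byte is
    printable ASCII, which is the debugging point at issue."""
    out = []
    for ch in text:
        cp = ord(ch)
        nibbles = []
        while True:                 # split the code point into 4-bit digits
            nibbles.append(cp & 0xF)
            cp >>= 4
            if cp == 0:
                break
        nibbles.reverse()
        out.append("GHIJKLMNOPQRSTUV"[nibbles[0]])   # boundary digit
        out.extend("0123456789ABCDEF"[n] for n in nibbles[1:])
    return "".join(out)
```

Under this reading, "A" (U+0041) becomes "K1" and "é" (U+00E9) becomes "U9":
perfectly legible in any ASCII terminal, if not to a casual reader.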

I don't view this as an argument in favor of 7bit, but I absolutely reject
the argument that ease of debugging favors 8bit.

> Of course, some people may say that they don't read anything else
> than ASCII anyway. Well, organizing your terminal emulator
> so that it uses your favorite escaping is not a big problem.
> But mangling things in the protocol so that those people
> who could actually just read the stuff can't read it, or may
> only be able to read it with much more sophisticated, specialized
> tools that we will be waiting for forever, is a much different thing.

Unfortunately, at present terminal emulators capable of rendering UTF-8
fall into the category of sophisticated, specialized tools.

				Ned