
Re: [idn] I-D ACTION:draft-ietf-idn-idna-08.txt



"Eric A. Hall" <ehall@ehsco.com> wrote:

> Being undefined, they [80..FF] are just opaque codes.  Nothing more,
> nothing less.  They may be represented as characters within the scope
> of a local system who only has a charset for output purposes, but that
> doesn't mean the data contains characters.  The data contains opaque
> codes.  Nothing more.

So you're saying that a label containing some octets in the range 0..7F
and some in the range 80..FF is neither text nor binary, but actually
a mixture of the two.  That's a clever motivation for the comparison
algorithm that you endorse, but I don't see that model described by RFC
1035.  If anything, RFC 1035 appears to take the view that labels are
text, at least until further notice.

    <domain-name>s make up a large share of the data in the master file.
    The labels in the domain name are expressed as character strings and
    separated by dots.

    \DDD where each D is a digit is the octet corresponding to the
    decimal number described by DDD.  The resulting octet is assumed to
    be text and is not checked for special meaning.

    For all parts of the DNS that are part of the official protocol, all
    comparisons between character strings (e.g., labels, domain names,
    etc.) are done in a case-insensitive manner.  At present, this
    rule is in force throughout the domain system without exception.
    However, future additions beyond current usage may need to use the
    full binary octet capabilities in names...

Taken together, these statements strongly suggest that labels are
text until further notice, and therefore that label comparisons must
be case-insensitive across the entire label, 80..FF included.

I doubt we're going to agree on this, but the fact that we can keep
arguing about it shows that RFC 1035 is not clear about how to
compare 80..FF.
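
To make the two readings concrete, here is a minimal sketch in Python.
The function names are hypothetical, and the choice of ISO 8859-1 for
the "labels are text" reading is purely an assumption for illustration:
RFC 1035 names no charset for 80..FF, which is exactly the problem.

```python
def fold_ascii(label: bytes) -> bytes:
    """Fold only A-Z (0x41..0x5A) to a-z; leave every other octet alone."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in label)


def equal_opaque_high_octets(a: bytes, b: bytes) -> bool:
    # Reading 1: 80..FF are opaque codes, so only the ASCII letters fold.
    return fold_ascii(a) == fold_ascii(b)


def equal_whole_label_text(a: bytes, b: bytes) -> bool:
    # Reading 2: the whole label is text and folds as text.
    # ASSUMPTION: octets are interpreted as ISO 8859-1 (not stated by RFC 1035).
    return a.decode("latin-1").lower() == b.decode("latin-1").lower()


upper = b"CAF\xc9"  # 0xC9 is an uppercase E-acute in ISO 8859-1
lower = b"caf\xe9"  # 0xE9 is the lowercase form

print(equal_opaque_high_octets(upper, lower))  # readings disagree here
print(equal_whole_label_text(upper, lower))
print(equal_opaque_high_octets(b"EXAMPLE", b"example"))  # pure ASCII: both agree
```

Both functions agree on pure-ASCII labels and differ only once an
octet in 80..FF has a case pair under the assumed charset.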

> > > If you want to enforce an interpretation of the eight-bit range,
> > > you have to use new RRs, a new class, an EDNS identifier, or
> > > something, in order to distinguish between the legacy and modern
> > > systems.
> >
> > Agreed.
>
> Then change IDNA.  Mandatory transliteration introduces the same kind
> of absurd recklessness.

You say it's dangerous to put non-ASCII data into a protocol message
where the semantics of non-ASCII data are undefined, and I agree.
That's why IDNA prohibits putting non-ASCII domain names into protocol
messages (and function arguments, structured data files, etc.) unless
the protocol/interface/format explicitly invites IDNs.

I assume it's the ASCII --> non-ASCII conversion that worries you.
That conversion happens only when displaying names to users (the ASCII
form would be user-hostile), and only when the aforementioned
prohibition does not apply, so the risk is usually small.  Implementors
are also invited to forgo the conversion when they judge the risk to
be significant.
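
As a rough sketch of that rule, the following uses Python's built-in
"idna" codec.  That codec implements the later published RFC 3490, so
the prefix and other details may differ from this draft.  The helper
names and the `slot_invites_idn` flag are hypothetical; the flag
stands in for "the protocol/interface/format explicitly invites IDNs".

```python
def to_wire(name: str) -> bytes:
    """Convert a name to its ASCII form before it enters a legacy slot."""
    return name.encode("idna")


def to_display(name: bytes, slot_invites_idn: bool) -> str:
    """Convert ASCII --> non-ASCII only where the prohibition does not apply."""
    if not slot_invites_idn:
        # Prohibition applies: hand back the ASCII form untouched.
        return name.decode("ascii")
    return name.decode("idna")


wire = to_wire("b\u00fccher.example")
print(wire)                                      # ASCII-only octets
print(to_display(wire, slot_invites_idn=False))  # stays in ASCII form
print(to_display(wire, slot_invites_idn=True))   # converted for the user
```

The conversion is confined to the display step; anything headed for a
slot that has not invited IDNs only ever sees the ASCII form.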

AMC