
Re: [idn] One profile for domain names, or many?



"Eric A. Hall" <ehall@ehsco.com> wrote:

> > The tricky part is that some of these subtypes are already in
> > wide use in a wide variety of protocols without having ever been
> > formalized.
> >
> > my intent is that the host field of a URI, the exchangers listed in
> > an MX record, and the domain field of an HTTP cookie are all of type
> > "host name", but no such connection has ever been formally drawn
> > between these various protocol elements.
>
> I think I can speak for John when I say that this is what he and I
> both want to see fixed.

It would be great to have that fixed, but when will it happen?  Can
IDNA afford to wait for it?  If not, should IDNA go ahead and limit the
applicability of Nameprep to the still-vague concepts of "host name
label" and "mail domain name label" in the hopes that those terms will
become rigorous?  I pose these questions to the working group--would
this be comforting or disconcerting?  In the past, any time Paul and
Patrik and I have tried to talk about "what is a host name" it's been a
mess.

> Again, I would make the applications do the stringprep management,
> since they have to do so anyway for basic security measures.  All IDNA
> needs to be is a simple codec that converts inputs and outputs (unless
> I completely misunderstand its inner workings).

We bundle Stringprep and Punycode and the prefix into a single operation
for convenience of description, so we can say something simple like
"for any text label X, ToASCII(X) is an equivalent ASCII label".  We
needed ToASCII and ToUnicode to serve as definitions of the correct
results for *any* input, not just already-prepared input.  If you have
already-prepared input, you can optimize the Stringprep step down to
nothing.
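To illustrate the bundling, here is a rough sketch of ToASCII in Python.
The real Nameprep profile (RFC 3491) does mapping, NFKC normalization,
and prohibited-character checks; the nameprep() below is only a
hypothetical stand-in so the example runs.  The "xn--" prefix and the
punycode codec are the real ones.

```python
ACE_PREFIX = "xn--"

def nameprep(label: str) -> str:
    # Stand-in for the Stringprep step (RFC 3491 Nameprep).  The real
    # profile maps, normalizes (NFKC), and rejects prohibited code
    # points; lowercasing is just a placeholder for this sketch.
    return label.lower()

def to_ascii(label: str) -> str:
    # Stringprep, then Punycode, then the ACE prefix, as one operation.
    label = nameprep(label)
    if all(ord(c) < 0x80 for c in label):
        return label  # already an ASCII label; nothing to encode
    return ACE_PREFIX + label.encode("punycode").decode("ascii")
```

With already-prepared input, nameprep() changes nothing, which is the
optimization described above.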

IDNA never says that ToASCII needs to be performed all at once by a
single entity.  For example, an IDN-aware interface can require its
non-ASCII input to be already prepared, in which case when it performs
ToASCII the Stringprep step has no effect and can be optimized away.
The thing calling the interface then has the burden of performing
Stringprep, but not the rest of ToASCII.  In this scenario the work of
ToASCII has effectively been split across two entities.
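The split can be sketched like this: the interface assumes prepared
input and does only the encoding half, while the caller runs Stringprep
first.  As above, nameprep() is a hypothetical stand-in for the real
RFC 3491 profile.

```python
def nameprep(label: str) -> str:
    # Hypothetical stand-in for the caller's Stringprep step.
    return label.lower()

def to_ascii_prepared(label: str) -> str:
    # IDN-aware interface that requires already-prepared input; the
    # Stringprep step inside ToASCII is a no-op and has been omitted.
    if all(ord(c) < 0x80 for c in label):
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

# Caller side: performs Stringprep itself, then hands off the rest.
# Together the two entities have done the full work of ToASCII.
ascii_label = to_ascii_prepared(nameprep("Bücher"))
```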

> > A hypothetical newDNS protocol that allowed text labels to be
> > represented using UTF-8 while still supporting the RFC-1035
> > sequence-of-bytes labels would need an extra bit per label
> > indicating whether byte values 80..FF are UTF-8 text, or opaque
> > bytes like in DNS.  A hypothetical new resolver interface would
> > likewise need this extra bit per label if it wanted to support both
> > text-labels and byte-labels.
>
> The resolver doesn't have to keep track of this

Sure it does.  If I pass an 8-bit label to a new-resolver, the resolver
needs to know whether the 80..FF values are UTF-8 text or opaque bytes,
so that it knows how to set that bit in the newDNS query, or so that it
knows whether to convert to ACE before sending a DNS query.

AMC