
Re: [idn] One profile for domain names, or many?



"Eric A. Hall" <ehall@ehsco.com> wrote:

> Profiles present a great opportunity to enforce strong and consistent
> data-typing with the different domain names.

I can see that, but part of my concern is that you have tended to try
to define the subtypes of domain names in terms of DNS resource record
types.  Domain names get used in protocols other than DNS, where they
are not tagged with RR types.  So the subtypes of domain names would
have to be defined more abstractly, for example:

    host names
    mail domain names
    mailbox names
    service names
    kerberos realm names

The RR types could then refer to these abstract subtypes, but other
protocols could also refer to the abstract subtypes.

The tricky part is that some of these subtypes are already widely used
in a variety of protocols without ever having been formalized.  In
my example list of subtypes, my intent is that the host field of a URI,
the exchangers listed in an MX record, and the domain field of an HTTP
cookie are all of type "host name", but no such connection has ever
been formally drawn between these various protocol elements.  Similarly,
my intent is that domain names in email addresses in mail headers, and
domain names in email addresses in SMTP commands, are both of type "mail
domain name", even though RFC 821 places tighter restrictions on the
syntax than RFC 822 does.

I suppose the way to attempt to accommodate your model in IDNA (but I'm
not sure it could be done with sufficient rigor) would be something
like this:  ToASCII and ToUnicode would take a Stringprep profile as
a parameter.  This would be yet another thing that the application
would have to select, along with the AllowUnassigned flag and the
UseSTD3ASCIIRules flag.  The IDNA spec would require that Nameprep and
only Nameprep be used with host name labels and mail domain labels.
Furthermore, applications would be forbidden from using IDNA with any
other label types until profiles for them had been standardized.
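
To make the shape of that proposal concrete, here is a minimal Python
sketch, not a proposed implementation.  The names to_ascii, profile_for
and STANDARD_PROFILES are hypothetical; the profile is modelled as a plain
callable, and the only real library pieces used are nameprep() from
Python's encodings.idna module and the built-in "punycode" codec.
AllowUnassigned is left out and is assumed to be handled inside the
profile.

```python
import encodings.idna  # stdlib; nameprep() implements the Nameprep profile

ACE_PREFIX = "xn--"

# Hypothetical registry: abstract label type -> standardized profile.
# Only the two types named above have one; both use Nameprep.
STANDARD_PROFILES = {
    "host name": encodings.idna.nameprep,
    "mail domain name": encodings.idna.nameprep,
}


def profile_for(label_type):
    """Return the profile for a label type, or refuse if none is standardized."""
    try:
        return STANDARD_PROFILES[label_type]
    except KeyError:
        raise ValueError(f"no standardized profile for {label_type!r}") from None


def to_ascii(label, profile, use_std3_ascii_rules=False):
    """Simplified ToASCII that takes the Stringprep profile as a parameter."""
    # Labels that are already all-ASCII skip the profile step.
    if any(ord(ch) > 0x7F for ch in label):
        label = profile(label)
    if use_std3_ascii_rules:
        # Reject ASCII characters other than letters, digits and hyphen,
        # and reject leading or trailing hyphens.
        bad = any(ord(ch) < 0x80 and not (ch.isalnum() or ch == "-")
                  for ch in label)
        if bad or label.startswith("-") or label.endswith("-"):
            raise ValueError("label violates STD3 ASCII rules")
    if any(ord(ch) > 0x7F for ch in label):
        if label.startswith(ACE_PREFIX):
            raise ValueError("label already begins with the ACE prefix")
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    if not 1 <= len(label) <= 63:
        raise ValueError("label length must be 1..63")
    return label
```

An application would call to_ascii(label, profile_for("host name")), and
a request for a type such as "kerberos realm name" would be refused until
a profile for it had been standardized.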

> There is another reason for going this route, which is that the
> presence of eight-bit codes in the STD13 namespace makes managing
> UCS names in dual-mode servers extremely difficult.  It essentially
> requires that servers flag eight-bit domain names as NOT UCS so that
> the names don't get looked at when an EDNS/deACE'd query comes in.

I came to that realization this morning.  A hypothetical newDNS protocol
that allowed text labels to be represented using UTF-8 while still
supporting the RFC-1035 sequence-of-bytes labels would need an extra
bit per label indicating whether byte values 0x80..0xFF are UTF-8 text or
opaque bytes as in DNS.  A hypothetical new resolver interface would
likewise need this extra bit per label if it wanted to support both
text-labels and byte-labels.
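
A minimal Python sketch of what that extra bit might look like in a
resolver-side data structure.  The Label class and its field names are
hypothetical, and the \DDD escaping simply follows the RFC 1035 master
file convention for displaying opaque bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    data: bytes    # the label exactly as it appears on the wire
    is_text: bool  # the extra bit: True means bytes 0x80..0xFF are UTF-8

    def display(self) -> str:
        if self.is_text:
            # Text label: interpret the bytes as UTF-8.
            return self.data.decode("utf-8")
        # Byte label: show printable ASCII as-is (except backslash)
        # and escape every other byte as a decimal \DDD triple.
        return "".join(
            chr(b) if 0x21 <= b <= 0x7E and b != 0x5C else f"\\{b:03d}"
            for b in self.data
        )


text_label = Label(b"b\xc3\xbccher", is_text=True)
byte_label = Label(b"b\xc3\xbccher", is_text=False)
```

The two labels carry identical bytes but compare unequal, so a dual-mode
server holding the byte-label would not match it against a query for the
text-label.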

I don't know if I'd call that "extremely difficult".  The only way to
avoid the extra bit while adding native Unicode support is to drop
support for RFC-1035 sequence-of-bytes labels.

In fact, if I were designing a new resolver interface, I'd want it
to have Nameprep built in (and maybe a few other common profiles, if
there are such things), in which case I'd still need extra per-label
meta-info, so the application could say "Please apply Nameprep for
me" or "This label is already prepared, leave it alone" or maybe even
"Please apply Fooprep for me".  The application might also want to be
able to say "Please return only prepared labels to me" or "Please feel
free to return unprepared labels to me, and I'll be responsible for
preparing them if needed".  The latter option allows for the possibility
of preserving case or non-normalized forms (which might or might not be
possible depending on the underlying protocol).
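
A rough Python sketch of that per-label meta-info, under the assumption
that Nameprep is the only built-in profile.  Prep, QueryLabel,
prepare_for_query and deliver_answer are hypothetical names; nameprep()
is the real function from Python's encodings.idna module.

```python
import encodings.idna  # stdlib; nameprep() implements the Nameprep profile
from dataclasses import dataclass
from enum import Enum


class Prep(Enum):
    APPLY_NAMEPREP = "please apply Nameprep for me"
    ALREADY_PREPARED = "this label is already prepared, leave it alone"


@dataclass(frozen=True)
class QueryLabel:
    text: str
    prep: Prep  # per-label instruction from the application


def prepare_for_query(label: QueryLabel) -> str:
    """Apply whatever preparation the application asked for on this label."""
    if label.prep is Prep.ALREADY_PREPARED:
        return label.text
    if label.prep is Prep.APPLY_NAMEPREP:
        return encodings.idna.nameprep(label.text)
    raise ValueError(f"unsupported preparation request: {label.prep}")


def deliver_answer(raw_label: str, want_prepared: bool) -> str:
    """Return a label from a response, prepared only if the caller asked."""
    if want_prepared:
        return encodings.idna.nameprep(raw_label)
    # Unprepared: case and non-normalized forms survive, if the
    # underlying protocol preserved them in the first place.
    return raw_label
```

A "Fooprep" request would be one more enum member plus one more branch,
which is the sense in which the meta-info is needed per label.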

AMC