
Re: [idn] I-D ACTION:draft-ietf-idn-idna-08.txt



This message responds to things in the order they appeared in the
previous message.  It might be easier to follow if I reordered some of
it, but composing it has left me too tired to do so.  You might want to
make two passes.  :)

"Eric A. Hall" <ehall@ehsco.com> wrote:

> And since the encoded representation of the original string has to be
> case-neutral for it to work in the legacy label, the input label has
> to be made case-specific.

I asked for a definition of "case-specific", but you didn't give me one.
Now you've introduced a second term I'm unfamiliar with, "case-neutral".
I honestly don't know what you mean by those terms.

> I agree with the requirement to make *some domain names* lowercase in
> order to facilitate simple comparisons.
>
> However, there are other domain names where this is not required.

The same could be said for domain names that use only ASCII characters.
The case-insensitive comparison might not be required for some of them, but
it's done for all of them, whether they need it or not.  IDNA extends
this philosophy to IDNs.  The same comparison rules apply to all IDNs.

With ASCII domain names, we had the luxury that we could preserve
case even while ignoring it; that is, ASCII domain names are
case-insensitive/case-preserving.  With IDNs in their ACE form, this
is not possible (at least not officially); instead we have to choose
case-insensitive/non-case-preserving or case-sensitive/case-preserving,
and we chose the former, judging the case insensitivity to be more
important than the case preservation.

If applications want to map case-sensitive identifiers directly onto
domain labels in owner names of resource records, they can, as long as
they avoid collisions.  This is already true for ASCII domain names,
and continues to be true for IDNs.  For example, in the ASCII world, we
can store information under the name FoO.net, and when someone looks
up FoO.net, they'll find the information.  We can't store different
information under fOo.net, even though fOo is a distinct identifier from
FoO.
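To make the ASCII case concrete, here is a minimal Python sketch of a
case-insensitive/case-preserving store.  The ZoneTable class and its
method names are made up for illustration, and it assumes the names
contain only ASCII characters; it is not how any real name server is
implemented.

```python
class ZoneTable:
    """Toy store: lookups ignore ASCII case, stored spelling is kept."""

    def __init__(self):
        self._records = {}

    def store(self, name, data):
        # The key is the case-folded name, so FoO.net and fOo.net collide.
        self._records[name.lower()] = (name, data)

    def lookup(self, name):
        # Returns (name as stored, data), or None if nothing is stored.
        return self._records.get(name.lower())


zone = ZoneTable()
zone.store("FoO.net", "some information")

# Any capitalization finds the record, and the stored spelling survives.
print(zone.lookup("fOo.net"))

# Storing under fOo.net replaces the FoO.net entry instead of adding one.
zone.store("fOo.net", "different information")
print(zone.lookup("FoO.net"))
```

The second store overwrites the first, which is the collision described
above: the two spellings are distinct identifiers but a single owner
name.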

Now reread that example but imagine an umlaut over the middle letter.
With IDNA, it's all still true.  The only difference is that if you do
a reverse query, you won't get the original capitalization, but reverse
queries are almost never used, so it's no big deal.
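The same experiment can be tried with the "idna" codec in Python's
standard library, which implements ToASCII/ToUnicode with Nameprep.
This is only a sketch; the helper names to_ace and from_ace are mine,
and the non-ASCII letters are written as escapes so the precomposed
forms are unambiguous.

```python
def to_ace(name: str) -> str:
    """Convert a Unicode domain name to its ACE form (per label)."""
    return name.encode("idna").decode("ascii")


def from_ace(ace: str) -> str:
    """Convert an ACE domain name back to Unicode."""
    return ace.encode("ascii").decode("idna")


stored = to_ace("F\u00f6O.net")    # F, o-umlaut, O
queried = to_ace("f\u00d6o.net")   # f, O-umlaut, o

# Both spellings map to the same ACE name, so the lookup succeeds.
print(stored == queried)

# Going back from ACE yields the lowercase form, not the original
# capitalization.
print(from_ace(stored) == "f\u00f6o.net")
```

Both comparisons should print True: the lookup is case-insensitive, and
the capitalization is not recoverable from the ACE form.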

If case-preservation is important to an application (which is more
plausible if the application is mapping case-sensitive identifiers onto
domain labels *inside* resource records), then it can define a mapping
that preserves case.  For example, it could use 8-bit labels (like
unprepped UTF-8 or foo-prepped UTF-8), or it could use something similar
to IDNA but with a different Stringprep profile and a different prefix.
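As a rough sketch of the second option, the following Python uses the
standard "punycode" codec with no preparation step at all, plus a
prefix.  The prefix "zq--" and both function names are hypothetical,
invented for this example; nothing here is defined by the IDNA draft.
The round trip depends on the ASCII form being carried byte-for-byte,
which is more plausible inside resource records than in owner names.

```python
HYPOTHETICAL_PREFIX = "zq--"  # made up for illustration, not registered


def to_case_preserving_ace(label: str) -> str:
    """Encode a label to ASCII without folding case (no Stringprep)."""
    if label.isascii():
        return label
    return HYPOTHETICAL_PREFIX + label.encode("punycode").decode("ascii")


def from_case_preserving_ace(ace: str) -> str:
    """Invert to_case_preserving_ace."""
    if ace.startswith(HYPOTHETICAL_PREFIX):
        body = ace[len(HYPOTHETICAL_PREFIX):]
        return body.encode("ascii").decode("punycode")
    return ace


original = "F\u00f6O"  # F, o-umlaut, O
ace = to_case_preserving_ace(original)

# The original capitalization survives the round trip.
print(from_case_preserving_ace(ace) == original)

# Differently capitalized identifiers get different ASCII forms.
print(ace != to_case_preserving_ace("f\u00f6o"))
```

Because the prefix differs from the IDNA prefix, software that knows
only IDNA would treat these labels as ordinary ASCII labels and leave
them alone.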

> Microsoft already provides a direct encoding of NetBIOS names into
> UTF-8 and simply applies their own interpration to the RRs.  Under
> your rules, they couldn't use i18n domain names as effectively as
> STD13 domain names since they would have to sacrifice capitalization
> in the process.

Right, IDNA sacrifices case-preservation in order to get compatibility
with the ASCII infrastructure while continuing the tradition of
case-insensitive domain names.  Maybe NetBIOS doesn't like that
tradeoff, and would rather have case-preserving reverse queries.
Microsoft can continue to use 8-bit DNS names if they want.

Here's a stab at the relationship (or lack thereof) between 8-bit labels
(which are allowed by RFC 1035) and IDN labels (which are defined by the
IDNA spec).  I'm sure some people (even IDNA supporters) might not agree
with this interpretation, because it's a subtle question, but here goes:

8-bit labels in DNS (not EDNS) owner names, even if they are interpreted
as UTF-8 by the end clients, do not contain non-ASCII characters
from the point of view of the DNS infrastructure.  RFC 1035 does
not specify or allow any charset other than ASCII, therefore it's
impossible for an owner name to contain non-ASCII characters; the
octets 80..FF are allowed, but they don't represent any particular
characters (from the point of view of DNS).  Since IDNA explicitly
applies only to text labels (which I understand to mean labels composed
entirely of characters), DNS labels containing 80..FF are completely
outside the scope of IDNA; they are not IDN labels, but labels in an
independent namespace that is not subject to any of the rules for IDNs.
Applications can continue to use such labels under the rules of
RFC 1035 without regard to IDNA.  I personally think this namespace is
under-specified and risky and should not be used, but it will still be
as present after IDNA as it was before.
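Here is a small Python sketch of what that means for comparison.  It
assumes, following the interpretation above, that only the ASCII
letters are folded and that octets 80..FF are compared as opaque
values; the function name is mine.

```python
def rfc1035_labels_equal(a: bytes, b: bytes) -> bool:
    """Compare two labels, ignoring case for ASCII letters only.

    bytes.lower() changes only the ASCII letters A-Z, so octets 80..FF
    pass through untouched and are compared as-is.
    """
    return a.lower() == b.lower()


# Pure ASCII labels: case is ignored.
print(rfc1035_labels_equal(b"FoO", b"fOo"))

# The same letters with an umlaut, as UTF-8 octets.  The infrastructure
# sees C3 96 versus C3 B6, which are simply different octets.
upper = "F\u00d6O".encode("utf-8")   # 46 C3 96 4F
lower = "f\u00f6o".encode("utf-8")   # 66 C3 B6 6F
print(rfc1035_labels_equal(upper, lower))
```

The first comparison should print True and the second False: the end
clients may read those octets as the same letter in two cases, but
nothing in the comparison knows that.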

That analysis applies to DNS, not EDNS.  EDNS could define entirely new
semantics and comparison rules for its 8-bit labels; there is no need
for EDNS labels to be compared in the same way as DNS labels.  EDNS
could even define multiple namespaces each with its own comparison
rules, and tag labels according to which namespace they belong to.

> I think you are missing a key concept here, which is that all of the
> RRs are going to need to i18n definitions, and IDNA alone won't do it.

I'm not missing that, I know that.  IDNA alone won't do it; IDNA is only
for internationalized textual domain labels.  RRs that map other data
types onto domain names might find that IDNA is not suited to their
needs.  They are welcome to define a more suitable mapping directly
from their non-domain-label data type to ASCII domain labels.  They are
welcome to borrow techniques from IDNA.

> When the RR rules are defined, they will get stringprep profiles
> assigned to them.  At that point, the applications which create and
> interpret the unencoded labels are the only ones that need to know
> anything, and the infrastructure can store, transfer, compare and
> convert the opaque octets.

I'm finally starting to understand your vision.  In order to correctly
convert labels between the non-ASCII form and the ACE form, or to
compare two labels, you would need to EITHER know and perform the
preparation algorithm for this particular label OR be guaranteed that
the appropriate preparation has already been applied (in which case
you don't need to know what it is).

That's interesting, but I see a security concern:  What if the entity
handing you the label is maliciously not applying the appropriate
preparation?  It can trick you into making errors when comparing labels,
or trick you into converting a label into a non-equivalent label.  If
you don't know the appropriate preparation algorithm, you can't detect
this or protect yourself from it.
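A small Python sketch of the problem, using Nameprep as a stand-in for
"the appropriate preparation" (in the scheme under discussion it could
be a different profile for each label type).  The nameprep function
comes from the standard library's encodings.idna module; the three
helper names are mine.

```python
from encodings.idna import nameprep


def trusting_equal(a: str, b: str) -> bool:
    """Compare labels on the assumption that both are already prepared."""
    return a == b


def checked_equal(a: str, b: str) -> bool:
    """Compare labels after applying the preparation ourselves."""
    return nameprep(a) == nameprep(b)


def is_prepared(label: str) -> bool:
    """Detect a label that was handed over without being prepared."""
    return nameprep(label) == label


honest = "f\u00f6o"      # prepared form
malicious = "F\u00d6O"   # equivalent label, deliberately left unprepared

print(trusting_equal(honest, malicious))   # equivalence is missed
print(checked_equal(honest, malicious))    # equivalence is found
print(is_prepared(malicious))              # the omission is detectable
```

The expected output is False, True, False.  Only the last two functions
can notice the problem, and both of them need to know the preparation
algorithm.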

I suppose it might be possible to address that concern.  If you have an
interest in the correctness of a comparison/conversion, then you need
to know the preparation algorithm yourself.  On the other hand, if you
are performing comparisons/conversions without knowing the preparation
algorithm, you had better not be relying on the correctness of the answers;
you had better just be converting a name on behalf of the entity that
prepared it, or comparing names on behalf of an entity that prepared one
of them.

In your vision of the world (as I understand it), the ASCII
infrastructure would use ACE, while the new improved infrastructure
would use prepared-UTF-8, where the preparation is not the same for all
labels.  But I don't see what's so great about prepared-UTF-8 versus
ACE.  Both require the same amount of effort from all applications;
they have to do something special at the boundaries, the difference
is in the details.  If you want to sell me on a new infrastructure,
it's got to offer something more.  If it accepted unprepared-UTF-8,
then I might find it interesting, because it would require less effort
from applications that already use Unicode.  But in order for the
infrastructure to accept unprepared-UTF-8, the infrastructure would need
to know the preparation algorithm...

> (1) Anything which fits in the i18n namespace is a domain name,
> regardless of whether or not IDNA can handle it.

The i18n namespace doesn't exist until we define it.  As I argued above,
the 8-bit namespace that already exists under RFC 1035 is not an i18n
namespace, because it cannot represent non-ASCII characters, because no
charset other than ASCII is specified.  It is the job of this working
group to define the i18n namespace.  The definition of IDN in the IDNA
draft is a definition of the i18n namespace.

> > If you only care about the end applications, which know the special
> > semantics of the special labels, then just use a different prefix
> > to go with your different Stringprep profile.  Then you can be sure
> > that entities that know IDNA but don't know about your special
> > labels won't accidentally muck with them.
>
> No, that won't work.

I don't see why not.  At some point, someone needs to perform Stringprep
with some profile.  Why can't that entity go ahead and do the whole
transformation at the same time (Unicode to ASCII, prepend the prefix)?
What's the advantage of doing only the Stringprep part?

AMC