
Re: [idn] I-D ACTION:draft-ietf-idn-idna-08.txt




on 6/9/2002 3:51 PM Adam M. Costello said the following:

> To say that a text string is
> "case-sensitive" is to say that changing characters from upper to lower
> case or vice-versa does change the identity of the string; in other
> words, case differences are not ignored when the string is compared.

That's right.

And since the encoded representation of the original string has to be
case-neutral for it to work in a legacy label, the input label has to be
made case-specific; once encoded, it cannot be changed. This is true
regardless of whether or not it is converted to lowercase before it is
encoded; the label that actually gets encoded is permanently made
case-specific, at least in its unencoded form.
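To illustrate, here is a minimal Python sketch. It assumes the standard
library's `punycode` codec (RFC 3492 Punycode, with no ACE prefix and no
Nameprep) stands in for the ACE step, and the helper name
`survives_ascii_folding` is made up for this example. It checks whether an
encoded label still decodes to the same string after a legacy component
lowercases its ASCII letters. That only holds when the input was already
lowercase.

```python
def survives_ascii_folding(label: str) -> bool:
    """Return True if the Punycode form of `label` still decodes to `label`
    after a case-insensitive component lowercases its ASCII letters."""
    encoded = label.encode("punycode")  # plain Punycode: no prefix, no Nameprep
    folded = encoded.lower()            # what a legacy DNS component may do
    return folded.decode("punycode") == label


# "föo" is already lowercase, so folding its encoded form b"fo-fka" changes nothing.
print(survives_ascii_folding("f\u00f6o"))  # True
# "FÖO" encodes as b"FO-oha"; lowercasing gives b"fo-oha", which decodes to "fÖo".
print(survives_ascii_folding("F\u00d6O"))  # False
```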

> For IDNA, we wanted the mapping between non-ASCII labels and ACE labels
> to have the following property:  A case-insensitive comparison of
> two ACE labels always returns the same answer as a case-insensitive
> comparison of the two corresponding non-ASCII labels.  In order
> to achieve that property, a case-folding step is essential in the
> definition of the ACE mapping.  (Maybe you don't want that property, but
> it was a fundamental design goal of IDNA, in order to avoid surprising
> users with different comparison rules for ASCII versus non-ASCII names.)
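The property can be checked with Python's built-in `idna` codec. This rests
on some assumptions. That codec implements the published IDNA of RFC 3490
(Nameprep plus Punycode), not this draft. It also uses the later-assigned
`xn--` prefix rather than the `xx--` placeholder used in this thread. The
helper names are illustrative only.

```python
def to_ace(label: str) -> bytes:
    """ToASCII for a single label via the stdlib IDNA 2003 codec (Nameprep + Punycode)."""
    return label.encode("idna")


def ace_equal(a: str, b: str) -> bool:
    """Compare two labels the way a legacy DNS server would: ACE forms, ASCII case-insensitively."""
    return to_ace(a).lower() == to_ace(b).lower()


# Nameprep case-folds before Punycode runs, so both spellings share one ACE label.
print(to_ace("f\u00f6o"))                 # b'xn--fo-fka'
print(to_ace("F\u00d6O"))                 # b'xn--fo-fka'
print(ace_equal("f\u00f6o", "F\u00d6O"))  # True
```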

We agree on history and partially agree on objectives. Let me explain
where we differ.

Specifically, I agree with the requirement to make *some domain names*
lowercase in order to facilitate simple comparisons. In particular, any
domain name which commonly represents a connection identifier (a hostname)
should conform to this requirement. Examples of this would include the
owner name of an A RR, the RRdata of MX and NS RRs, and most of the other
domain names which are commonly used for connection identifiers.

However, there are other domain names where this is not required. For
example, consider that Apple might want to encode the NBP name of AFP
servers within a zone as some kind of iNBP RR which is linked to an atalk
zone name. There is absolutely no reason that the iNBP RR owner name or
any of the RRdata elements must be mangled. Apple could choose to do so,
but there is no reason that we should require it. Similar arguments can be
made for NetBIOS names, NetWare SAP entries, NIS domains, and so forth.
There are plenty of other examples where this kind of facility needs to be
supported, but those are the obvious candidates most of us are familiar with.

What we need in order to support those kinds of applications is to
separate nameprep from IDNA. Specifically, IDNA needs to apply its encoding
to any label that meets the requirements (UCS character code inputs that
result in valid STD13 output), regardless of the stringprep profile in
use. Then the applications that create and parse the domain names are
the only ones that need to understand the stringprep profile in use for
that specific domain name.
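Here is a rough Python sketch of that decoupling. Everything in it is
hypothetical: the profile functions, the `encode_label` helper, and the
`xx--` prefix (borrowed from the examples in this thread).
`hostname_profile` is only a crude stand-in for Nameprep (case-fold plus
NFKC), not Nameprep itself. The point is structural. The ACE step is a
single profile-independent function, and the caller chooses the
stringprep-like profile.

```python
import unicodedata
from typing import Callable

# Hypothetical placeholder prefix, mirroring the "xx--" used in this thread.
ACE_PREFIX = "xx--"

Profile = Callable[[str], str]


def hostname_profile(label: str) -> str:
    """Crude stand-in for Nameprep (NOT Nameprep): case-fold, then NFKC."""
    return unicodedata.normalize("NFKC", label.casefold())


def case_preserving_profile(label: str) -> str:
    """Hypothetical profile for names such as NBP or NetBIOS: keep case, just NFC."""
    return unicodedata.normalize("NFC", label)


def encode_label(label: str, profile: Profile) -> str:
    """Apply a caller-chosen profile, then a profile-independent ACE step."""
    prepared = profile(label)
    if prepared.isascii():
        encoded = prepared  # already a legal STD13 label candidate
    else:
        encoded = ACE_PREFIX + prepared.encode("punycode").decode("ascii")
    # STD13 limits a label to 1..63 octets.
    if not 1 <= len(encoded) <= 63:
        raise ValueError(f"label does not fit in STD13: {encoded!r}")
    return encoded


print(encode_label("F\u00d6O", hostname_profile))         # xx--fo-fka
print(encode_label("F\u00d6O", case_preserving_profile))  # xx--FO-oha
```

The second result matches the `xx--FO-ohA` label discussed later in this
thread, apart from Punycode's optional mixed-case annotation. Whether legacy
components may case-fold such a label safely is exactly the question under
debate.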

The ridiculous part here is that under the existing STD13 rules, these RRs
can be used simply by defining an interpretation for the octets. For
example, Microsoft already provides a direct encoding of NetBIOS names
into UTF-8 and simply applies its own interpretation to the RRs. Under
your rules, they couldn't use i18n domain names as effectively as STD13
domain names, since they would have to sacrifice capitalization in the
process. Get it? i18n domain names need to have *at least* the same
flexibility as STD13 if they are to be adopted by the community. If not,
then interoperability will be harmed, because people will continue using
STD13 labels and doing their own thing.
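Here is a minimal sketch of the STD13 comparison being relied on here. It
assumes a server that follows RFC 1035 section 2.3.3 (ASCII
case-insensitivity, later spelled out in RFC 4343) rather than guessing a
charset: only the letters A-Z are folded, and every other octet is compared
as-is. A raw UTF-8 label therefore keeps its capitalization, because the
octets for Ö and ö differ. The function name is hypothetical.

```python
def std13_labels_equal(a: bytes, b: bytes) -> bool:
    """Compare two wire-format labels: fold ASCII A-Z only, all other octets as-is."""
    # bytes.lower() only touches ASCII uppercase letters, leaving 0x80-0xFF alone.
    return a.lower() == b.lower()


upper = "F\u00d6O".encode("utf-8")  # b'F\xc3\x96O'
lower = "f\u00f6o".encode("utf-8")  # b'f\xc3\xb6o'
mixed = "f\u00d6o".encode("utf-8")  # b'f\xc3\x96o'

print(std13_labels_equal(upper, lower))  # False: the Ö/ö octets differ
print(std13_labels_equal(upper, mixed))  # True: only F/O vs f/o differed
```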

I think you are missing a key concept here, which is that all of the RRs
are going to need i18n definitions, and IDNA alone won't do it. When
the RR rules are defined, they will get stringprep profiles assigned to
them. At that point, the applications which create and interpret the
unencoded labels are the only ones that need to know anything, and the
infrastructure can store, transfer, compare and convert the opaque octets.
DM-IDNS-00 also tried to define a global namespace. It doesn't work. The
only way out is to use per-RR rules.

>>If somebody needs an RR that preserves case, there's no reason they
>>shouldn't be able to do so.
> 
> Agreed.  And I suggest two ways they might do this:  (1) Define a
> case-sensitive data format and don't call it a domain name.  (2) Define
> a mapping from non-ASCII domain labels to ASCII domain labels that
> doesn't involve case-folding.  But don't call this mapping IDNA, because
> it's not.  It might look very similar to IDNA, and might reuse pieces of
> it, but it shouldn't use the IDNA ACE prefix for a mapping that is not
> the IDNA ACE mapping.

(1) Anything which fits in the i18n namespace is a domain name, regardless
of whether or not IDNA can handle it. The problem is that IDNA is trying
to mandate the namespace according to the requirements of nameprep, when
there is no technological argument for doing so. (2) Defining alternative
type-specific codecs hinders deployment, and is unnecessary. Thus the rule
is not only arbitrary, but it also hinders interoperability.

> heard reports of some new DNS servers that try to guess the charset,
> in which case they might then do case-insensitive comparisons even for
> the non-ASCII characters.  And there's still the wide world of entities
> other than DNS servers, which also compare domain names, and their
> handling of 8-bit names is even less predictable.

Non-argument. The EDNS label is there to prevent this confusion. What the
owner of a zone does with the STD13 octets is up to them.

>>>Now consider an entity that knows that föo and FÖO and xx--fo-fka
>>>and xx--FO-ohA are domain labels, but does not know that they are
>>>special labels that don't use Nameprep.
>>
>>Why would it ask for a special RR that it doesn't know how to read?
> 
> Because it might be a caching DNS server.

Caches and replication servers have no need to understand the
capitalization or normalization rules in use with a particular domain
name. Only the nodes that create and interpret the domain names need to
know anything about the contents. Caches and replication servers only need
to understand the layout of the message.
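As a toy illustration, here is a cache that knows only the message layout
and the STD13 comparison rule, sketched in Python. The class and method
names are hypothetical. The cache never interprets label contents. Because
it matches names ASCII case-insensitively, two ACE labels that differ only
in ASCII case still land on the same entry.

```python
from __future__ import annotations


class OpaqueNameCache:
    """Toy resolver cache keyed by owner name; names are compared per STD13."""

    def __init__(self) -> None:
        self._entries: dict[bytes, list[bytes]] = {}

    @staticmethod
    def _key(name: bytes) -> bytes:
        # Fold ASCII letters only; label contents are otherwise opaque octets.
        return name.lower()

    def store(self, name: bytes, rdata: bytes) -> None:
        self._entries.setdefault(self._key(name), []).append(rdata)

    def lookup(self, name: bytes) -> list[bytes]:
        return list(self._entries.get(self._key(name), []))


cache = OpaqueNameCache()
cache.store(b"xx--FO-oha.example", b"record-1")
print(cache.lookup(b"xx--fo-oha.example"))  # [b'record-1']
print(cache.lookup(b"xx--fo-fka.example"))  # []
```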

> If you only care about the end applications, which know the special
> semantics of the special labels, then just use a different prefix to
> go with your different Stringprep profile.  Then you can be sure that
> entities that know IDNA but don't know about your special labels won't
> accidentally muck with them.

No, that won't work. That requires the resolvers, caches and replication
servers to understand special rules about the domain names before the
application can be deployed. Essentially, this requires the infrastructure
to be upgraded for every new domain name which gets defined.

There is absolutely no reason for this. It's a ridiculous artifact of an
arbitrary design. Decouple IDNA from nameprep and the problem is solved.

> I'm not sure what you mean by "encoding form".  The ACE form (which
> involves both Nameprep and Punycode) is not guaranteed to be reversible
> to the original capitalization.

Okay, that's a problem. May have to use something else entirely.
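The loss is easy to reproduce with Python's `idna` codec. This assumes its
RFC 3490 behavior (Nameprep, then Punycode, with the `xn--` prefix) is
representative of the draft on this point. The helper name is illustrative.

```python
def ace_round_trip(label: str) -> str:
    """Encode one label with IDNA ToASCII, then decode it again with ToUnicode."""
    return label.encode("idna").decode("idna")


original = "F\u00d6O"
restored = ace_round_trip(original)  # ToASCII gives b'xn--fo-fka' after case folding
print(ascii(restored))               # 'f\xf6o'
print(restored == original)          # False: the capitalization is gone
```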

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/