
Re: [idn] length restrictions on IDN label



Soobok Lee <lsb@postel.co.kr> wrote:

> UTF-8 forms make subset of the entire set of non-ASCII forms.
> Thus, the utf8-compliant subset has been under the overall length
> restriction imposed by RFC1035 on the entire set.

UTF-8 data stored directly in 8-bit DNS labels would be subject to
the 63-octet limit.  This is irrelevant to IDNs, because IDNs do not
store UTF-8 data directly in 8-bit DNS labels.  IDNA requires that
internationalized labels use their 7-bit ASCII form in DNS.
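
To make the distinction concrete, here is a rough sketch in Python.  It
assumes Python's built-in "idna" codec (which implements the ToASCII
operation with the "xn--" prefix) is an acceptable stand-in for the
IDNA conversion, and the label "bücher" is just an example of mine, not
something from this thread.

```python
# Illustrative only: Python's "idna" codec stands in for IDNA's ToASCII.
label = "bücher"                      # hypothetical internationalized label

utf8_form = label.encode("utf-8")     # never placed in DNS under IDNA
ascii_form = label.encode("idna")     # the 7-bit form that DNS actually sees

print(len(utf8_form), utf8_form)
print(len(ascii_form), ascii_form)

# The RFC 1035 limit is checked against the ASCII form only.
assert len(ascii_form) <= 63
```

The two forms have different lengths, and only the second one is ever
measured against the 63-octet limit.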

If someday you want to use UTF-8 forms of internationalized labels
directly in newDNS, you will need to make sure that newDNS allows more
than 63 octets per label.  Or you could use the UTF-8 form when it fits,
and fall back to the ASCII form when UTF-8 doesn't fit.

(Or you could decide it's easier to stick with ASCII in the DNS
protocol, and create the illusion of UTF-8 using a new resolver on the
client.)
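
The "UTF-8 when it fits, ASCII otherwise" fallback could look roughly
like the sketch below.  The function name choose_wire_form is
hypothetical, newDNS does not exist, and Python's "idna" codec is again
assumed as a stand-in for the ASCII conversion.

```python
MAX_LABEL_OCTETS = 63  # RFC 1035 limit on a single label


def choose_wire_form(label: str) -> bytes:
    """Hypothetical newDNS policy: prefer UTF-8, fall back to the ASCII form."""
    utf8_form = label.encode("utf-8")
    if len(utf8_form) <= MAX_LABEL_OCTETS:
        return utf8_form
    # The codec raises UnicodeError if the ASCII form is itself too long.
    return label.encode("idna")


short_label = "bücher"     # 7 octets in UTF-8, so it is used as-is
long_label = "한" * 25      # 75 octets in UTF-8, so the ASCII form is used

print(choose_wire_form(short_label))
print(choose_wire_form(long_label))
```

Note that the long label is still a perfectly good internationalized
label; it just cannot travel in its UTF-8 form.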

Your argument seems to be:

1. An internationalized label in UTF-8 form is a sequence of octets.

2. RFC 1035 limits labels to 63 octets.

3. Therefore internationalized labels must have no more than 63 octets in
   UTF-8 form.

But you could try the same argument for UTF-16, and EUC-KR, and
ISO-2022-JP, and Big5, etc.  Do we conclude that any string that uses
more than 63 octets in any encoding cannot be an internationalized
label?

That would be absurd.
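
A quick sketch of why: the octet count of one and the same string
depends entirely on which encoding you pick.  The sample string is my
own, and the three codecs are just the ones Python happens to ship.

```python
MAX_LABEL_OCTETS = 63

text = "한국" * 12   # 24 Hangul syllables, an arbitrary sample string

# Octet length of the same text under several encodings.
lengths = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-be", "euc_kr")}

for enc, n in lengths.items():
    verdict = "fits" if n <= MAX_LABEL_OCTETS else "too long"
    print(f"{enc:10} {n:3} octets  {verdict}")
```

The same text "fits" under two of the encodings and is "too long" under
the third, so a limit stated in octets says nothing about the text
until you name the encoding that actually goes on the wire.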

Perhaps the key to understanding this is to recognize that 8-bit DNS
labels are not internationalized labels.  IDNA makes no use of them.
Neither IDNA nor DNS defines any textual interpretation for them.  They
are just opaque binary data (except for the values <= 127, which are
ASCII characters).  We have no way of deciding whether 8-bit labels are
UTF-8 or ISO-8859-1 or EUC-JP, etc.  Until the DNS standard is updated
to assign some semantics, they are none of the above.
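
As a small illustration (the two octets are an arbitrary choice of
mine), the same 8-bit data decodes to different text under each
charset, and nothing in the octets themselves tells you which reading
was intended:

```python
octets = bytes([0xC3, 0xA9])   # two arbitrary octets with the high bit set

# Decode the same octets under three different charset assumptions.
readings = {cs: octets.decode(cs) for cs in ("utf-8", "iso-8859-1", "euc_jp")}

for cs, reading in readings.items():
    print(f"{cs:10} -> {reading!r}")
```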

IDNA created some brand new kinds of labels that had never existed
before: non-ASCII textual labels.  They have never appeared in DNS,
cannot appear in DNS, and will not be able to appear in DNS unless DNS
is updated to support them (because the only text supported by today's
DNS is ASCII).  These new non-ASCII textual labels are outside the
universe of labels defined by RFC 1035, and therefore the RFC 1035
length restriction does not apply to them (not directly, although it
applies to their corresponding ASCII forms).

AMC