
Re: [idn] length restrictions on IDN label



Erik Nordmark <Erik.Nordmark@sun.com> wrote:

> > an internationalized label can represent at most 63 code points,
> > whether it's ACE or not.  A given encoding uses a bounded number of
> > octets per code point, so you can allocate your buffers based on
> > that.
>
> 63 code points is presumably a conservative number.  Given the 4 octet
> ACE prefix you can only fit 59 octets' worth of Punycode output
> per label, hence presumably 59 code points is a tighter limit for
> non-ASCII internationalized labels while 63 code points is the limit
> for ASCII labels.

True, but which limit you care about depends on the encoding.  For
example, if you're using UTF-32, then a regular ASCII label can have 63
code points each occupying 4 octets.
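
To make the buffer arithmetic concrete, here is a rough sketch in Python.
It assumes the 63-code-point bound stated above; the table and the
function name max_label_octets are mine, purely for illustration.

```python
# Assumption: an internationalized label represents at most 63 code points.
MAX_CODE_POINTS = 63

# Worst-case octets per code point for each encoding form.
MAX_OCTETS_PER_CODE_POINT = {
    "utf-8": 4,   # 1 to 4 octets per code point
    "utf-16": 4,  # 2 octets, or 4 for a surrogate pair
    "utf-32": 4,  # always 4 octets
}

def max_label_octets(encoding: str) -> int:
    """Loose upper bound on the size of one label in the given encoding."""
    return MAX_CODE_POINTS * MAX_OCTETS_PER_CODE_POINT[encoding]

# A plain ASCII label of 63 code points, encoded as UTF-32 (no BOM).
label = "a" * 63
encoded = label.encode("utf-32-be")
print(len(encoded), max_label_octets("utf-32"))
```

The same label occupies 63 octets as ASCII, so the bound that matters is
set by the encoding rather than by the label.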

Soobok Lee <lsb@postel.co.kr> wrote:

> IDNA section 6.1 goes further than that by allowing _protocols_ to use
> non-ACE labels that are neither presentation forms nor textual labels,
> but protocol elements.  What if a future ESMTP allows UTF-8 encodings
> in RCPT: headers?

Then applications that implement future ESMTP will need to be prepared
for UTF-8 labels to contain more than 63 octets.  This is not a
problem, because any application that can even think about using
non-ASCII labels is aware of IDNA, and therefore knows the definition of
internationalized label, and therefore knows that the maximum possible
label length depends on the encoding used.

Soobok Lee <lsb@postel.co.kr> wrote:

> They will find that a UTF-8 label may have 168 octets, contrary to
> RFC 1035.

There is no contradiction.  RFC 1035 says nothing about UTF-8 labels.
The RFC 1035 limit of 63 octets per label applies to the universe of
labels that RFC 1035 defined.  IDNA defines some new labels outside that
universe (each of which is equivalent to a label inside that universe,
for backward compatibility).  If you want to know the maximum possible
length of these new labels that were created by IDNA, don't bother
looking at RFC 1035, because it can't possibly tell you, because it
doesn't even know about the new labels.  Look at IDNA, which contains
the complete definition of internationalized label.

> Since the IDNA draft permits UTF-8 labels in application protocols,
> it is natural that it should also have specified UTF-8 label length
> restrictions.

It did, by defining internationalized label as anything that ToASCII
can be applied to without failing.  Since such a label represents at
most 63 code points, and UTF-8 uses at most 4 octets per code point,
you can easily conclude that internationalized labels, when encoded in
UTF-8, can exceed 63 octets, but cannot exceed 63*4 = 252 octets.  A
tight upper bound is trickier to figure out, but you don't need it in
practice.
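
As an illustration only, the definition can be tried out with the ToASCII
function in Python's standard encodings.idna module, which implements the
RFC 3490 operation.  The helper name is_internationalized_label and the
sample label are my own assumptions, not anything taken from the draft.

```python
from encodings.idna import ToASCII  # stdlib implementation of RFC 3490 ToASCII

def is_internationalized_label(label: str) -> bool:
    """Hypothetical helper: True if ToASCII can be applied without failing."""
    try:
        ToASCII(label)
        return True
    except UnicodeError:
        return False

# Sample label: 30 copies of U+D55C, which takes 3 octets in UTF-8.
label = "\ud55c" * 30
ace = ToASCII(label)            # ACE form, begins with b"xn--"
utf8 = label.encode("utf-8")    # 30 * 3 = 90 octets

print(is_internationalized_label(label))  # ToASCII succeeds for this label
print(len(ace) <= 63)                     # ACE form fits the RFC 1035 limit
print(len(utf8) > 63)                     # UTF-8 form exceeds 63 octets
print(len(utf8) <= 63 * 4)                # but stays within the loose bound
```

A label that ToASCII rejects, such as 64 ASCII letters, is simply not an
internationalized label, so it never needs a buffer at all.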

> So, 1024 or 768 bytes are good.  But such UTF-8 FQDNs cannot be put
> into a single UDP packet of a DNS query or response.  This will
> constrain future efforts to update the DNS protocol to support UTF-8
> in the wire format.  Today's long IDNs may be one of the obstacles in
> the way of that effort.

That will indeed be an issue that any UTF-8 DNS protocol update will
need to address.

AMC