
Re: [idn] length restrictions on IDN label



Soobok Lee <lsb@postel.co.kr> wrote:

> My focus was on whether the "UTF-8 labels in ESMTP sessions" are
> legitimate internationalized hostnames (labels) or not at the future
> and at the present time, since IDNA section 6.1 allows utf8 encodings
> of transmitted labels.  Does this section seem to propose changes in
> hostname rules?

Nothing in section 6.1 reduces the requirement of section 3 item 2:
Whenever a domain name is put into an IDN-unaware domain name slot, it
MUST contain only ASCII characters.

At the present time, all domain name slots in ESMTP are IDN-unaware
(because they predate IDNA).  Therefore, all domain names put into
those slots must contain only ASCII characters.

Section 6.1 is not aimed primarily at this situation, but it does
have implications for it.  If ESMTP were to use EBCDIC or
UTF-16 for some slots (which isn't very likely), then domain names
being put into those slots would contain only ASCII characters (that
is, characters from the ASCII repertoire) as required by section 3
item 2, but would need to be encoded using EBCDIC or UTF-16 rather than
the ASCII encoding.  (The name ASCII is old, from the days when people
weren't so careful to distinguish between repertoires and encodings.)
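
To make the repertoire/encoding distinction concrete, here is a small
sketch in Python.  The name "example.com" and the choice of the cp037
code page as a stand-in for EBCDIC are assumptions made for
illustration; nothing in ESMTP or IDNA prescribes them.

```python
# Sketch: one repertoire (ASCII characters), three different encodings.
name = "example.com"

# Every character belongs to the ASCII repertoire.
assert all(ord(ch) < 128 for ch in name)

as_ascii = name.encode("ascii")       # one octet per character
as_utf16 = name.encode("utf-16-be")   # two octets per character
as_ebcdic = name.encode("cp037")      # one octet per character, other values

for encoding, octets in (
    ("ASCII", as_ascii),
    ("UTF-16BE", as_utf16),
    ("EBCDIC cp037", as_ebcdic),
):
    print(f"{encoding:13} {len(octets):2} octets  {octets.hex()}")
```

The characters are the same in all three cases; only the octets on the
wire differ.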

If a future revision/extension of ESMTP defines some IDN-aware slots,
then the implications of section 6.1 become more interesting.  If those
slots use, say, UTF-8, then IDNs containing non-ASCII characters could
be put into those slots, and they would need to be encoded using UTF-8.
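
A rough sketch of the difference between the two kinds of slot, in
Python.  The function names and the idn_aware flag are hypothetical,
and the conversion is simplified: it applies only the Punycode step and
the "xn--" prefix, and skips nameprep and the other checks that a real
ToASCII implementation performs.

```python
def to_ascii_label(label: str) -> str:
    """Return an ASCII-only form of one label (simplified, no nameprep)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def encode_for_slot(name: str, idn_aware: bool, encoding: str) -> bytes:
    """Encode a domain name for a (hypothetical) protocol slot."""
    if not idn_aware:
        # IDN-unaware slot: only characters from the ASCII repertoire.
        name = ".".join(to_ascii_label(part) for part in name.split("."))
    # Either way, the slot's own encoding decides the octets.
    return name.encode(encoding)


name = "b\u00fccher.example"
print(encode_for_slot(name, idn_aware=False, encoding="ascii"))
print(encode_for_slot(name, idn_aware=True, encoding="utf-8"))
```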

> But label length restrictions are not added on a per-application-protocol
> basis; rather, they should be added at a lower level such as the DNS (and
> so IDNA), as was done in RFC 1035.  What do you think about that?

RFC 1035 defined the length restriction in terms of the form of the
label that is used in DNS protocol messages.  That form must be no more
than 63 octets.

In addition to the DNS protocol, RFC 1035 also defined the master file
format.  The domain names in master files are protocol elements, because
the master file format is explicitly intended to be machine readable.
Labels in master files can be up to 63*4 octets long, because each octet
of a label may be written as a four-character \DDD decimal escape.  This
fact is never mentioned in RFC 1035; it is left as an exercise for the
reader.
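
The 63*4 figure can be checked with a short Python sketch.  It assumes
the worst case, in which every octet of the label is written using the
\DDD decimal escape of the master file format; the function name is
made up for this example.

```python
def escape_label_worst_case(label: bytes) -> str:
    """Write every octet of a label as a \\DDD decimal escape."""
    return "".join(f"\\{octet:03d}" for octet in label)


# A label of the maximum wire length: 63 arbitrary octets.
wire_label = bytes(range(63))
master_file_text = escape_label_worst_case(wire_label)

print(len(wire_label), "octets in the DNS protocol form")
print(len(master_file_text), "characters in the master file form")
```

A real master file writer would escape only the octets that need it, so
most labels come out much shorter than this bound.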

In a completely analogous fashion, IDNA defines the length restriction
of internationalized labels in terms of the ASCII form.  That form must
contain no more than 63 code points.  The length of labels in other
forms is currently left as an exercise for the reader, though perhaps
we will want to add a hint that there can never be more than 63 Unicode
code points in an internationalized label.  IDNA cannot specify the
length restriction in terms of octets, because that depends on the
encoding used, and IDNA does not favor any encoding over any other
encoding.
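
As an illustration of why an octet limit cannot be stated independently
of the encoding, the Python sketch below measures one label in several
forms.  The label and the function name are chosen for the example, and
the ASCII form is again approximated with the Punycode codec plus the
"xn--" prefix, without nameprep.

```python
def label_lengths(label: str) -> dict:
    """Measure one label in several forms (simplified, no nameprep)."""
    if label.isascii():
        ascii_form = label
    else:
        ascii_form = "xn--" + label.encode("punycode").decode("ascii")
    return {
        "code_points": len(label),            # Unicode code points
        "ascii_form": len(ascii_form),        # what the 63 limit applies to
        "utf8_octets": len(label.encode("utf-8")),
        "utf16_octets": len(label.encode("utf-16-be")),
    }


for form, length in label_lengths("b\u00fccher").items():
    print(f"{form:13} {length}")
```

Only the ascii_form figure is bounded by 63 under IDNA; the octet
counts vary with whichever encoding a protocol happens to use.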

AMC