
Re: [idn] length restrictions on IDN label



On Sun, Oct 13, 2002 at 09:39:42AM -0700, Paul Hoffman / IMC wrote:
> At 3:56 PM +0900 10/13/02, Soobok Lee wrote:
> >[ When I read the IDNA draft today, I still couldn't find
> >  the answer in it to the following question about IDN label length.
> > If the following issue is already addressed in the draft, please 
> >correct me. ]
> 
> It is indeed covered in the draft. The input to IDNA is code points, 
> not encoded characters. As you point out, different encodings give 
> different lengths for the same string. The only lengths that matter 
> are those that are already in STD 13.
> 
> > Many Internet applications impose or assume the 63-octet limit on
> >label lengths.
> > If this assumption is violated, the label will be regarded as an invalid
> > label, and some implementations will produce unpredictable errors.
> 
> Which Internet applications are you speaking of? Which encodings are 
> they using? As you pointed out, different encodings give different 
> lengths. Thus, no sensible application could assume a 63-octet length 
> if it deals with different encodings.

UTF-8, EUC-KR, etc. are all ASCII-compatible encodings/charsets.
Applications don't need to give up or modify the old 63-octet restriction for
LDH labels even in UTF-8 or EUC-KR, because those encodings produce
the same octet string as pure ASCII encoding does. That is,
in those ASCII-compatible encodings of LDH characters, the number of code points and
the number of octets are equal, while they are not equal for
non-LDH characters such as Hangul or CJK letters (the octet length is doubled or tripled).
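
To make the difference concrete, here is a minimal sketch that counts code
points and octets for the same label in several encodings. The helper name
`lengths` and the two sample labels are assumptions chosen for illustration;
they do not come from the IDNA draft.

```python
# Compare the code point count of a label with its octet count
# in several encodings. Sample labels are illustrative only.
def lengths(label, encodings=("ascii", "utf-8", "euc-kr")):
    result = {"codepoints": len(label)}
    for enc in encodings:
        try:
            result[enc] = len(label.encode(enc))
        except UnicodeEncodeError:
            result[enc] = None  # label is not representable in this encoding
    return result

ldh = lengths("example-1")        # LDH label: letters, digits, hyphen
hangul = lengths("\ud55c\uae00")  # two precomposed Hangul syllables

print(ldh)     # every count is identical for an LDH label
print(hangul)  # octet counts exceed the code point count
```

For the LDH label all counts agree, so a 63-octet check and a 63-code-point
check are interchangeable. For the Hangul label they diverge, which is why
the unit of the limit has to be stated.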

> 
> > From an implementer's point of view, a more precise specification is needed
> > about whether an IDN label/FQDN has *NEW* length restrictions in
> >various character encodings,
> > if IDNA tries to extend the repertoire of allowable characters.
> 
> It seems likely that most implementers can understand that they must 
> continue to follow the same rules that they always have for the 
> length of domain names and labels.

The unit of the length restriction matters: number of code points or number of octets?
That should be made clearer. RFC 1035 uses "octets", not characters or code points.

I enclose related RFC1035 (STD13) sections here.


Mockapetris                                                     [Page 9]

RFC 1035        Domain Implementation and Specification    November 1987


(snip)

2.3.4. Size limits

Various objects and parameters in the DNS have size limits.  They are
listed below.  Some could be easily changed, others are more
fundamental.

labels          63 octets or less

names           255 octets or less

TTL             positive values of a signed 32 bit number.

UDP messages    512 octets or less

3. DOMAIN NAME SPACE AND RR DEFINITIONS

3.1. Name space definitions

Domain names in messages are expressed in terms of a sequence of labels.
Each label is represented as a one octet length field followed by that
number of octets.  Since every domain name ends with the null label of
the root, a domain name is terminated by a length byte of zero.  The
high order two bits of every length octet must be zero, and the
remaining six bits of the length field limit the label to 63 octets or
less.

To simplify implementations, the total length of a domain name (i.e.,
label octets and label length octets) is restricted to 255 octets or
less.
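
[Illustration, not part of RFC 1035: the wire format described above can be
sketched as follows. The function name `encode_name` and the constant names
are hypothetical; labels are passed as octet strings, so both limits are
checked in octets, as the RFC text specifies.]

```python
MAX_LABEL_OCTETS = 63   # six usable bits in the length octet
MAX_NAME_OCTETS = 255   # label octets plus length octets, root included

def encode_name(labels):
    """Encode a list of labels (bytes, root excluded) into wire format."""
    wire = bytearray()
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL_OCTETS:
            raise ValueError("label must be 1..63 octets: %r" % (label,))
        wire.append(len(label))  # one-octet length field, top two bits zero
        wire += label            # followed by that number of octets
    wire.append(0)               # null label of the root terminates the name
    if len(wire) > MAX_NAME_OCTETS:
        raise ValueError("name exceeds 255 octets")
    return bytes(wire)

encoded = encode_name([b"www", b"example", b"com"])
print(encoded)
```

Because the length field counts octets, a label whose encoded form is longer
than 63 octets cannot be represented, however few code points it contains.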

Although labels can contain any 8 bit values in octets that make up a
label, it is strongly recommended that labels follow the preferred
syntax described elsewhere in this memo, which is compatible with
existing host naming conventions.  Name servers and resolvers must
compare labels in a case-insensitive manner (i.e., A=a), assuming ASCII


[Soobok Lee]