
[idn] length restrictions on IDN label



[ Reading the IDNA draft today, I still could not find an answer to the
  following question about IDN label length.
  If the draft already addresses this issue, please correct me. ]


 I have a punycode label of length 63 octets:
  L1: zq--o39AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 
 L2 = ToUnicode(L1) produces U+AC00 repeated 56 times (the Hangul syllable "KA"):

  L2:
U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 
U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 
U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 
U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 
U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 
U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00

 But L2 can be encoded in various Unicode and legacy encodings, each giving
  a different length in octets:

  UTF8 : 3 x 56 = 168 octets
  UCS2 : 2 x 56 = 112 octets
  UCS4 : 4 x 56 = 224 octets
  KSX1001/EUC-KR : 2 x 56 = 112 octets 
 
 All of these encodings produce labels longer than 63 octets.
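
 As a rough check, the arithmetic above can be reproduced with a short
 Python sketch. It assumes Python's built-in "punycode", "utf-16-be",
 "utf-32-be" and "euc_kr" codecs (the big-endian variants are used so that
 no byte-order mark is added; they stand in for UCS2 and UCS4 here), and it
 strips the four-character "zq--" prefix by hand, since the prefix is
 specific to the label L1 above.

```python
# Sketch: octet lengths of L1 (ACE form) and L2 (decoded form) in several encodings.
ACE_PREFIX = "zq--"  # assumption: the ACE prefix used by L1 in this message
L1 = ACE_PREFIX + "o39" + "A" * 56

# Punycode-decode the part after the prefix (Punycode digits are case-insensitive).
L2 = L1[len(ACE_PREFIX):].encode("ascii").decode("punycode")

lengths = {
    "ACE": len(L1),
    "UTF8": len(L2.encode("utf-8")),
    "UCS2": len(L2.encode("utf-16-be")),  # 2 octets per BMP code point, no BOM
    "UCS4": len(L2.encode("utf-32-be")),  # 4 octets per code point, no BOM
    "EUC-KR": len(L2.encode("euc_kr")),
}

for name, n in lengths.items():
    status = "over" if n > 63 else "within"
    print(f"{name}: {n} octets ({status} the 63-octet limit)")
```

 Only the ACE form stays within 63 octets; every other encoding of the same
 56 code points is longer.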
 
 Moreover, every ACE label of a valid ACE-form IDN FQDN (at most 255 octets)
  may convert to a valid UTF8 label of 63 octets or fewer, while the
  cumulative length of the UTF8 labels of that FQDN still exceeds the
  255-octet limit.
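
 A minimal sketch of such a name, again in Python. The eight labels of
 twenty U+AC00 each are a made-up example, the "zq--" prefix is assumed to
 be the same as in L1, and lengths are measured on the dotted text form
 rather than the DNS wire format, which is a simplification.

```python
# Sketch: an FQDN that fits in 255 octets in ACE form but not in UTF-8.
ACE_PREFIX = "zq--"  # assumption: same ACE prefix as L1


def to_ace(label: str) -> str:
    """Punycode-encode one label and prepend the assumed ACE prefix."""
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


labels = ["\uac00" * 20] * 8  # hypothetical labels: 20 x U+AC00, eight times

ace_labels = [to_ace(label) for label in labels]
ace_name = ".".join(ace_labels)
utf8_name = ".".join(labels).encode("utf-8")

max_ace_label = max(len(label) for label in ace_labels)
max_utf8_label = max(len(label.encode("utf-8")) for label in labels)

print("longest ACE label :", max_ace_label, "octets")
print("longest UTF8 label:", max_utf8_label, "octets")
print("ACE name total    :", len(ace_name), "octets")
print("UTF8 name total   :", len(utf8_name), "octets")
```

 Each label stays within 63 octets in both forms, and the ACE name stays
 within 255 octets, but the UTF8 name does not.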
  
 Many internet applications impose or assume the 63-octet limit on label length.
 If this assumption is violated, some implementations will treat the label
 as invalid and may fail with unpredictable errors.
 
 From an implementor's point of view, a more precise specification is needed
 on whether IDN labels and FQDNs have *NEW* length restrictions in the various
 character encodings, if IDNA is to extend the repertoire of allowable characters.

 The above case is very rare, but in any case implementors have a practical,
 security-related need to impose some limits on IDN labels in non-ACE encodings
 (for example, to avoid buffer overflows caused by expanded ToUnicode output).
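
 One way an implementation might guard against this is sketched below.
 The function name, the default limit of 63 octets and the choice to raise
 an error are all assumptions for illustration; the draft does not define
 any of them.

```python
# Sketch: refuse decoded labels that would not fit a fixed-size buffer.
MAX_LABEL_OCTETS = 63  # assumption: the limit the application was built around


def decode_label_checked(ace_body: str, encoding: str,
                         limit: int = MAX_LABEL_OCTETS) -> bytes:
    """Punycode-decode ace_body (ACE prefix already stripped), re-encode it
    in the target encoding, and raise ValueError if it exceeds limit octets."""
    text = ace_body.encode("ascii").decode("punycode")
    encoded = text.encode(encoding)
    if len(encoded) > limit:
        raise ValueError(
            f"label needs {len(encoded)} octets in {encoding}, limit is {limit}"
        )
    return encoded


if __name__ == "__main__":
    print(decode_label_checked("o39A", "utf-8"))  # a single U+AC00 fits
    try:
        decode_label_checked("o39" + "A" * 56, "utf-8")  # L1 without its prefix
    except ValueError as err:
        print("rejected:", err)
```

 The length check happens before the result is handed to any caller that
 copies it into a fixed-size buffer.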

 Cheers,

Soobok Lee