[idn] Length limits on a domain name
I cannot remember that we have really discussed what length limits we
require for a domain name. But I think it is important that we do.
A domain name is composed of labels. The current DNS specification
limits a label to 63 bytes and a domain name to 255 bytes.
The 63 byte limit is because the current protocol does not allow
longer labels, and the total 255 is just a selected value.
In the current protocol, using ASCII, we get:
63 characters per label and 255 characters total in a domain name.
Now let's move to IDNs.
What are we going to require?
If we say: An IDN can have up to 63 bytes in a label and 255 bytes total,
we may get some bothersome results.
With 63 bytes I can encode:
- 63 characters in ASCII.
- 63 characters in ISO 8859-1.
- 31 non-ASCII ISO 8859-1 characters using UTF-8 (2 bytes each).
- 21 UCS-2 characters using UTF-8 (at up to 3 bytes each).
- 31 UCS-2 characters using UCS-2 (2 bytes each).
That is, a different maximum number of characters depending on
what encoding we use in a label!
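The figures in the list above can be checked with a small sketch. The
function name max_chars and the sample characters are only illustrative
assumptions, and Python's "utf-16-be" codec is used as a stand-in for
UCS-2, which is the same thing for characters in the basic plane.

```python
LABEL_LIMIT = 63  # bytes per label in the current DNS specification


def max_chars(sample: str, encoding: str, limit: int = LABEL_LIMIT) -> int:
    """Return how many copies of the single character `sample` fit in
    `limit` bytes when encoded with `encoding` (hypothetical helper)."""
    if len(sample) != 1:
        raise ValueError("sample must be exactly one character")
    size = len(sample.encode(encoding))
    return limit // size


if __name__ == "__main__":
    cases = [
        ("ASCII letter in ASCII", "a", "ascii"),
        ("ISO 8859-1 letter in ISO 8859-1", "\u00e9", "latin-1"),
        ("ISO 8859-1 letter in UTF-8", "\u00e9", "utf-8"),
        ("UCS-2 (CJK) character in UTF-8", "\u4e2d", "utf-8"),
        # Assumption: UTF-16 big-endian without BOM stands in for UCS-2.
        ("UCS-2 (CJK) character in UCS-2", "\u4e2d", "utf-16-be"),
    ]
    for description, sample, encoding in cases:
        # Prints 63, 63, 31, 21 and 31 respectively.
        print(f"{description}: {max_chars(sample, encoding)}")
```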
Now assume we have two formats allowed: UTF-8 and an ACE (ASCII Compatible
Encoding) like for example CIDNUC. What do we get? Well, some names can
have more characters when encoded using UTF-8 and others when encoded
using the ACE. So some might use the ACE format to get a name as long as
I would expect some people to think it is unfair that those needing only
ASCII characters can have IDNs with 63 character labels while those
using Greek might only be allowed 31. Given more than one format
accepted by DNS, I expect some people will use the one that fits the most
characters into a name (or use them to trigger software malfunction or
One solution is to say: we allow only ONE encoding in the DNS protocol.
But this would mean that if we want to support old software that will
fail when getting non-ASCII, we must use an ACE.
But by only allowing an ACE we are stuck in the ASCII world forever.
The ACE is for backward compatibility. It is a transition mechanism
to support old software that cannot handle non-ASCII in domain names.
Should not the requirement on length of IDNs be:
An IDN may have a maximum of 63 CHARACTERS per label and a total
length of 255 CHARACTERS?
The number of CHARACTERS can be defined as the number of UCS-4 characters
in the name after it has been normalised using Unicode
normalisation form C. This would give the DNS implementors a maximum
limit to adapt their data structures to. (Adding that remaining combining
characters should not be counted would make it possible to make names taking
much more space.)
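A minimal sketch of how such a character-based check could look, under
some assumptions of mine: the helper names are hypothetical, the dots
separating labels are not counted towards the 255 total (the proposal
above does not say either way), and a Python string length is taken as
the number of UCS-4 characters.

```python
import unicodedata

MAX_LABEL_CHARS = 63   # proposed limit per label, in characters
MAX_NAME_CHARS = 255   # proposed limit per domain name, in characters


def char_count(text: str) -> int:
    """Count UCS-4 characters after normalisation form C.

    Combining characters that have no precomposed form remain after
    normalisation and are therefore still counted.
    """
    return len(unicodedata.normalize("NFC", text))


def within_proposed_limits(name: str) -> bool:
    """Check a dotted name against the proposed character limits.

    Assumption: separator dots are not counted towards the total.
    """
    labels = name.rstrip(".").split(".")
    counts = [char_count(label) for label in labels]
    if any(count > MAX_LABEL_CHARS for count in counts):
        return False
    return sum(counts) <= MAX_NAME_CHARS
```

With this definition, "e" followed by a combining acute accent counts as
one character, because form C composes the pair, while a 63 character
Greek label passes the check even though it needs 126 bytes in UTF-8.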
As the above requirement will produce names that do not fit into the
current DNS protocol, the protocol will have to be extended to fulfil this
requirement. As a result, pre-IDN DNS software will not be able
to handle very long IDNs, so for many years we will have to live with
using shorter names than might be wanted, if we
want the entire world to be able to look up the name.
Note: even with the above requirement, people may use the ACE instead of
the non-ASCII encoding to get more characters into a name. This would be
especially interesting if the non-ASCII encoding (like UTF-8 for some
names) is not very compact while the ACE is, and we want the name to
work with old software having the 63 byte limit.
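To compare the two encodings for a given label, something like the
sketch below could be used. Note the assumptions: CIDNUC is not
available in the Python standard library, so the "punycode" codec (a
different ACE) is used purely as a stand-in, without any ACE prefix,
and the function name encoded_sizes is hypothetical. The sizes it
reports say nothing definite about CIDNUC itself.

```python
def encoded_sizes(label: str) -> dict:
    """Report the size of one label in characters, in UTF-8 bytes and
    in bytes of a stand-in ACE (Python's "punycode" codec, no prefix)."""
    return {
        "characters": len(label),
        "utf8_bytes": len(label.encode("utf-8")),
        "ace_bytes": len(label.encode("punycode")),
    }


def fits_old_software(label: str, limit: int = 63) -> dict:
    """Tell, per encoding, whether the label fits the 63 byte limit."""
    sizes = encoded_sizes(label)
    return {
        "utf8": sizes["utf8_bytes"] <= limit,
        "ace": sizes["ace_bytes"] <= limit,
    }
```

Running encoded_sizes on a few real labels shows which of the two
formats gets more characters into the 63 bytes for that script.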
While I expect the ACE to be visible to the user now and then, and to be
recognised by many as a mechanism to get IDNs to work with old software,
I expect the non-ASCII formats used in the protocol to be presented to the
user as the IDN itself, using non-ASCII characters.
As I know the ACE is a backward compatibility mechanism, I can understand
if it does not handle all types of IDNs and results in a failure in those
cases, for example when using a very long name.
But I cannot understand or accept that if I type in my name using the
normal non-ASCII characters, it will not work, but if I type in the same
name using ACE, it will.
What shall we do about this?