RE: An argument against multiple character sets



I don't think anyone is suggesting that a computer shouldn't be able to use
whatever character set it desires as its internal representation, or as the
representation in zone files.

This is, however, independent of the question of which character set(s) are
to be used "on the wire" by the DNS protocol (which is, I think, what we are
specifying requirements for).

I'm not sure that the choice between one character set and multiple
character sets is itself a requirement, but if resolving the issue one way or
the other makes the other requirements easier then we should do that now.

A fairly uncontroversial requirement is that if we specify multiple
character sets then the protocol must indicate the charset used by each DNS
label (how it does this is irrelevant for this discussion).

Some more reasons why multiple "on the wire" character sets won't work are:

* In order for a caching server to work it must be able to compare labels.
It can't do this unless it knows about all character sets which might be
used on the wire.  The difficulty of upgrading all caching servers will
effectively prevent new "on the wire" character sets from being added.  

If you can never add more character sets then why not just start with one
large character set which is a superset of all the characters you will ever
want to use?

* In order for DNSSEC to be able to authoritatively deny the existence of a
name there must be a "canonical charset" in which that name is to be
represented.  With multiple charsets on the wire, picking one and mapping
every other charset onto it would lead to a massive increase in complexity,
and would mean that some charsets would effectively be favoured over others.
(It would also cause large arguments about which one should be the canonical
charset).
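
To make the caching point concrete, here is a rough Python sketch.  It
assumes a purely hypothetical wire format in which each label carries a
charset tag alongside its raw bytes; the names KNOWN_CHARSETS, cache_key and
same_label are mine for illustration and do not come from any proposal.  It
also ignores case folding and Unicode normalisation, which a real comparison
would have to deal with.

```python
# Hypothetical model: a label on the wire is a (charset_name, raw_bytes) pair.
# KNOWN_CHARSETS is whatever this particular caching server was built to understand.
KNOWN_CHARSETS = {"ascii", "latin-1", "utf-8"}


def cache_key(charset, raw):
    """Map a tagged label to a charset-independent key.

    Raises LookupError if the server does not know the charset, because it
    then has no way to tell whether two labels are the same name.
    """
    if charset not in KNOWN_CHARSETS:
        raise LookupError("cannot compare labels in unknown charset %r" % charset)
    return raw.decode(charset)


def same_label(a, b):
    """Compare two tagged labels by decoding both to a common form."""
    return cache_key(*a) == cache_key(*b)


# The same name sent in two charsets: different bytes, same label.
latin = ("latin-1", "caf\u00e9".encode("latin-1"))
utf8 = ("utf-8", "caf\u00e9".encode("utf-8"))
bytes_differ = latin[1] != utf8[1]
labels_match = same_label(latin, utf8)

# A label in a charset added after this server was deployed.
newer = ("koi8-r", "\u043f\u043e\u0447\u0442\u0430".encode("koi8-r"))
try:
    same_label(newer, utf8)
    comparison_failed = False
except LookupError:
    comparison_failed = True
```

The third case is the upgrade problem: until every cache learns the new
charset, some of them cannot decide whether a label matches one they already
hold.  With a single charset on the wire the decoding step and the table of
known charsets both disappear.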

I suggest that we assume that only one (large) character set will be used on
the wire where this will help the discussions.  The only suitable character
set I know of is UNICODE.  Any other candidates?

Note that this discussion is independent of any politics which might result
from the many experimental DNS servers out there which have defined the
meaning of non-ASCII labels.  If we use UNICODE, then UTF-8 in existing label
types is not the only way to encode it.  How the character set is encoded is
definitely not a matter for the requirements.
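
As a small illustration that choosing UNICODE does not fix the encoding, the
Python sketch below encodes one example label in three standard Unicode
encoding forms.  The label is an arbitrary example of mine, and nothing here
suggests which form (if any) the protocol should use.

```python
# One Unicode label, three encoding forms; the example label is arbitrary.
label = "b\u00fccher"

encodings = {
    name: label.encode(name)
    for name in ("utf-8", "utf-16-be", "utf-32-be")
}

# Octet counts differ per encoding form.
lengths = {name: len(raw) for name, raw in encodings.items()}

# Every form decodes back to the same sequence of characters.
round_trips = all(raw.decode(name) == label for name, raw in encodings.items())
```

Here the label takes 7 octets in UTF-8, 12 in UTF-16 and 24 in UTF-32, yet
all three represent the same six characters, so the character set and its
encoding can be discussed separately.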

Perhaps a sentence in the requirements like:

"In order to simplify the requirements we have assumed that UNICODE will be
used for the protocol on the wire.  A candidate protocol may use multiple
character sets if it meets all the other requirements in this document.  The
protocol should not constrain which character set(s) an implementation may
use for its user interface, or for the storage of records in a master file."

    Andy