
[idn] My draft for internationalisation of DNS



Hi

As requested in one of the replies to my comments on this list, I have
tried to write an Internet-Draft. It is my first attempt at writing one,
so I am sure there is more work to be done before it is ready.

Attached is a basic draft specifying how I think internationalisation
of DNS could be done, based on the discussions and suggestions we have
had on the list. Hopefully it matches what many of us want and have
suggested. Maybe it could be the base of one of the drafts/RFCs that
will result from this working group.

   Dan
Internet Draft                                     Dan Oscarsson
draft-oscarsson-idn-i18ndns.txt                    Telia ProSoft
Updates: RFC 2181, 1035, 1034, 2535
February 2000                              
Expires August 2000

         Internationalisation of the Domain Name Service

Status of this memo

   This document is an Internet-Draft and is in full conformance with all
   provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering Task
   Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.


Abstract

   There is a very strong world-wide desire to use characters other than
   ASCII in the DNS, especially in domain names. Domain names have become
   the equivalent of business or product names for many services on the
   Internet, so there is a need to make them usable by people whose native
   scripts are not representable by ASCII.

   This document updates the Domain Name System standard (DNS) [RFC1035] and
   specifies how international characters are handled. It is completely
   compatible with the current DNS (RFC 1034, 1035, 2181, 2535, etc.).



1. Introduction

   There is an immediate need to use international (non-ASCII) characters
   in DNS. This means that the DNS protocol itself cannot be extended, as
   that would take too long. Instead, the current ASCII-only handling needs
   to be extended to non-ASCII in a way that can be used without updating
   current software.

   The basic handling of character data in DNS has several properties
   that need to be preserved:
   - The DNS itself places only one restriction on the particular labels
     that can be used to identify resource records.  That one restriction
     relates to the length of the label and the full name.  The length of
     any one label is limited to between 1 and 63 octets.  A full domain
     name is limited to 255 octets (including the separators).
     [RFC 2181]
   - Any binary string whatever can be used as the label of any
     resource record.  Similarly, any binary string can serve as the value
     of any record that includes a domain name as some or all of its value
     (SOA, NS, MX, PTR, CNAME, and any others that may be added).
     Implementations of the DNS protocols must not place any restrictions
     on the labels that can be used.  In particular, DNS servers must not
     refuse to serve a zone because it contains labels that might not be
     acceptable to some DNS client programs.
     [RFC 2181]
   - Names must be compared with case-insensitivity.
     [RFC1035]
   - The original case should be preserved when possible as data is entered
     into the system. This also implies that responses should preserve case
     when possible.
     [RFC1035]
   - The characters in the ASCII character set must still be encoded
     as ASCII.
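
   The length limits in the first property apply to octets, not
   characters, which matters once labels are encoded in a multi-octet
   form such as UTF-8. The following is a minimal sketch of such a check,
   not part of the specification. The function name and constants are
   illustrative, and it assumes the name is given in dotted text form.

```python
MAX_LABEL_OCTETS = 63
MAX_NAME_OCTETS = 255


def check_name_lengths(name: str) -> list[bytes]:
    """Split a dotted name into UTF-8 labels and enforce the octet limits."""
    labels = [label.encode("utf-8") for label in name.rstrip(".").split(".")]
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL_OCTETS:
            raise ValueError(f"label length {len(label)} octets is outside 1..63")
    # Wire format: one length octet per label, plus the terminating root label.
    wire_length = sum(1 + len(label) for label in labels) + 1
    if wire_length > MAX_NAME_OCTETS:
        raise ValueError(f"name needs {wire_length} octets, limit is 255")
    return labels
```

   A label of 32 copies of a two-octet character such as "ö" is rejected
   by this check, even though it is only 32 characters long.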

   This document specifies the update needed to the DNS protocol, user
   interface issues and the effect on other protocols.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].


2. The DNS Protocol

   The DNS protocol is used when communicating between DNS servers and
   other DNS servers or DNS clients. User interface issues like the format
   of zone files or how to enter or display domain names are not part
   of the protocol.

   The update of the protocol defined here can be used immediately as
   it is fully compatible with the DNS of today.

2.1 Internationalisation aware software

   Internationalisation aware DNS software (i18n aware) is software that
   implements the rules for handling international text as defined here.
   Only i18n aware software will have all requirements fulfilled. Non-i18n
   aware software will lose the case preserving requirement. Also, only
   i18n aware software may perform zone transfers.

   I18n aware software identifies itself in a query or a response by
   setting the IN bit in the DNS query/response format header. This
   bit is the last unallocated bit in the header.

                                           1  1  1  1  1  1
             0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
            |                      ID                       |
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
            |QR|   Opcode  |AA|TC|RD|RA|IN|AD|CD|   RCODE   |
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
            |                    QDCOUNT                    |
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
            |                    ANCOUNT                    |
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
            |                    NSCOUNT                    |
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
            |                    ARCOUNT                    |
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   This bit is zero in old servers and resolvers. Thus they identify
   themselves as non-i18n aware.

   I18n aware software MUST set the IN bit in both queries and responses.
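
   As an illustration only, the IN bit as drawn above sits at position 9
   of the second 16-bit header word, counting from the most significant
   bit. The sketch below packs and inspects a header under that reading
   of the diagram. The names IN_BIT, build_header and peer_is_i18n_aware
   are hypothetical, and the bit assignment exists only in this draft.

```python
import struct

# Position 9 from the most significant bit of the flags word, as in the
# diagram: mask 1 << (15 - 9). This assignment is specific to this draft.
IN_BIT = 1 << (15 - 9)  # 0x0040


def build_header(msg_id: int, flags: int, qdcount: int = 1) -> bytes:
    """Pack a 12-octet DNS header with the IN bit set."""
    return struct.pack("!6H", msg_id, flags | IN_BIT, qdcount, 0, 0, 0)


def peer_is_i18n_aware(message: bytes) -> bool:
    """Return True when the IN bit is set in a received message header."""
    (flags,) = struct.unpack_from("!H", message, 2)
    return bool(flags & IN_BIT)
```

   A header built by software that leaves the bit at zero is reported as
   non-i18n aware by peer_is_i18n_aware.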



2.2 Character data

   Character data needs to be able to represent as many as possible of
   the characters in the world while remaining compatible with ASCII.
   It must also be well defined so that it can easily be compared
   using both case-sensitive and case-insensitive matching, and it should
   be compact as only 63 octets are available per label without an
   extension of the protocol.

   Therefore character data MUST:
   - Be ISO 10646 (UCS) [UCS].
   - Be normalised using form KC as defined in Unicode technical
     report #15 [UTR15].
     If the character data is in a text string that is not used in
     character matching, normalisation form C of [UTR15] may be used.
   - Be encoded using UTF-8 [RFC2279].
   
   Case-insensitive matching MUST:
   - Be done by folding the case to lower case using the CaseFolding.txt
     mapping as defined in Unicode technical report #21 [UTR21] and
     then comparing the data.
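
   The requirements above can be sketched with Python's standard
   library. This is an approximation, not a definitive implementation:
   it assumes that str.casefold(), which applies the full case folding
   of the Unicode version bundled with the interpreter, is an acceptable
   stand-in for the CaseFolding.txt mapping of [UTR21]. The function
   names are illustrative.

```python
import unicodedata


def to_wire_text(text: str) -> bytes:
    """Normalise with form KC and encode as UTF-8."""
    return unicodedata.normalize("NFKC", text).encode("utf-8")


def fold_for_matching(text: str) -> bytes:
    """Normalise, case-fold and encode character data for comparison."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    # Folding can produce sequences that are no longer normalised,
    # so normalise once more before encoding.
    return unicodedata.normalize("NFKC", folded).encode("utf-8")


def labels_match(a: str, b: str) -> bool:
    """Case-insensitive comparison of two labels."""
    return fold_for_matching(a) == fold_for_matching(b)
```

   With this sketch, a label entered as "MALMÖ" matches one entered as
   "malmo" followed by a combining diaeresis (U+0308), because both are
   reduced to the same UTF-8 octets before comparison.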

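   Note: Normalisation form KC results in compatibility characters
   being merged into their ordinary equivalents (for example fullwidth
   Latin A, U+FF21, into Latin A, U+0041). This results in less user
   confusion, as the two forms look alike and many will assume they are
   the same character. Form KC does not merge similar-looking letters
   from different scripts, such as Greek capital alpha and Latin A.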

   Note: Case folding to lower case using UTR#21 is not perfect. For
   example, in Turkish, I is lower-cased into a dotless i, but UTR#21
   does it in the old ASCII way (I -> i). This way we get a well
   defined lower-casing that can be used in matching, but it will
   not be correct for the local rules of all languages.
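
   The Turkish case can be demonstrated with a short sketch. It uses
   Python's str.casefold(), which is assumed here to behave like the
   locale-independent folding described in the note. The names are
   illustrative.

```python
CAPITAL_I = "I"             # U+0049 LATIN CAPITAL LETTER I
DOTLESS_SMALL_I = "\u0131"  # U+0131 LATIN SMALL LETTER DOTLESS I


def same_after_folding(a: str, b: str) -> bool:
    """Compare two strings using Python's default Unicode case folding."""
    return a.casefold() == b.casefold()


# Default folding pairs I with the dotted i, as in ASCII.
ascii_style = same_after_folding(CAPITAL_I, "i")
# The Turkish pairing of I with dotless i is not recognised.
turkish_style = same_after_folding(CAPITAL_I, DOTLESS_SMALL_I)
```

   Under this folding a Turkish user who types a name with dotless i
   will not match the same name stored with capital I.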

2.3 Rules for character data in queries and responses

   There is only one area which non-i18n aware software cannot
   handle: case-insensitive matching of i18n data.
   Because of this, the IN bit is defined and character data
   MUST be handled as follows:

   - In all queries, all character data that will be used by the
     DNS-server to look up records MUST be in lower case.
   - A request containing an update of the data in the database of the
     DNS-server (for example a DNS update) MUST send data in the
     original case.
   - A DNS-server MUST NOT send a zone transfer, if the server is
     i18n aware and the client is not.
   - A DNS-server getting a request from an i18n aware client MUST
     return data using original case, just like old software does.
   - An i18n aware DNS-server getting a request from a non-i18n aware
     client MUST return all character data that can be used in character
     matching, in lower case.

   As a result of the above rules, old non-i18n aware DNS software only
   gets lower-cased character data, so that it can still perform
   character data matching. I18n aware software will get data as before,
   preserving case, but can still optimise character matching as all
   normal queries will have their data lower-cased.
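
   The rules of this section can be summarised as a small decision
   sketch. It is only an illustration: the function names are
   hypothetical, the boolean client_in_bit stands for the IN bit seen in
   the request header, and str.casefold() is assumed to be an adequate
   stand-in for the lower-casing step.

```python
def fold(text: str) -> str:
    """Stand-in for the lower-casing step (assumes str.casefold suffices)."""
    return text.casefold()


def name_for_query(name: str) -> str:
    """Lookup queries always carry lower-cased character data."""
    return fold(name)


def name_for_response(stored_name: str, client_in_bit: bool) -> str:
    """An i18n aware server preserves case only for i18n aware clients."""
    return stored_name if client_in_bit else fold(stored_name)


def may_send_zone_transfer(server_i18n: bool, client_in_bit: bool) -> bool:
    """An i18n aware server refuses zone transfers to non-i18n aware clients."""
    return not (server_i18n and not client_in_bit)
```

   For a stored name "Malmö.Example", an i18n aware client receives the
   name unchanged, while an old client receives "malmö.example".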



3. Characters allowed in domain names

   The DNS protocol does not place any restriction on characters used in
   a domain name. However applications that make use of DNS
   data may have restrictions imposed on what particular values are
   acceptable in their environment. If the client has such restrictions,
   it is solely responsible for validating the data from the DNS to ensure
   that it conforms before it makes any use of that data. [RFC 2181]

   For example domains, hosts and e-mail addresses are represented in DNS
   and may have different rules.

   As the whole idea of internationalisation of DNS is to get domain names
   with non-ASCII characters, the original recommendation in DNS [RFC 1035]
   for host/domain names needs to be updated.

   It is recommended that domains, hosts and e-mail addresses all are
   extended to allow all letters, digits and some separators of UCS.
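
   One possible client-side check for this recommendation is sketched
   below. The exact character set is left open by this document, so the
   choices here are assumptions: letters are the Unicode general
   categories starting with "L", digits are category "Nd", and the only
   separator accepted is the hyphen. Combining marks are rejected, which
   a real policy would need to reconsider for scripts that depend on
   them.

```python
import unicodedata

# Assumption: the separator set is not defined by the draft; only the
# hyphen is accepted here, following the traditional host name rule.
ALLOWED_SEPARATORS = {"-"}


def label_is_acceptable(label: str) -> bool:
    """Accept labels made of Unicode letters, decimal digits and '-'."""
    if not label:
        return False
    for ch in label:
        category = unicodedata.category(ch)
        if category.startswith("L") or category == "Nd":
            continue
        if ch in ALLOWED_SEPARATORS:
            continue
        return False
    return True
```

   Such a check belongs in the application using the name, not in the
   DNS server, which must continue to accept any label.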

   [ Should the recommended set based on the Unicode character properties
     be included here? ]



4. User interface issues

   Locally on a system or in a user interface, a different character set
   than the one defined for the DNS protocol may be used. Software must
   therefore map between the local character set and the character set of
   the protocol, so that human beings can understand it.

   This means that a zone file that is edited in a text editor by a person
   before being loaded into a DNS server must be allowed to be in the local
   character set. Software may not assume that the user can edit text
   encoded in UTF-8. A zone file transmitted between DNS software that
   is not handled by a human, can be transmitted using any format.

   When character data is presented to a human or entered by a human,
   software must, as well as possible, present it using the local
   character set and allow it to be entered using the local character set.
   It is the responsibility of the software to convert between the local
   character set and the one used in the protocol, not the human.
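
   A minimal sketch of such a conversion follows. It assumes the local
   character set is known by a name Python's codecs accept (for example
   "iso-8859-1"), and it chooses to show characters that the local set
   cannot represent as backslash escapes. That fallback is an assumption
   of the sketch and is not prescribed by this document.

```python
import unicodedata


def local_to_protocol(raw: bytes, local_charset: str) -> bytes:
    """Convert text entered in the local character set to protocol form."""
    text = raw.decode(local_charset)
    return unicodedata.normalize("NFKC", text).encode("utf-8")


def protocol_to_local(wire: bytes, local_charset: str) -> bytes:
    """Convert protocol text for display; unmappable characters are escaped."""
    text = wire.decode("utf-8")
    return text.encode(local_charset, errors="backslashreplace")
```

   A zone file line typed in ISO 8859-1 would pass through
   local_to_protocol when loaded, and names returned from the DNS would
   pass through protocol_to_local before being shown to the user.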



5. Effect on other protocols

   As a domain name may now include non-ASCII characters, many other
   protocols that include domain names need to be updated. Examples
   are SMTP, HTTP and URLs.

   In many protocols domain names are used in headers. It is recommended
   that they are updated to use UCS, normalised using form C or KC of
   UTR#15 and encoded using UTF-8. The same format is recommended for
   other character data of the protocols. This way ugly things like
   quoted-printable can be obsoleted.

   We can now expect users to want to have e-mail addresses with
   non-ASCII characters both before and after the @-sign.

   Software needs to be updated to follow the user interface
   recommendations given above, so that a human will see the characters
   in their local character set, if possible.

6. Security Considerations

   As always with data, if software does not check for data that can
   be a problem, security may be affected. As more characters than
   ASCII are now allowed, software that expects only ASCII and performs
   no checks may now have security problems.

7. References

   [RFC1034]   Mockapetris, P., "Domain Names - Concepts and Facilities",
               STD 13, RFC 1034, November 1987.

   [RFC1035]   Mockapetris, P., "Domain Names - Implementation and
               Specification", STD 13, RFC 1035, November 1987.

   [RFC2279]   Yergeau, F., "UTF-8, a transformation format of
               ISO 10646", RFC 2279, January 1998.

   [RFC 2181]  Elz, R. and R. Bush, "Clarifications to the DNS
               Specification", RFC 2181, July 1997.

   [RFC 2535]  Eastlake, D., "Domain Name System Security Extensions",
               RFC 2535, March 1999.

   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119, March 1997.

   [UTR15]     Mark Davis and Martin Duerst, "Unicode Normalization Forms",
               Unicode Technical Report #15,
               <http://www.unicode.org/unicode/reports/tr15/>.

   [Unicode3]  The Unicode Consortium, "The Unicode Standard -- Version
               3.0", ISBN 0-201-61633-5. Described at
               <http://www.unicode.org/unicode/standard/versions/Unicode3.0.html>.

   [UnicodeData] The Unicode Character Database, 
                <ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt>.
                 The database is described in
                <ftp://ftp.unicode.org/Public/UNIDATA/UnicodeCharacterDatabase.html>.



8. Acknowledgements

   Paul Hoffman: draft-hoffman-idn-cidnuc-00.txt
   Stuart Kwan, James Gilroy: draft-skwan-utf8-dns-02.txt
   Kent Karlsson: Draft on domain name internationalisation.

   Discussions by the members of the IDN working group.



9. Author's Address

   Dan Oscarsson
   Telia ProSoft AB
   Box 85
   201 20 Malmö
   Sweden

   E-mail: Dan.Oscarsson@trab.se