[idn] comments on draft-ietf-idn-requirements-03.txt



As sent by Keith Moore.

We will also speak to the list moderator about the policy of the
idn@ops.ietf.org mailing list. Please be patient while we work this out.
Thanks.

-James Seng

> To: idn@ops.ietf.org
> Subject: comments on draft-ietf-idn-requirements-03.txt
> cc: moore@cs.utk.edu
> From: Keith Moore <moore@cs.utk.edu>
> Date: Mon, 31 Jul 2000 16:55:36 -0400
> Sender: moore@cs.utk.edu
> 
> > 1. Introduction
> >
> > At present, the encoding of Internet domain names is restricted to a
> > subset of 7-bit ASCII (ISO/IEC 646). HTML, XML, IMAP, FTP, and many
> > other text based items on the Internet have already been at least
> > partially internationalized. It is important for domain names to be
> > similarly internationalized or for an equivalent solution to be found.
> > This document assumes that the most effective solution involves putting
> > non-ASCII names inside some parts of the overall DNS system.
> 
> Since you've made that assumption, it is of course good to state it.
> However, this is not an appropriate constraint to impose on a solution
> to the IDN problem.  The IDN problem is being investigated because of
> the needs of human users, not because of any requirement at the DNS
> protocol level.  Users don't care about the bits on the wire that are
> transmitted to and from DNS; they care about the interchangeability of
> DNS names at the application layer *and above*.  Focusing attention
> on the DNS layer diverts attention from the most thorny parts of the
> IDN problem - the user interfaces to applications where DNS names
> are entered and displayed, and the interfaces between different
> applications where DNS names are exchanged.
> 
> For any IDN work to be effective, changes may be needed in many different
> pieces - DNS servers, operating systems, software libraries, applications
> protocols, and user interfaces - on many different platforms.  These
> pieces *must* be able to change independently from one another, and
> everything that was once working *must* be able to keep on working.
> This means that normal use of ASCII-only domain names must keep working;
> it also means that once use of IDNs starts working for some set of
> components, it should not cease to work when one of those components
> is upgraded.
> 
> This process will likely take many years, and it must be realized that
> IDN will not be universally available during that transition period.
> 
> > 1.1 Definitions and Conventions
> >
> > The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
> > "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
> > document are to be interpreted as described in [RFC2119].
> 
> The 2119 definitions seem inappropriate for a requirements document.
> 
> > 1.4 A multilayer model of the DNS function
> >
> > The DNS can be seen as a multilayer function:
> >
> > - The bottom layer is where the packets are passed across the Internet
> >   in a DNS query and a DNS response. At this level, what matters is
> >   the format and meaning of bits and octets in a DNS packet.
> >
> > - Above that is the "DNS service", created by an infrastructure of DNS
> >   servers, NS records that point to those DNS servers, that is
> >   pointed to by the root servers (listed in the "root cache file" on each DNS
> >   server, often called "named.cache". It is at this level that the
> >   statement "the DNS has a single root" [RFC2826] makes sense, but
> >   still, what are being transferred are octets, not characters.
> >
> > - Interfacing to the user is a service layer, often called "the resolver
> >   library", and often embedded in the operating system or system
> >   libraries of the client machines. It is at the top of this layer that
> >   the API calls commonly known as "gethostbyname" and "gethostbyaddress"
> >   reside.  These calls are modified to support IPv6 [RFC2553]. A
> >   conceptually similar layer exists in authoritative DNS servers,
> >   comprising the parts that generate "meaningful" strings in DNS files.
> >   Due to the popularity of the "master file" format, this layer often
> >   exists only in the administrative routines of the service maintainers.
> 
> I'm not at all sure what is meant by "administrative routines of the service
> maintainers"... could you reword or elaborate?
> 
> > - The user of this layer (resolver library) is the application programs
> >   that use the DNS, such as mailers, mail servers, Web clients, Web
> >   servers, Web caches, IRC clients, FTP clients, distributed file
> >   systems, distributed databases, and almost all other applications on
> >   TCP/IP.
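
The point in 1.4 that the bottom layers carry octets rather than
characters can be made concrete with a small sketch. It assumes the
length-prefixed label layout of [RFC1035] and handles ASCII names only;
the function name encode_name is made up for this illustration and is
not taken from the draft or from any resolver library.

```python
def encode_name(name: str) -> bytes:
    """Encode a dotted ASCII name as length-prefixed labels.

    Illustrative only: each label becomes one length octet followed by
    the label's octets, and the name ends with a zero-length root label.
    """
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")  # raises for non-ASCII input
        if not 1 <= len(raw) <= 63:
            raise ValueError("label must be 1 to 63 octets long")
        out.append(len(raw))
        out += raw
    out.append(0)  # root label terminates the name
    return bytes(out)


wire = encode_name("www.example.com")
```

Nothing in this layout says which character set the octets belong to,
which is why the choice of encoding has to be made somewhere above it.
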
> 
> There are several more layers of interest, and they are the layers
> which are most impacted by IDN changes.  As such they really need
> to be considered in your taxonomy.
> 
> - the input routines used by applications.  often these are supplied by,
>   or part of, the operating system which supports the applications.
>   These layers may or may not support internationalization, or they
>   may support "localization".  different instances of the same platform
>   may use different CESs to represent characters.
> 
> - the display routines used by applications.  similar issues apply as for
>   input methods.  however, it cannot be assumed that the input routines
>   and the display routines will change together or that they will support
>   the same formats for representation of IDNs - in general, this will
>   not be the case.
> 
> - the means used by applications to exchange DNS names with one another
>   without intervening human input - for instance, drag-and-drop, or
>   text-based cut-and-paste.
> 
> - the human users which need to be able to read, transcribe, and
>   input IDNs.  arguably their needs include not only reading and
>   typing but also spoken input and audible output.
> 
> - the means used by human users to exchange domain names with one another.
> 
> >
> > Graphically, one can illustrate it like this:
> >
> > +---------------+                            +---------------------+
> > | Application   |                            | (Base data)         |
> > +---------------+                            +---------------------+
> >       |  Application service interface                 |
> >       |  For ex. GethostbyXXXX interface               | (no standard)
> > +---------------+                            +---------------------+
> > | Resolver      |                            | Auth DNS server     |
> > +---------------+                            +---------------------+
> >       |     <-----   DNS service interface   ----->    |
> > +------------------------------------------------------------------+
> > |  DNS service                                                     |
> > |  +-----------------------+         +--------------------+        |
> > |  | Forwarding DNS server |         | Caching DNS server |        |
> > |  +-----------------------+         +--------------------+        |
> > |                                                                  |
> > |                 +-------------------------+                      |
> > |                 | Parent-zone DNS servers |                      |
> > |                 +-------------------------+                      |
> > |                                                                  |
> > |                 +-------------------------+                      |
> > |                 | Root DNS servers        |                      |
> > |                 +-------------------------+                      |
> > |                                                                  |
> > +------------------------------------------------------------------+
> 
> The picture omits the interaction between applications and users,
> and between multiple applications.  these could be illustrated in
> another picture:
> 
>                         (spoken / heard)
>    human 1 <------------------------------------------> human 2
>     |   ^ \                                            /  ^  |
>     |   |  \   (written)                (read)        /   |  |
>     |   |   ----------------> paper >-----------------    |  |
>     |   |                                                 |  |
>     v   |                                                 |  v
> +-------------------+     (cut and paste)        +-------------------+
> |  host text input  |<-------------------------->| host text input   |
> |    and output     |                            |    and output     |
> +-------------------+                            +-------------------+
>     |   ^                                                ^        |
>     v   |                     direct                     |        v
> +---------------+  (interface between applications)  +---------------+
> | Application 1 |<---------------------------------->| Application 2 |
> +---------------+                                    +---------------+
> 
> while it's generally true that most of these interfaces cannot be
> changed or specified by IETF, they are likely to be affected -
> and it is important to consider the effects on these interfaces
> when evaluating a proposal for interoperability.  I claim that
> these interfaces are the ones which are most important.
> 
> > 1.5 Service model of the DNS
> >
> > The Domain Name Service is used for multiple purposes, each of which is
> > characterized by what it puts into the system (the query) and what it
> > expects as a result (the reply).
> >
> > The most used ones in the current DNS are:
> >
> > - Hostname-to-address service (A, AAAA, A6): Enter a hostname, and get
> >   back an IPv4 or IPv6 address.
> >
> > - Hostname-to-Mail server service (MX): As above, but the expected
> >   return value is a hostname and a priority for SMTP servers.
> >
> > - Address-to-hostname service (PTR): Enter an IPv4 or IPv6 address (in
> >   in-addr.arpa or ip6.int form respectively) and get back a hostname.
> >
> > - Domain delegation service (NS). Enter a domain name and get back
> >   nameserver records (designated hosts who provides authoritive
> >   nameservice) for the domain.
> >
> > New services are being defined, either as entirely new services (IPv6 to
> > hostname mapping using binary labels) or as embellishments to other
> > services (DNSSEC returning information about whether a given DNS service
> > is performed securely or not).
> >
> > These services exist, conceptually, at the Application/Resolver
> > interface, NOT at the DNS-service interface.
> 
> I'm not sure what this statement means or what it implies.  It is
> not immediately clear to me that the services listed above are
> transparent to lower layers.  NATs in particular make assumptions
> about the semantics of address lookups and inverse address lookups.
> DNS servers also treat different kinds of queries 'specially' in
> that they return different 'additional information' depending on
> the query type.
> 
> > This document attempts to
> > set requirements for an equivalent of the "used services" given above,
> > where "hostname" is replaced by "Internationalized Domain Name". This
> > doesn't preclude the fact that IDN should work with any kind of DNS
> > queries.  IDN is a new service. Since existing protocols like SMTP or
> > HTTP use the old service, it is a matter of great concern how the new
> > and old services work together, and how other protocols can take
> > advantage of the new service.
> 
> the last point could use some more elaboration or emphasis.
> perhaps it would help just to put it in a separate paragraph?
> 
> > 2.1 Compatibility and Interoperability
> >
> > [1] The DNS is essential to the entire Internet. Therefore, the service
> > MUST NOT damage present DNS protocol interoperability.
> 
> good.
> 
> > It MUST make the
> > minimum number of changes to existing protocols on all layers of the
> > stack.
> 
> I wouldn't state this as a requirement.  rather, I would state that
> the requirement is that the IDN system be incrementally deployable
> with minimum disruption to operational services.  a system that met
> these requirements would be superior to one which required only few
> changes but which was quite disruptive to deploy or which required
> massive simultaneous deployment.  (a 'flag day')
> 
> > It MUST continue to allow any system anywhere to resolve any
> > internationalized domain name.
> 
> not sure what this means.  in general, systems cannot resolve IDNs now,
> since they are not even defined yet.  what does it mean for them to
> 'continue' to be able to do so?  if the goal is to maintain backward
> compatibility with pre-standard IDN systems, this should be a separate
> goal and it should be stated more clearly and less strongly. of course
> it is highly desirable to make the transition an easy one but this
> concern should not be paramount.
> 
> > [2] The service MUST preserve the basic concept and facilities of domain
> > names as described in [RFC1034]. It MUST maintain a single, global,
> > universal, and consistent hierarchical namespace.
> 
> yes.
> 
> > [2.5] The DNS service layer (the packet formats that go on the wire)
> > MUST NOT limit the codepoints that can be used. This interface SHOULD
> > NOT assign meaning to name strings; the application service layer,
> > where "gethostbyname" et al reside, MAY constrain the name strings to
> > be used in certain services. (conflict)
> 
> I don't disagree with the goal, but neither do I see how the desire
> to implement IDNs imposes this as a requirement, or how failure to
> meet this requirement would be disruptive to DNS.  This appears to
> over-constrain the solution set.
> 
> if one were defining a new service on DNS, it is at least worth
> considering to use a reserved codepoint in a DNS label to indicate
> an IDN label (as opposed to an ASCII label).
> 
> > [3] The same name resolution request MUST generate the same response,
> > regardless of the location or localization settings in the resolver, in
> > the master server, and in any slave servers involved in the resolution
> > process.
> 
> "MUST generate the same response" modulo error conditions - obviously
> if a slave doesn't have current data and/or is disconnected from the net
> it will not generate the same response as the master server.  what you
> don't want is for two different servers to generate conflicting responses.
> (either one can report an error, either one can have stale data as long
> as it's within the TTL, but one server should not "successfully" return
> X while the other "successfully" returns Y).
> 
> the other caveat is that this applies to any combination of "new"
> (upgraded to support IDN) and "old" (not upgraded) servers.
> 
> > [4] The protocol SHOULD allow creation of caching servers that do
> > not understand the charset in which a request or response is encoded.
> > The caching server SHOULD perform correctly for IDN as well as for
> > current domain names (without the authoritative bit) as the master
> > server would have if presented with the same request.
> 
> this presumes that the request or responses will be charset-tagged;
> not necessarily a good idea.
> 
> but I would state this more strongly - the protocol should allow
> existing DNS resolvers and caches to do reasonable things with
> IDN RRs.  you can expect IDN users to upgrade authoritative servers
> when they start using IDNs but you can't reasonably expect every cache
> to get upgraded to use IDNs before people start using them.
> 
> > [5] A caching server MUST NOT return data in response to a query that
> > would not have been returned if the same query had been presented to an
> > authoritative server. This applies fully for the cases when:
> >
> > - The caching server does not know about IDN
> > - The caching server implements the whole specification
> > - The caching server implements a valid subset of the specification
> 
> [6] missing?
> 
> > [7] The service MAY modify the DNS protocol [RFC1035] and other related
> > work undertaken by the [DNSEXT] WG. However, these changes SHOULD be as
> > small as possible and any changes SHOULD be coordinated with the
> > [DNSEXT] WG.
> 
> strongly recommend that any changes should be made by DNSEXT, or by a
> group chartered by IESG for this purpose, not by this group.
> this group can write requirements and perhaps eventually architecture
> specification, not do the protocol design.
> 
> > [8] The protocol supporting the service SHOULD be as simple as possible
> > from the user's perspective. Ideally, users SHOULD NOT realize that IDN
> > was added on to the existing DNS.
> 
> I don't agree with this as stated.  Users will almost certainly have
> to be aware of IDN at some level, if only so that they can realize
> when they can and cannot use an IDN.  (they won't be able to use them
> everywhere immediately)
> 
> In general, users don't care about the DNS protocol now, and shouldn't
> have to with IDN.  Users who maintain master files will have to care
> about it to some degree; it will probably make their lives slightly more
> complicated.
> 
> I think this needs rewording or clarification.  It's trying
> to take a statement about user needs and make a conclusion about
> DNS protocol complexity.  Simple protocols are generally good, but
> to state this as a requirement is not justified.
> 
> > [10] The best solution is one that maintains maximum feasible
> > compatibility with current DNS standards as long as it meets the other
> > requirements in this document.
> 
> I do not accept the new requirements as more important than maximum
> feasible compatibility.  The best solution is one that (a) provides
> effective IDN in the long term and (b) is universally deployable
> (or nearly so) without much disruption in a manner that does not
> fragment DNS space.
> 
> > 2.2 Internationalization
> >
> > [11] Internationalized characters MUST be allowed to be represented and
> > used in DNS names and records. The protocol MUST specify what charset is
> > used when resolving domain names and how characters are encoded in DNS
> > records.
> 
> this could be taken two ways - certainly you want the CES to be unambiguous.
> but this statement could be read to require that the protocol support
> charset tagging, and I don't think that follows.  (nor do I think that
> is what you meant, given the following statement)
> 
> > [12] This document RECOMMENDS Unicode only. If multiple charsets are
> > allowed, each charset MUST be tagged and conform to [RFC2277].
> >
> > [12.5] IDN MUST NOT return illegal code points in responses, SHOULD
> > reject queries with illegal codepoints. (one request to add; one request
> > to remove)
> 
> what is an illegal codepoint?
> does this mean "illegal codepoint" as defined by unicode?
> 
> > [13] CES(s) chosen SHOULD NOT encode ASCII characters differently
> > depending on the other characters in the string. In other words, unless
> > IDN names are identified and coded differently from ASCII-only ones,
> > characters in the ASCII set SHOULD remain as specified in [US-ASCII]
> > (one request to remove).
> 
> I don't think this is stated right.  I think the requirement is to maintain
> DNS protocol compatibility at all levels for ASCII names and the current
> query and response types.  but if the DNS protocol were extended with
> different query types or options or whatever, those extensions could have
> a different representation for ASCII names.
> 
> > [14] The protocol SHOULD NOT invent a new CCS for the purpose of IDN
> > only and SHOULD use existing CES. The charset(s) chosen SHOULD also be
> > non-ambiguous.
> 
> not clear where these requirements come from - it appears to overconstrain
> the solution space.  ideally of course, we wouldn't need to invent a
> new CCS or CES, but there might be advantages in at least doing a different
> CES.
> 
> the word 'non-ambiguous' is ambiguous.
> non-ambiguous in what way?
> 
> > [15] The protocol SHOULD NOT make any assumptions about the location in
> > a domain name where internationalization might appear. In other words,
> > it SHOULD NOT differentiate between any part of a domain name because
> > this MAY impose restrictions on future internationalization efforts.
> 
> good. (though it's not appropriate to use capitalized MAY here.)
> 
> > [16] The protocol also SHOULD NOT make any localized restrictions in the
> > protocol. For example, an IDN implementation which only allows domain
> > names to use a single local script would immediately restrict
> > multinational organization.
> 
> I would state this more broadly - interpretation of an IDN MUST be the
> same from any point on the Internet which supports IDNs.
> 
> (of course, this doesn't imply that an Italian keyboard must be able
> to input Korean IDNs...)
> 
> > [17] While there are a wide range of devices that use the DNS and a wide
> > range of characteristics of international scripts and methods of
> > domain name input and display, IDN is only concerned with the
> > protocol. Therefore, there MUST be a single way of encoding an
> > internationalized domain name within the DNS.
> 
> this says two different things.  I agree with the latter statement, but
> I don't agree that IDN can only be concerned with the protocol.  This
> misses the real difficulty in implementing IDNs.  Now it's true that
> all that IETF can specify is the DNS protocol, but the design must take
> higher layers into consideration if it is to be successful.
> 
> > [18] The protocol SHOULD NOT place any restrictions on the
> > application service layer. It SHOULD only specify changes in the DNS
> > service layer and within the DNS itself.
> 
> again, this is misleading.  IDN is going to have to at least make
> some assumptions about higher levels (which end up being constraints)
> if it is to be successful.
> 
> > 2.4 Canonicalization
> >
> > Matching rules are a complicated process for IDN. Canonicalization
> > of characters MUST follow precise and predictable rules to ensure
> > consistency. [CHARREQ] is RECOMMENDED as a guide on canonicalization.
> >
> > The DNS has to match a host name in a request with a host name held
> > in one or more zones. It also needs to sort names into order. It is
> > expected that some sort of canonicalization algorithm will be used as
> > the first step of this process. This section discusses some of the
> > properties which will be REQUIRED of that algorithm.
> >
> > [22] To achieve interoperability, canonicalization MUST be done at a
> > single well-defined place in the DNS resolution process.
> 
> not clear.  the ultimate solution is likely to employ several kinds of
> canonicalization.  e.g.
> 
> - conversion of client-specific charset (from input) into Unicode
> - canonicalization of Unicode to get a unique on-the-wire representation
> - locale-specific canonicalization - to meet the requirements of
>   a specific locale within a DNS zone.
> 
> these might each need to be done at a different place.
> 
> > The protocol
> > MUST specify canonicalization; it MUST specify exactly where in the
> > DNS that canonicalization happens and does not happen; it MUST specify
> > how additions to ISO 10646 will affect the stability of the DNS and
> > the amount of work done on the root DNS servers.
> >
> > [23] The canonicalization algorithm MAY specify operations for case,
> > ligature, and punctuation folding.
> 
> and these might be locale-specific.
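
As a rough illustration of what "case, ligature, and punctuation
folding" in [23] could look like, the sketch below uses Python's
standard NFKC normalization followed by str.casefold(). This is an
assumption made for the example only: the draft does not name an
algorithm, and the helper name fold_label is hypothetical.

```python
import unicodedata


def fold_label(label: str) -> str:
    """Apply compatibility normalization, then locale-independent case folding.

    Stand-in for whatever folding a real proposal would specify.
    """
    return unicodedata.normalize("NFKC", label).casefold()


ligature = fold_label("\ufb01le")     # U+FB01 LATIN SMALL LIGATURE FI + "le"
fullwidth = fold_label("a\uff0eb")    # U+FF0E FULLWIDTH FULL STOP between a and b
sharp_s = fold_label("Stra\u00dfe")   # U+00DF LATIN SMALL LETTER SHARP S
```

Note that the fullwidth full stop folds to an ordinary "." here, so
folding of this kind can change where label boundaries appear to fall.
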
> 
> > [24] In order to retain backwards compatibility with the current DNS,
> > the service MUST retain the case-insensitive comparison for [US-ASCII]
> > as specified in [RFC1035]. For example, Latin capital letter A (U+0041)
> > MUST match Latin small letter a (U+0061). [UTR21] describes some of
> > the issues with case mapping. Case-insensitivity for non [US-ASCII]
> > MUST be discussed in the protocol proposal.
> >
> > [25] Case folding MUST be locale independent. For example, Latin
> > capital letter I (U+0049) case folded to lower case in the Turkish
> > context will become Latin small letter dotless i (U+0131). But in the
> > English context, it will become Latin small letter i (U+0069).
> 
> not clear whether you can make this work.  given that case folding
> is implemented in current DNS servers - don't see why you cannot
> do locale-dependent case folding in IDN servers.  however you want
> to be careful that caches do not apply locale-specific case folding.
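
The ASCII rule in [24] and the Turkish example in [25] can be shown
side by side. The sketch below folds only the letters A-Z, which is the
comparison existing DNS software already applies, and leaves every
other code point untouched. The function names are made up for this
illustration.

```python
def ascii_lower(label: str) -> str:
    """Lower-case A-Z only; every other code point is left as-is."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in label
    )


def labels_match(a: str, b: str) -> bool:
    """Compare two labels using ASCII-only case folding."""
    return ascii_lower(a) == ascii_lower(b)


DOTLESS_SMALL_I = "\u0131"  # Latin small letter dotless i
DOTTED_CAPITAL_I = "\u0130"  # Latin capital letter I with dot above

same_ascii = labels_match("Example", "eXAMPLE")
turkish_lower = labels_match("I", DOTLESS_SMALL_I)
turkish_upper = labels_match(DOTTED_CAPITAL_I, "i")
```

Under this rule "I" matches "i" but not dotless i, so a component that
applied Turkish folding instead would disagree with one that did not.
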
> 
> > [26] If other canonicalization is done, it MUST be done before the
> > domain name is resolved. Further, the canonicalization MUST be easily
> > upgradable as new languages and writing systems are added.
> 
> not clear that this is appropriate - the obvious place to do locale-
> specific canonicalization is on the authoritative servers for that zone.
> 
> > [27] Any conversion (case, ligature folding, punctuation folding, etc)
> > from what the user enters into a client to what the client asks for
> > resolution MUST be done identically on any request from any client.
> 
> .. for a particular zone.  not clear that you want to make the same
> constraints apply across all zones.  but in general every query should
> get the same result (if successful) from any place on the internet.
> and this applies to a lot more aspects of the system than just
> canonicalization.
> 
> also, if client A uses charset X and client B uses charset Y, it's
> hard to impose the constraint that the conversions should be the same -
> there will necessarily be a client-specific conversion between X or
> Y and the charset used by IDN.
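
The point about clients A and B can be sketched as follows, assuming
for the sake of the example that charset X is ISO 8859-1, charset Y is
UTF-8, and the charset used by IDN is Unicode. None of these choices
comes from the draft, and the helper name to_unicode is hypothetical.

```python
def to_unicode(raw: bytes, charset: str) -> str:
    """Client-specific step: decode local octets into Unicode text."""
    return raw.decode(charset)


# The same name as two different clients might hold it locally.
client_a_octets = b"caf\xe9"       # ISO 8859-1 octets
client_b_octets = b"caf\xc3\xa9"   # UTF-8 octets

name_a = to_unicode(client_a_octets, "iso-8859-1")
name_b = to_unicode(client_b_octets, "utf-8")
```

The octets and the decoding step differ per client, but name_a and
name_b are the same Unicode string, so any later steps can be common.
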
> 
> > [30] If the charset can be normalized, then it SHOULD be normalized
> > before it is used in IDN. Normalization SHOULD follow [UTR15].
> > (conflict)
> >
> > [31] The protocol SHOULD avoid inventing a new normalization form
> > provided a technically sufficient one is available.
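
A small example of why [30] matters, using the normalization forms of
[UTR15] as implemented in Python's standard unicodedata module. The
choice of NFC is an assumption made for the example; the draft does not
pick a form. The helper name normalize_label is hypothetical.

```python
import unicodedata

composed = "\u00e9"      # e with acute as a single code point
decomposed = "e\u0301"   # e followed by combining acute accent


def normalize_label(label: str, form: str = "NFC") -> str:
    """Return the label in the given Unicode normalization form."""
    return unicodedata.normalize(form, label)


equal_before = composed == decomposed
equal_after = normalize_label(composed) == normalize_label(decomposed)
```

The two strings render identically but compare unequal until both are
normalized, which is the kind of mismatch the requirement is meant to
prevent.
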
> >
> > 2.5 Operational Issues
> >
> > [32] Zone files SHOULD remain easily editable.
> 
> by what kinds of tools?  do these tools need to use the same CCS
> and CES as IDN uses?
> 
> > [33] An IDN-capable resolver or server SHALL NOT generate more traffic
> > than a non-IDN-capable resolver or server would when resolving an
> > ASCII-only domain name.  The amount of traffic generated when resolving
> > an IDN SHALL be similar to that generated when resolving an ASCII-only
> > name.
> 
> SHALL NOT seems too strong; a modest increase in traffic (due to
> larger storage requirements for labels, for instance) should
> be acceptable.  the latter statement is better, and seems sufficient
> by itself.
> 
> > [34] The service SHOULD NOT add new centralized administration for the
> > DNS. A domain administrator SHOULD be able to create internationalized
> > names as easily as adding current domain names.
> 
> mumble.  with similar ease or difficulty, not "as easily".
> 
> > [35] Within a single zone, the zone manager MUST be able to define
> > equivalence rules that suit the purpose of the zone, such as, but not
> > limited to, and not necessarily, non-ASCII case folding, Unicode
> > normalizations (if Unicode is chosen), Cyrillic/Greek/Latin folding, or
> > traditional/simplified Chinese equivalence. Such defined equivalences
> > MUST NOT remove equivalences that are assumed by (old or
> > local-rule-ignorant) caches.
> 
> you need to distinguish zone-imposed equivalence rules from protocol-imposed
> equivalence rules.  they will probably get handled differently.
> 
> caches that, for whatever reason, assume equivalence rules other than
> those imposed by the protocol, will probably break things.
> to expect them not to do so is not a reasonable requirement as stated.
> 
> -Keith
> 
> ------- end of forwarded message -------
> 
> ------- End of Forwarded Message