
RE: [idn] My draft for internationalisation of DNS



Somehow I disappeared from the list, so I had to resubscribe and
read a lot of the archive again. Below is a response I sent
that never reached the list.

   

Andy wrote (also related to Martin's comments):
>I think this is a good basis for an IDN protocol, and is almost exactly what
>I would have written up had I had the time.  A few comments:
>
>There are some broken DNS servers which don't zero the last flag bit (the
>one you have specified as the IN bit).  I suggest you use an EDNS option
>instead (see RFC 2671).  In addition, I can imagine various levels of IDN
>support (as more characters are added to ISO10646 for example) so a one bit
>flag may not be enough.

I wonder how the security extensions handle this, as they also use
some bits that RFC1035 specifies must be zero.

But you are right, there can be more levels of IDN support. Martin
pointed this out as well. I did not use EDNS for the following
reasons:
- It should work with current (that is, today's) DNS software.
- i18n-aware software should not need to send any additional
  requests.
  
From what I can read, EDNS may result in an error message from
the DNS server, forcing the client to resend the request in the old
format. This will generate a lot of extra traffic and slow things down.
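To make the cost concrete, here is a minimal sketch of that fallback. The server is simulated, and the names fake_server and query_with_fallback are hypothetical. It assumes an old server answers a query carrying an OPT record with RCODE 1 (FORMERR); real servers may instead ignore the record or drop the packet, so this is only one possible behaviour.

```python
FORMERR = 1   # RCODE 1, "Format error" (RFC 1035, section 4.1.1)
NOERROR = 0


def fake_server(query_has_opt: bool, supports_edns: bool) -> int:
    """Hypothetical server: a pre-EDNS server rejects an OPT record."""
    if query_has_opt and not supports_edns:
        return FORMERR
    return NOERROR


def query_with_fallback(supports_edns: bool) -> int:
    """Return how many round trips the client needs for a usable answer."""
    round_trips = 1
    rcode = fake_server(query_has_opt=True, supports_edns=supports_edns)
    if rcode == FORMERR:
        # Resend the same question in the old format, without OPT.
        round_trips += 1
        rcode = fake_server(query_has_opt=False, supports_edns=supports_edns)
    if rcode != NOERROR:
        raise RuntimeError(f"unexpected RCODE {rcode}")
    return round_trips


print(query_with_fallback(supports_edns=True))   # 1
print(query_with_fallback(supports_edns=False))  # 2
```

Against an old server every lookup costs two round trips instead of one, which is the extra traffic described above.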

So that is the reason for not using EDNS. But if EDNS could be used
without requests being rejected, it would be an option. Looking at RFC1035,
it might be possible to put something in the additional section of a
request without it being rejected, but I do not know what current software does.
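For reference, this is roughly what a query with an EDNS OPT pseudo-RR in the additional section looks like on the wire, following the OPT layout in RFC 2671. It is a sketch only: the option code IDN_OPTION_CODE and its one-octet payload are hypothetical (nothing is assigned for IDN), and the 1280-octet UDP payload size is an arbitrary choice.

```python
import struct

IDN_OPTION_CODE = 0xFDE9  # hypothetical option code, not assigned to IDN
OPT_TYPE = 41             # OPT pseudo-RR type (RFC 2671)


def encode_name(name: str) -> bytes:
    """Encode an ASCII name as length-prefixed labels (RFC 1035, 3.1)."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label length: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"  # zero-length root label ends the name


def build_query(qid, name, options=None, qtype=1, qclass=1, udp_size=1280):
    """Build a query; if options is not None, append an OPT RR."""
    arcount = 0 if options is None else 1
    # Header: ID, flags (only RD set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, qclass)
    if options is None:
        return header + question
    rdata = b"".join(struct.pack("!HH", code, len(data)) + data
                     for code, data in options)
    # OPT: root owner name, TYPE 41, CLASS = requestor's UDP payload size,
    # TTL = extended RCODE (8 bits), VERSION (8 bits), Z (16 bits), all 0.
    opt = b"\x00" + struct.pack("!HHIH", OPT_TYPE, udp_size, 0, len(rdata))
    return header + question + opt + rdata


plain = build_query(0x1234, "example.com")
edns = build_query(0x1234, "example.com", options=[(IDN_OPTION_CODE, b"\x01")])
print(len(plain), len(edns))  # 29 45
```

The OPT record adds 16 octets here; whether current servers accept it or answer FORMERR is exactly the open question above.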

Anyway, to handle future extensions of the character set and
additional i18n handling in DNS, we could define it as follows:

The protocol in my draft defines the valid set of characters
and rules as those of ISO 10646-1:2000/Unicode 3.0.
When new characters are added to ISO 10646/Unicode, or new i18n handling
is needed, this will be done through an EDNS extension.
I18n-aware software should not load or use characters outside the
above repertoire unless they leave the normalisation and case folding
mappings defined in Unicode 3.0 unchanged.
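A minimal sketch of the strict repertoire check. Python's standard library does not ship Unicode 3.0 tables, only Unicode 3.2 (unicodedata.ucd_3_2_0), so 3.2 is used as a stand-in here; that substitution is an assumption. The sketch does not try to admit later characters whose normalisation and case folding mappings are unchanged.

```python
import unicodedata

# Assumption: Python ships Unicode 3.2 tables (ucd_3_2_0) but not 3.0,
# so 3.2 stands in for the Unicode 3.0 repertoire here.
REFERENCE_UCD = unicodedata.ucd_3_2_0


def in_repertoire(label: str) -> bool:
    """True if every character is assigned in the reference database."""
    # 'Cn' is the general category of unassigned code points.
    return all(REFERENCE_UCD.category(ch) != "Cn" for ch in label)


print(in_repertoire("bücher"))       # True
print(in_repertoire("a\U0001F600"))  # False: U+1F600 was added long after 3.2
```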

This should allow IDN to be put into use now, and it should be sufficient
for several years. By then EDNS should be ready for use.
When implementing i18n-aware software, implementors should look at EDNS
and prepare for it.


>
>You have separated the operations of normalisation and case folding.  I do
>not understand why this is.  I agree that both must be applied before
>comparison, but why do you allow the server to return normalised but
>original case data?  Would un-normalised and original case data be more
>appropriate?  (especially in the case of Greek A vs. Latin A)
>

Un-normalised data cannot be compared without first being normalised.
Also, normalising to form C or KC usually gives a shorter encoding, which is
important as a domain name part (label) is limited to 63 octets (without EDNS).
If everybody uses normalised data, the software can be simpler in many places.
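A small sketch of the length argument, assuming labels are carried as UTF-8 (the wire encoding is an assumption here, not taken from the draft). A label of 30 decomposed "é" characters is too long in decomposed form but fits after NFC.

```python
import unicodedata

MAX_LABEL_OCTETS = 63  # RFC 1035 limit for one label


def label_octets(label: str, form: str) -> int:
    """Octet length of label after normalisation (UTF-8 is an assumption)."""
    return len(unicodedata.normalize(form, label).encode("utf-8"))


def fits_in_label(label: str, form: str) -> bool:
    return label_octets(label, form) <= MAX_LABEL_OCTETS


decomposed = "e\u0301" * 30  # 'e' + COMBINING ACUTE ACCENT, 30 times
print(label_octets(decomposed, "NFD"), fits_in_label(decomposed, "NFD"))  # 90 False
print(label_octets(decomposed, "NFC"), fits_in_label(decomposed, "NFC"))  # 60 True
```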

We could discuss whether form C or KC should be used. Form C does not lose
any information, but KC does, since it maps compatibility characters to
their ordinary equivalents. Martin can probably comment on this.
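An illustration of the difference, using Python's unicodedata module: NFC leaves compatibility characters such as the "fi" ligature and superscript two alone, while NFKC folds them, so "ﬁle" and "file" would become the same label. The helper name compare_forms is hypothetical.

```python
import unicodedata


def compare_forms(s: str) -> tuple:
    """Return (NFC, NFKC) of s to show what each form keeps."""
    return unicodedata.normalize("NFC", s), unicodedata.normalize("NFKC", s)


print(compare_forms("\ufb01le"))  # C keeps the ligature, KC gives 'file'
print(compare_forms("x\u00b2"))   # C keeps the superscript, KC gives 'x2'
```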

As to why I separate normalisation and case folding:
The DNS specifies that original case should be preserved. There are several
reasons for this, among them:
- "domain names" (actually the names in DNS) are used for many purposes.
- One is domain names built from company names or trademarks.
  Companies very commonly use a combination of upper and lower case
  in their names and trademarks to enhance their image.
  Many of them would prefer that when you, for example, look up the
  domain name for an IP address, the correct case is returned.
- Another is the e-mail address defined in the SOA record.
  While many systems now do a case-insensitive comparison on the
  user name part of the e-mail address, there may still be some that don't.
  Here too, e-mail addresses can be made more readable by mixing
  upper and lower case.
- If you look up a host name from an IP address, you may want to compare
  the host name with other data. Many Unix services do this,
  and many of them are case-sensitive, so they need the correct
  case returned.
- There may be other uses of "domain names" that require them to be
  unchanged.
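One way to get both preserved case and caseless comparison is to store the name as given and index it under a separate comparison key. This sketch uses Unicode's canonical caseless matching (NFD, then case folding, then NFD again) as the key, which is a swapped-in choice rather than the C/KC scheme in the draft; CasePreservingZone and match_key are hypothetical names.

```python
import unicodedata


def match_key(label: str) -> str:
    """Unicode canonical caseless matching key: NFD(casefold(NFD(label)))."""
    nfd = unicodedata.normalize("NFD", label)
    return unicodedata.normalize("NFD", nfd.casefold())


class CasePreservingZone:
    """Hypothetical in-memory zone: compares caselessly, returns stored case."""

    def __init__(self) -> None:
        self._records = {}

    def add(self, owner: str, address: str) -> None:
        # Index by the folded key but keep the owner name exactly as given.
        self._records[match_key(owner)] = (owner, address)

    def lookup(self, query: str):
        return self._records.get(match_key(query))


zone = CasePreservingZone()
zone.add("MyCompany", "192.0.2.1")
zone.add("Bücher", "192.0.2.2")
print(zone.lookup("MYCOMPANY"))        # ('MyCompany', '192.0.2.1')
print(zone.lookup("BU\u0308CHER"))     # ('Bücher', '192.0.2.2')
```

The answer always carries the case the zone owner chose, whatever case the query used.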

That is why I think the original case needs to be preserved and returned
in responses (though this can only be done for i18n-aware software).
It is also what the DNS specification says.
(Actually, I probably need to change my specification to say that
responses to non-i18n-aware software should preserve case in the ASCII
range, to avoid breaking programs that require that.)

(And you can forget Greek A versus Latin A, as Martin pointed out that
normalisation form KC does not normalise Greek A to Latin A.)
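A quick check of Martin's point with Python's unicodedata module (the Unicode version is whatever the Python build ships, not necessarily 3.0):

```python
import unicodedata

GREEK_CAPITAL_ALPHA = "\u0391"
LATIN_CAPITAL_A = "A"

folded = unicodedata.normalize("NFKC", GREEK_CAPITAL_ALPHA)
print(folded == GREEK_CAPITAL_ALPHA)                    # True: KC leaves U+0391 alone
print(folded.casefold() == LATIN_CAPITAL_A.casefold())  # False: still distinct
```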


>
>You need to add a description of how DNS labels will sort (see section 8.2
>of RFC 2535) so that DNSSEC will work.  I suggest that you specify that
>names should be normalised and then lower-cased before sorting.
>
I looked at RFC 2535 but missed that section. Thanks. I will add something
about that (though all names should already be normalised).
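A sketch of Andy's suggestion applied to the RFC 2535 name order: labels are compared right to left as unsigned octet strings, with each label first normalised and then lower-cased. NFC as the normalisation form, str.lower() as the lower-casing, and UTF-8 as the octet encoding are all assumptions; canonical_key is a hypothetical name.

```python
import unicodedata


def canonical_key(name: str) -> tuple:
    """Sort key: labels right to left, each NFC-normalised, lower-cased, UTF-8."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return tuple(
        unicodedata.normalize("NFC", label).lower().encode("utf-8")
        for label in reversed(labels)
    )


names = ["z.example", "zABC.a.EXAMPLE", "a.example", "example",
         "Z.a.example", "yljkjljk.a.example"]
for n in sorted(names, key=canonical_key):
    print(n)
```

Python compares bytes and tuples so that a shorter prefix sorts first, which matches the RFC 2535 rules that an absent octet sorts before a zero octet and that a name sorts before its subdomains.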

   Dan