
RE: China



Not quite sure where the subject line came from... but here's my £0.02

>  - I think, first, that the selection of character set is a no brainer.
>    There are defined character sets, and we know how to put them into DNS.
>    The thing to do is follow the same procedures and rules we have used in
>    extending other protocols that use alphabetic information - some
>    combination of ISO 10646 enhanced by UNICODE rules. I would encourage
>    you to settle that debate quickly and move on. This is the easy part.

If choosing ISO 10646/Unicode as the single character set helps resolve
canonicalisation matters (and I think it does), then we should do so now.
Choosing the encoding can be left for later.
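A minimal Python sketch of why the two decisions can be separated, assuming Unicode Normalization Form C (NFC) as the canonicalisation step (an illustrative choice, not one made in this thread); `canonical_form` and `serialise` are hypothetical names. Canonicalisation works on code points, and only the canonical result is turned into bytes, so the choice of encoding does not change which names compare equal.

```python
import unicodedata

def canonical_form(label: str) -> str:
    """Canonicalise at the code-point level (NFC: compose where possible)."""
    return unicodedata.normalize("NFC", label)

def serialise(label: str, encodings=("utf-8", "utf-16-be")) -> dict:
    """Encode one canonical label in several candidate wire encodings."""
    canon = canonical_form(label)
    return {enc: canon.encode(enc) for enc in encodings}

# "u" + U+0308 COMBINING DIAERESIS versus the precomposed U+00FC
decomposed = "mu\u0308nchen"
precomposed = "m\u00fcnchen"
assert canonical_form(decomposed) == canonical_form(precomposed)

# The byte strings differ per encoding, but each decodes to the same code points.
for enc, data in serialise(decomposed).items():
    assert data.decode(enc) == precomposed
    print(enc, data)
```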

>  - Although DNS is defined as a binary service (and therefore amenable to
>    changes such as the use of UTF-8), many implementations are dependent
>    on the specific character set used by UTF-5. Therefore, deployment of a
>    UTF-8-based solution implies a need for extensive testing of
>    implementations to make sure that they accomplish the necessary goals.

Does this imply a requirement that the protocol shouldn't send non-ASCII DNS
labels to servers which aren't expecting them?  To resolvers which aren't
expecting them?

I suspect that servers will tend to be more robust (and easier to upgrade,
since they normally have administrators looking after them).  There are
also not that many DNS server implementations, and most of them have been
written by knowledgeable people, so I suggest that "Should not send non-ASCII
names to servers which don't support IDNs" is a non-requirement (and should
be listed as such).

However, who knows how many resolvers are built into printers and the
like?  I expect that some of these will be quite fragile.  So I would
suggest adding "Should not send non-ASCII names to resolvers which don't
support IDNs" to the requirements.
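The proposed resolver requirement could be checked roughly as follows. This is a sketch under stated assumptions: "plain ASCII" is taken to mean the usual letter-digit-hyphen (LDH) label syntax of RFC 1034 as relaxed by RFC 1123 (leading digit allowed, no leading or trailing hyphen, at most 63 characters), and `resolver_supports_idn` stands for some hypothetical way of learning whether a resolver understands IDNs, which the thread does not define.

```python
import re

# LDH label: ASCII letters/digits/hyphens, 1-63 chars, no hyphen at either end.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_ascii_ldh_name(name: str) -> bool:
    """True if every label of `name` is a plain ASCII LDH label."""
    labels = name.rstrip(".").split(".")
    return all(LDH_LABEL.fullmatch(label) for label in labels)

def may_send_to_resolver(name: str, resolver_supports_idn: bool) -> bool:
    """Proposed requirement: never hand a non-ASCII name to a resolver
    that is not known to support IDNs (how that is known is hypothetical)."""
    return resolver_supports_idn or is_ascii_ldh_name(name)

assert may_send_to_resolver("cisco.com", resolver_supports_idn=False)
assert not may_send_to_resolver("m\u00fcnchen.de", resolver_supports_idn=False)
```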

>  - There are significant questions in the comparison of characters. For
>    example, in European alphabets, upper and lower case are considered
>    equivalent - "Cisco.com" and "cisco.com" are the same DNS name. In
>    German, a "u" with an umlaut over it is equivalent to a "u" followed by
>    an umlaut extender, and also to the character string "ue". No doubt it
>    only gets more interesting as you move to ideographic alphabets.
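The first two equivalences in the quoted text can be handled with generic Unicode operations; the third cannot. A sketch using Python's `str.casefold` and `unicodedata.normalize`, where "case-fold, then compose to NFC" is an assumed comparison rule chosen for illustration (not one agreed in this thread) and `dns_equivalent` is a hypothetical name:

```python
import unicodedata

def comparison_key(name: str) -> str:
    """Case-fold, then compose to NFC (assumed rule, for illustration only)."""
    return unicodedata.normalize("NFC", name.casefold())

def dns_equivalent(a: str, b: str) -> bool:
    return comparison_key(a) == comparison_key(b)

assert dns_equivalent("Cisco.com", "cisco.com")        # case difference
assert dns_equivalent("m\u00fcller", "mu\u0308ller")   # precomposed vs u + U+0308
assert not dns_equivalent("m\u00fcller", "mueller")    # "ue" needs language knowledge
```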

I think that the "nasty ASCII alternatives" issue is also a non-requirement,
since it is so language-dependent: do all languages which use u-umlaut
pronounce it as "ue"?  What about n-tilde (in Spanish this might be written
"ny" in an ASCII-only character set)?  Can anyone who speaks several European
languages come up with an example of an accent which has different "nasty
ASCII-only" spellings in different languages?  Is this also a problem in
Asian languages (sorry - don't know the right term)?
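To make the language dependence concrete, here is a sketch with hypothetical per-language fallback tables. Only German u-umlaut to "ue" comes from the quoted text, and Spanish n-tilde to "ny" is the tentative example above; neither table is authoritative. A DNS name carries no language tag, so a server comparing names would have no way to know which table to apply.

```python
# Hypothetical fallback tables; entries are illustrative, not authoritative.
ASCII_FALLBACKS = {
    "de": {"\u00fc": "ue"},  # German: u-umlaut -> "ue" (from the quoted text)
    "es": {"\u00f1": "ny"},  # Spanish: n-tilde -> "ny" (tentative)
}

def ascii_fallback(label: str, lang: str) -> str:
    """Replace characters using the table for `lang`; others pass through unchanged."""
    table = ASCII_FALLBACKS.get(lang, {})
    return "".join(table.get(ch, ch) for ch in label)

assert ascii_fallback("m\u00fcller", "de") == "mueller"
assert ascii_fallback("ma\u00f1ana", "es") == "manyana"
# With no entry for the assumed language the accented character survives,
# so the result depends on which language the caller guesses.
assert ascii_fallback("m\u00fcller", "es") == "m\u00fcller"
```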
 
Regards,

    Andy