
My thoughts so far



Hi

Getting on this list now results in having a lot to read from
the archive. Much of it looks good and I feel this group can
get good results.

Here are my thoughts and comments after having read through all
the old e-mail.

-
I hope this can be the start of really getting internationalisation
into IETF and all its protocols. Not just domain names.

One very important thing is to internationalise the entire thing, not
just one aspect of it.
All of DNS must be internationalised, all records with text.
Including TXT and all others.
The same should be done for other protocols like HTTP, NNTP, SMTP and FTP.
For example all headers in HTTP or SMTP should allow UTF-8 and
MIME-coded headers with quoted-printable should be deprecated.
If it cannot be avoided that some programs break, let them. It is
high time everybody understands that they must leave the ASCII-only world.

In general IETF only defines protocols, but we must think about
the UI aspects too. It is very important for the user interface to
communicate with the user using the user's native character set.
For example: when editing zone files, the user must be able to use
the local character set. Or in a browser the URL must be entered
and displayed using all characters available in the local character set,
instead of showing them %-encoded.
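
To illustrate the browser example, here is a minimal sketch in Python.
It assumes that the %-escapes carry UTF-8 bytes, which is my assumption
and not something any current standard guarantees. The function names
for_display and for_wire are made up for this example.

```python
from urllib.parse import quote, unquote

def for_display(wire_path: str) -> str:
    """Decode %-escapes (assumed to be UTF-8) so the user sees real characters."""
    return unquote(wire_path, encoding="utf-8", errors="replace")

def for_wire(typed_path: str) -> str:
    """Encode what the user typed as UTF-8 and %-escape the non-ASCII bytes."""
    return quote(typed_path, safe="/", encoding="utf-8")

# "\u00e4", "\u00f6" and "\u00e5" are a-diaeresis, o-diaeresis and a-ring.
typed = "/r\u00e4ksm\u00f6rg\u00e5s"
print(for_wire(typed))                 # what goes into the protocol
print(for_display(for_wire(typed)))    # what the user should see
```

The point is that the %-encoded form is only a transport detail. The
user should type and read the second form.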

Back to DNS:

We should avoid putting restrictions on where international characters
may be used. For example: it must be possible to have a non-ASCII
top-level domain.
When we add restrictions (like defining bad characters to use in
a domain name), make them as small as possible.

We need case insensitivity for more than the ASCII subset!

But DNS must be able to represent both upper and lower case
in its database and in the protocol. This is because case is
important for some in presentation. An example is a lookup
translating an IP number to a domain name. The domain name returned
should have the original case that was entered into the database.
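
A small sketch of what I mean, in Python. The class and its methods are
hypothetical and have nothing to do with any real name server. I use
Python's built-in Unicode normalisation and case folding as a stand-in
for whatever mapping we end up defining. Names are matched on a folded
key, but the name is stored and returned exactly as it was entered.

```python
import unicodedata

class CasePreservingTable:
    """Toy name table: case-insensitive matching, case-preserving storage."""

    def __init__(self):
        self._by_key = {}    # folded name -> (original name, address)
        self._by_addr = {}   # address -> original name

    @staticmethod
    def _key(name: str) -> str:
        # Normalise, fold case, normalise again (folding can denormalise).
        folded = unicodedata.normalize("NFC", name).casefold()
        return unicodedata.normalize("NFC", folded)

    def add(self, name: str, address: str) -> None:
        self._by_key[self._key(name)] = (name, address)
        self._by_addr[address] = name

    def lookup(self, name: str) -> str:
        """Forward lookup; any case variant of the name matches."""
        return self._by_key[self._key(name)][1]

    def reverse(self, address: str) -> str:
        """Reverse lookup; returns the name with its original case."""
        return self._by_addr[address]

table = CasePreservingTable()
table.add("R\u00e4ksm\u00f6rg\u00e5s.Example", "192.0.2.1")
print(table.lookup("R\u00c4KSM\u00d6RG\u00c5S.EXAMPLE"))
print(table.reverse("192.0.2.1"))
```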

The best handling of characters I can see so far is:
- one character set must be used in the protocol.
  It should be ISO 10646 (UCS) (all 31 bits).

- canonicalised data using Unicode technical report #15
  with normalisation form C (or KC).
  - we need to have a discussion about form C or KC.

- case insensitivity using a single well defined mapping.
  Have a look at Unicode technical report #21 section 2.3
  The case folding data file would be a good start.
  - There are problems here with things like Turkish I.
    But I can see no way to help them unless a separate
    code point for I is defined.
    To use locale dependent case folding will give us big
    trouble. Think about this:
     A name server in Turkey is loaded with a name containing
     an upper case I. Case folding to lower case using a Turkish
     locale will result in a dotless i.
     Somebody in the USA enters the Turkish name and it is case folded
     using an English locale resulting in the dotted i. And the
     query is sent to the Turkish name server - the names will
     not match.
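
The three points above, and the Turkish I problem, can be sketched in
Python as follows. This is only an illustration under some assumptions:
canonical_key is a name I made up, Python's str.casefold stands in for
the single well defined mapping, and turkish_lower is a hand-written
imitation of Turkish locale lower-casing, not a real locale API.

```python
import unicodedata

def canonical_key(label: str) -> str:
    """Locale-independent key: form C, case folding, then form C again."""
    folded = unicodedata.normalize("NFC", label).casefold()
    return unicodedata.normalize("NFC", folded)

def turkish_lower(label: str) -> str:
    """Imitation of Turkish lower-casing: I -> dotless i, dotted I -> i."""
    return label.replace("I", "\u0131").replace("\u0130", "i").lower()

# Precomposed and decomposed spellings give the same key.
print(canonical_key("Caf\u00e9") == canonical_key("CAFE\u0301"))

# One fixed mapping: server and client agree wherever they are.
print(canonical_key("ISTANBUL") == canonical_key("istanbul"))

# Locale dependent folding: the server in Turkey and the client in
# the USA end up with different names, so the lookup fails.
server_side = turkish_lower("ISTANBUL")   # starts with dotless i
client_side = "ISTANBUL".lower()          # starts with dotted i
print(server_side == client_side)
```

The first two comparisons hold and the last one does not, which is the
reason I want one mapping for everybody instead of one per locale.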
     

-
In general, all protocols defined by IETF should use the three
character handling points I have given above for DNS.

It is also time for IETF to take a quick step forward into
internationalisation. Even though I have used non-ASCII in my
URLs for many years, they are still not fixed by IETF standards.
Do I have to wait many more years until it happens?
Let's get DNS fixed quickly and get IETF to start moving quickly
into an internationalised world.

    Dan