
[idn] Do you want ASCII forever?



I do not know whether the IDN mailing list is going to shut down
or whether the IDN working group is going to move forward with new things.

Before that I would like to know if most of you want to
have ASCII forever, or if you want to move to UCS as the 
character encoding to be used for interoperability?

Since the beginning of the IDN working group I have wanted UCS
to be the only character encoding used between systems.
But from the discussions on the list I have felt there is a big
group not wanting to leave ASCII.

IDNA will result in increased complexity. I have written software
handling the decoding of MIME and URLs. I very much dislike the
mess where every small part of a text line first has to be parsed
into parts, and then each part has to be decoded using a different
method: ACE, %-encoding, quoted-printable, ...
The world would have been so much simpler if everybody
had used UCS. Why not at least make that the goal and try
to get rid of the "encode on top of ASCII"?
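
To illustrate the mess, here is a minimal Python sketch, assuming
the standard library codecs ("idna" implements the IDNA 2003 ACE
form) and an arbitrary sample string chosen only for illustration.
The same text needs three different decoders depending on where in
a line it appears, while plain UTF-8 needs one:

```python
import quopri
from urllib.parse import quote, unquote

# Arbitrary sample text (an assumption for illustration), written with
# escapes so the code points are unambiguous: r-a-umlaut-ksm-o-umlaut-rg-a-ring-s
name = "r\u00e4ksm\u00f6rg\u00e5s"

# Three ASCII-only forms of the same text.
ace = name.encode("idna")                       # ACE form used by IDNA
pct = quote(name)                               # %-encoded UTF-8, as in URLs
qp = quopri.encodestring(name.encode("utf-8"))  # quoted-printable, as in MIME

# Each form needs its own decoder before the text is usable again.
decoded = [
    ace.decode("idna"),
    unquote(pct),
    quopri.decodestring(qp).decode("utf-8"),
]

# The UCS alternative: one encoding on the wire, one decoding step.
utf8 = name.encode("utf-8")
decoded.append(utf8.decode("utf-8"))
```

All four entries of decoded hold the same text, but only the last
one got there without a format-specific parser.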

Looking at some of the more important areas (DNS, SMTP, HTTP
and HTML), they could be fixed fairly easily.

- SMTP could add a negotiation at startup that switches the default
character mode to UCS NFC in UTF-8 (requiring
all headers to be UTF-8 and making it the default for all text).

- HTTP version 1.2 could require all URLs and headers to
be UCS NFC in UTF-8.

- HTML 4.02 could require all URLs to use the same character
set as the rest of the document or use %-encoded UTF-8.

- DNS can use something built on UDNS, though because IDNA exists
it will incur some overhead, as a client will have to query
the server to find out if it can handle UCS (this would not have
been needed if we had started with UDNS).
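
What "UCS NFC in UTF-8" means on the wire can be sketched in a few
lines of Python. This is only an illustration: to_wire is a
hypothetical helper name, not part of any of the protocols above.
The point of requiring NFC is that a precomposed character and its
decomposed equivalent produce identical bytes, so the receiver can
compare them directly:

```python
import unicodedata


def to_wire(text: str) -> bytes:
    """Normalize to NFC, then encode as UTF-8 (hypothetical helper)."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


composed = "\u00e5"      # U+00E5, a-ring as one precomposed code point
decomposed = "a\u030a"   # 'a' followed by U+030A combining ring above

# The two strings differ as code point sequences ...
differs_before = composed != decomposed

# ... but are byte-for-byte identical once normalized and encoded.
same_on_wire = to_wire(composed) == to_wire(decomposed)
```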


But it requires people willing to take a big step forward. 
I would very much like to help in defining the standards 
but so far I have felt there is too much "use ASCII forever".
IDNA will not bring non-ASCII to my host names in the near
future. It probably will bring it to my web browser, but it
would have worked fine with UTF-8. And I already see all
the failings where URLs end up being displayed %-encoded
instead of using native characters.

What does IETF want?
I think it is high time for a major step forward in
character set interoperability.

Regards,

   Dan Oscarsson