
[idn] Re: An idn protocol for consideration in making the requirements




>
>At 02:01 PM 2/1/00 +0100, Dan Oscarsson wrote:
>>I see it as very important that we do NOT allow the solution to
>>be encoded in ASCII! It is time everybody learns to deal with the
>>problems of having more than the ascii subset.
>
>Since we are discussing requirements, could you state your reason for this? 
>What is the strong advantage of having a non-ASCII *encoding* for 
>internationalized names on the wire as long as the rest of the requirements 
>are met?
>
>Personally, I believe that if we come up with a compatible encoding for the 
>full set of internationalized characters, the software industry will rush 
>to make input and display mechanisms for the encoding. The desire for 
>internationalization is just too strong for them to ignore it.

Just look at MIME in e-mail. There we got ASCII encoding of everything:
quoted-printable, base64. After so many years with MIME, many tools
still display the transport encoding and store e-mail in the transport
encoding instead of in a user-friendly way. I still quite often have to
read quoted-printable because software cannot handle internationalisation
or localisation.

I also do not want every protocol to define its own encoding of
non-ASCII: e-mail using quoted-printable and any number of character
sets, IMAP using UTF-7, DNS using some UTF-5 or other complex encoding.
I want all transport protocols to use the same encoding of
international characters. Then I can reuse my software to encode/decode
the transport format in all my programs.
As of today I can see only one good choice that can be accepted by
many people: UCS encoded using UTF-8. It is ASCII compatible and fairly
compact. To then encode UTF-8 into ASCII so that no byte has a value
over 127 is an unnecessary waste of bandwidth and resources.
That some people have software that can only display ASCII is not a
good reason to choose a format that encodes everything in a-z. Even
they have to learn to handle at least 8-bit byte values.
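
As a rough illustration of the overhead argument, the following Python
sketch compares the size of one name in raw UTF-8 with the same bytes
pushed through two ASCII-only transport encodings from MIME. The host
name is made up for the example, and the helper is_7bit is hypothetical;
no actual DNS encoding proposal is modelled here.

```python
import base64
import quopri

# Made-up host name with three non-ASCII letters (written with escapes so the
# example does not depend on how this text itself is encoded).
name = "r\u00e4ksm\u00f6rg\u00e5s.example"

utf8 = name.encode("utf-8")          # 8-bit form: ASCII bytes stay as they are
qp = quopri.encodestring(utf8)       # quoted-printable: each high byte becomes =XX
b64 = base64.b64encode(utf8)         # base64: every 3 bytes become 4 characters


def is_7bit(data: bytes) -> bool:
    """Return True if no byte in data has a value over 127."""
    return all(b < 0x80 for b in data)


for label, data in (("utf-8", utf8), ("quoted-printable", qp), ("base64", b64)):
    print(f"{label:17} {len(data):3} bytes  7-bit clean: {is_7bit(data)}")
```

Both ASCII-only forms are longer than the UTF-8 bytes they wrap, and
neither is readable without decoding.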

I can easily make software that converts between my local character set
and UCS normalised using form C encoded in UTF-8. But I cannot easily
do the same for all the different encodings in all protocols.
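
A minimal sketch of such a converter in Python, assuming the local
character set is ISO 8859-1 (an assumption for the example; any codec
name Python knows would do). The function names to_wire and from_wire
are hypothetical.

```python
import unicodedata


def to_wire(local_bytes: bytes, local_charset: str = "iso-8859-1") -> bytes:
    """Local character set -> UCS, normalised to form C, encoded as UTF-8."""
    text = local_bytes.decode(local_charset)
    return unicodedata.normalize("NFC", text).encode("utf-8")


def from_wire(wire_bytes: bytes, local_charset: str = "iso-8859-1") -> bytes:
    """UTF-8 from the wire -> local character set.

    Raises UnicodeEncodeError if a character has no local representation.
    """
    return wire_bytes.decode("utf-8").encode(local_charset)


# 0xE4 is a-diaeresis in ISO 8859-1; it becomes the two bytes C3 A4 in UTF-8.
print(to_wire(b"r\xe4ksm\xf6rg\xe5s"))

# Form C composes "a" + combining diaeresis (U+0308) into the single
# character U+00E4, so both spellings give the same bytes on the wire.
print(to_wire("a\u0308".encode("utf-8"), "utf-8") == to_wire(b"\xe4"))
```

The same two functions can be reused by any protocol that carries names
in this one format, which is the point of picking a single encoding.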

>
>>ASCII compatibility and backward compatibility is good, if it does
>>not make things bad for everybody where ascii is not enough.
>>It is better to break software and get them fixed instead of
>>still trying to make everything to work in a world as it was
>>at the dawn of the computer age.
>
>I fundamentally disagree with this last sentence for two reasons:
>- You have not shown that you need to break software in order to fix the 
>problem of lack of internationalization
>- It is never a good idea to break the existing base of software, 
>particularly when you are talking about breaking a wide variety of 
>protocols across many levels of the Internet architecture. You should only 
>do this when there is no other viable alternative.

OK. "Break" is maybe too hard a word. What I want is not for software to
break, but for the need to handle non-ASCII to be very apparent.

If we have host names encoded using UTF-8 and my software displays
the name badly or does not allow me to enter a non-ASCII name, then I
can easily complain to the software producer and get it fixed. If
everything is encoded as ASCII, then much software will not complain
when non-ASCII host names are used, and it will take much longer before
it gets fixed.

Using UTF-8, everything will work as before as long as only ASCII is
used, but problems may occur when non-ASCII is used. Fine, then we can
quickly find what software needs fixing. If everything is encoded using
ASCII-only code values, I am sure that 10 years from now we will have
lots of software handling only ASCII where it should handle non-ASCII.
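
The compatibility claim can be checked with a short Python sketch. Both
host names are made up, the helper has_non_ascii is hypothetical, and
ISO 8859-1 stands in for whatever a piece of 8-bit legacy software
happens to assume.

```python
def has_non_ascii(wire_name: bytes) -> bool:
    """True if the UTF-8 wire form contains any byte with the high bit set."""
    return any(b >= 0x80 for b in wire_name)


ascii_name = "www.example.com"                     # made-up names
intl_name = "r\u00e4ksm\u00f6rg\u00e5s.example"

# An ASCII-only name is byte-for-byte the same in ASCII and in UTF-8,
# so existing software sees no change at all.
same_on_wire = ascii_name.encode("utf-8") == ascii_name.encode("ascii")
print(same_on_wire, has_non_ascii(ascii_name.encode("utf-8")))

# A non-ASCII name shows up as bytes over 127. Software that wrongly assumes
# ISO 8859-1 displays it visibly garbled, which is easy to notice and report.
legacy_view = intl_name.encode("utf-8").decode("iso-8859-1")
print(has_non_ascii(intl_name.encode("utf-8")), legacy_view)
```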

For my needs I have better formats than UTF-8, but for the needs of
the entire world UTF-8 is the best choice I can see now. It is
ASCII compatible, fairly compact, and can incorporate all characters
in the world.

Let us make UCS normalised using form C encoded in UTF-8 the
only recommended choice for interoperability!

Or does anyone have a better format?

   Dan