Re: [idn] What's wrong with skwan-utf8?



[JS: If you wish to carry on a long discussion on this, I would appreciate it if
you could join the mailing list. This is the last time I will bounce your mail
back. Thanks]

Rick H Wesson writes:
> there is a lot of embedded systems out there that would crash-and-burn if
> they received a reply in utf8.

Can you please identify the systems, explain how they use domain names,
and say what exactly you mean by ``crash-and-burn''? We need this
information if we're going to accurately assess the cost of upgrading
the world to support IDNs.

Patrik Fältström writes:
> Many implementations of the above protocols happen to be able
> to handle UTF-8, while others can not.

Same question here.

I realize that sendmail removes characters 128-159 on input. Fixing this
is a trivial matter of encoding 128->255 160, 129->255 161, ...,
159->255 191, 255->255 255 in the collect() routine, then reversing the
encoding on output.
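
A minimal sketch of the escape described above, written in Python rather than
as a patch to sendmail's C source. The function names escape_c1 and
unescape_c1 are hypothetical and are not part of sendmail; the byte mapping is
the one given in the paragraph above, and rejecting malformed input with an
error is an assumption of this sketch.

```python
ESC = 0xFF  # escape prefix byte (255), as in the mapping above


def escape_c1(data: bytes) -> bytes:
    """Map 128..159 to (255, b + 32) and 255 to (255, 255); copy other bytes."""
    out = bytearray()
    for b in data:
        if 128 <= b <= 159:
            out += bytes((ESC, b + 32))  # 128 -> 255 160, ..., 159 -> 255 191
        elif b == ESC:
            out += bytes((ESC, ESC))     # 255 -> 255 255
        else:
            out.append(b)
    return bytes(out)


def unescape_c1(data: bytes) -> bytes:
    """Reverse escape_c1; raise ValueError on input escape_c1 cannot produce."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b != ESC:
            out.append(b)
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("dangling escape byte at end of input")
        nxt = data[i + 1]
        if nxt == ESC:
            out.append(ESC)
        elif 160 <= nxt <= 191:
            out.append(nxt - 32)
        else:
            raise ValueError(f"invalid escape sequence: 255 {nxt}")
        i += 2
    return bytes(out)
```

For instance, the UTF-8 encoding of U+00C0 is the two bytes 195 128; the
second byte falls in the stripped range, and the escaped form is 195 255 160,
which contains no byte in 128-159.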

I also see a statement in the mailing-list archives that an obsolete
version of the Netscape mailer segfaults under Solaris when it reads
UTF-8 messages. Presumably this bug doesn't exist in Netscape 6.

> Also, there is a question whether UTF-8 is really what we should use.

Many systems use UTF-8 internally. It takes less work for them to read
and write UTF-8 than for them to handle text in other character sets.
Quite a few programs will Just Work(tm) if IDNs are defined as UTF-8,
while they'll have to be upgraded if IDNs are defined any other way.

It's easy to imagine a world where 8859-1, JIS, KOI-8, and so on have
all disappeared in favor of UTF-8. People are doing the UI work needed
to get there; see, e.g., http://www.cl.cam.ac.uk/~mgk25/unicode.html.
Don't you think it will be a bit embarrassing to look around in a UTF-8
world and see that the Internet is using UTF-7?

---Dan