Re: [idn] URL encoding in html page



A small point, and just because it hasn't been mentioned
yet (as far as I have seen):

I guess that the main reason that RFC 2277,... point to UTF-8
is that both the Internet protocols (v4 and v6) and most
hardware work with 8-bit bytes, and there is absolutely
no indication that this will change soon.

Regards,  Martin.

At 11:15 02/03/29 +0100, Keld Jørn Simonsen wrote:
>On Fri, Mar 29, 2002 at 12:40:41PM +0900, Bruce Thomson wrote:
> > > The question is really why 8/16/32 bit Unicode is better than 5bit (ACE)?
> >
> > ACE and UTF-8 are just compression algorithms that squeeze larger
> > Unicode characters. ACE is more efficient than UTF, although more
> > complex.
> >
> > But the claim to fame that UTF-8 has is that it is a standard that
> > idn can reference, and it is coming into widespread use elsewhere.
> >
> > So moving to UTF-8 long term seems like such an obvious choice
> > it surprises me that it even gets debated.
>
>I think UTF-8 is the way to go forward too, and I actually think
>this is also IESG policy, viz. RFC 2277 and RFC 2130. The UTF-8 RFC 2279 is
>the only standards track RFC on charsets for the same reason. Citing
>from RFC 2277, the IESG policy on character sets and language:
>
>    "Protocols MUST be able to use the UTF-8 charset, which consists of
>    the ISO 10646 coded character set combined with the UTF-8 character
>    encoding scheme, as defined in [10646] Annex R (published in
>    Amendment 2), for all text."
>
>I don't mind having an ASCII fallback as we also have it in email, but
>going into the other encoding forms of ISO 10646 is discouraged by
>IESG, as they do not want to see a lot of encodings for the same
>character set, with the possible problems of interoperability.
>So UCS-2, UCS-4, UTF-16, UTF-16-LE, UTF-16-BE etc are discouraged.
>
>Best regards
>Keld Simonsen