
Re: [idn] URL encoding in html page



Compliant browsers already have to handle Unicode, since NCRs (e.g.
&#x1234;) are always Unicode code points. All XML parsers also have
to handle Unicode (UTF-8 and UTF-16).
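To illustrate the point about NCRs, here is a minimal Python sketch. It assumes the standard library's html.unescape as a stand-in for a browser's reference handling; the function name decode_ncrs and the sample markup are made up for illustration. A numeric character reference such as &#x1234; resolves to the same Unicode code point whichever charset the page itself is declared in.

```python
import html


def decode_ncrs(text: str) -> str:
    """Replace character references with the characters they denote."""
    return html.unescape(text)


# ASCII-only markup: the hex and decimal forms of the same code point.
markup = b"&#x1234; and &#4660;"

# Decode the page bytes under three different charsets, then resolve the NCRs.
results = {
    charset: decode_ncrs(markup.decode(charset))
    for charset in ("euc-kr", "shift_jis", "utf-8")
}

for charset, text in results.items():
    # List the non-ASCII code points found after resolving the references.
    print(charset, [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 0x7F])
```

The page encoding only affects how the bytes become characters; the number inside the reference is always read as a Unicode code point.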

> Legacy encodings
> will dominate even in the future, because they are compact and
> inexpensive.

While I do expect the transition to Unicode to take some time, once
some of the older browsers die off it may shift more rapidly than we
think.

Mark
—————

Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Soobok Lee" <lsb@postel.co.kr>
To: "IETF idn working group" <idn@ops.ietf.org>
Sent: Friday, March 22, 2002 02:04
Subject: Re: [idn] URL encoding in html page


>
> ----- Original Message -----
> From: "Bruce Thomson" <bthomson@fm-net.ne.jp>
> To: "Soobok Lee" <lsb@postel.co.kr>; "IETF idn working group" <idn@ops.ietf.org>
> Sent: Friday, March 22, 2002 6:29 PM
> Subject: Re: [idn] URL encoding in html page
>
>
> > > What if all the viewable HTML text is in English, but only the href URL contains
> > > legacy-encoded (Korean) hostnames? Chinese visitors would see a clean English homepage
> > > but fail to click through the Korean link.
> > >
> > Well, that could happen, but a META tag would solve that so easily. Personally
> > I often use a simple text editor to deal with HTML, and would find it easier to
> > use legacy encodings or UTF-8 than cut-and-paste ACE from somewhere.
> > Of course the user could do it either way and it would work.
>
> Yes. Charset META tags help. But many homepages make assumptions about the main audience's
> default character encoding and very often omit the META tag for the encoding, such as:
>   <meta http-equiv="Content-Type" content="text/html; charset=euc-kr">
>
> Moreover, an IDN URL could be used in a pure FRAMESET document that defines frame URLs
> and contains no viewable text. Such FRAMESET documents often omit charset META tags.
>  (Look at the HTML source of http://www.freeway.co.kr/ )
>
> AFAIK, 99.99999% of Korean homepages use an implicit or explicit
> legacy Korean encoding (KS_C_5601-1987 or euc-kr). So do most Japanese/Chinese homepages.
> UTF-8/UCS-2 encodings are rarely used in global Web publishing. Legacy encodings
> will dominate even in the future, because they are compact and inexpensive.
>
> If we want to make IDN truly internationally interoperable, all IDN-aware web browsers/applications
> should contain libraries of all kinds of legacy-to-Unicode conversion routines. That would place
> too much of a memory burden on handheld devices such as PDAs.
>
> Moreover, legacy encodings are revised separately from Unicode. We may face versioning problems
> as tough as those we faced in stringprep/nameprep for newly added Unicode code points.
> How can we guarantee the stability and integrity of IDN operations across all combinations of
> the numerous kinds and versions of IDN-aware applications and legacy encodings?
>
> Soobok Lee
>
>
>
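The two concerns in the quoted message can be sketched in Python: an undeclared legacy encoding being misread, and the legacy-to-Unicode step an IDN-aware application would need. This is only an illustration under stated assumptions. The hostname is made up, hostname_to_ace is a hypothetical helper, and Python's built-in "idna" codec (which implements the later RFC 3490 form of ACE) stands in for whatever ACE the working group settles on.

```python
def hostname_to_ace(raw: bytes, charset: str) -> bytes:
    """Decode a legacy-encoded hostname to Unicode, then encode it as ACE."""
    unicode_name = raw.decode(charset)  # legacy -> Unicode needs a per-charset table
    return unicode_name.encode("idna")  # Unicode -> ACE via Python's IDNA codec


hostname = "한글.kr"             # hypothetical hostname, for illustration only
raw = hostname.encode("euc-kr")  # the bytes as they would sit in an href

# A browser defaulting to a Chinese charset reads the same bytes differently.
misread = raw.decode("gb2312", errors="replace")

# With the correct charset known, the name converts cleanly to ACE.
ace = hostname_to_ace(raw, "euc-kr")

print(repr(misread))
print(ace)
```

The conversion itself is short once the charset is known. The cost raised in the message is in shipping and versioning a decoding table for every legacy charset, and in knowing which one applies when the page does not say.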