
Re: [idn] universal typability



All,

Karlsson Kent - keka wrote:

>      Karlsson Kent - keka wrote:
>
>     > Let me note again that CIDNUC and such are unacceptable, since
>     > they are reencodings into ASCII that turn (for some people)
>     > understandable names into complete gibberish, and given the QP
>     > (and BASE64 for text) experience I have no optimism of having
>
>      For people who don't know a language's script, CIDNUC (UTF5,
>      or %HH encoded UTF8) produces mild gibberish out of
>      (regardless of how beautiful the script really is) perceived
>      complete gibberish.
>
> That is a nonsense argument.  CIDNUC and other *Transfer Encoding
> Syntaxes* (most often reencodings into ASCII) produce *complete
> gibberish* for *everyone*.

No.  The string ufncbiuf8f9h4 is gibberish, but I recognise each letter
(and everyone recognises a-z0-9-).  If I replace each letter in
ufncbiuf8f9h4 with assorted ideograms, then for many it becomes total
gibberish.

> UTF-8 (which is 1. not a transfer encoding syntax, and 2. will be
> supported essentially universally) will produce text that is
> completely comprehensible to *the target audience* for a particular
> IDN.  If *you* can read/type it is completely beside the point.

There will be a balance between localisation and internationalisation.
You focus more on the former, I the latter.

>
>
>      Mild gibberish can be internationally recognised and typed
>      -- i.e. anyone can get to the website's front page (and from
>      there Accept-Language takes over).  This works even from
>      thin clients such as mobile phones.
>
> E.g. Chinese CAN be typed on a mobile phone (even with just a
> 'numeric' keypad).  See demo at
> http://www.nokia.com/phones/tutorials/7110_tutorial/cinput/index.html.
> Apparently several manufacturers licence the same input software (see
> http://www.tegic.com/).  I'm sure mobile phones targeted for a
> particular market will have input methods for their scripts too,
> whether that is Thai, Devanagari, Hebrew, or whatever.  In addition
> it's not all that uncommon to have "full" keyboards built in to larger
> models of mobile phones or even to have an attachable keyboard (see
> e.g. http://www.ericsson.com/chatboard/europe/ [somewhat over-hipped
> site...; the 'chatboard' is an actual product though, available now]).

Again, I was thinking globally.  In a few years, 100 million mobile
phones in Europe (most without full keyboards) will be able to access
Asian sites, but won't have Asian fonts and input methods installed.
(Hell, current mobiles here don't even have decent European fonts and
input methods yet.)

Also, smart-input software tends to work less well for URI entry.

>      But complete gibberish can't ever (because of human
>      limitations) be understood and typed accurately (implies a
>      security risk BTW).
>
> I'm not sure what you are trying to say here.

Instead of entering ideogram.jp, non-Asians may go for
verysimilar-ideogram.jp, which may be run by hostiles, much like
eoke.com spoofing traffic from coke.com.

>
>
>      Rich sites may purchase two host names - their true I18N one
>      and an ASCII transliterated one.  Poorer sites would
>      probably quote (on business cards, etc) their true I18N
>      domain and for those who can't type/etc that they may well
>      quote their IPv6/4 address.  :-(
>
> This is a pricing issue, which is out of scope for this WG.

But this is a critical issue.  If ASCII transliterations (decided by a
human) go free with registering i18n hostnames (and this is the best
solution IMHO unless DNS load would be significantly harmed), then I'd
say "double hyphenated UTF-8" (as a transition to true 8-bit UTF-8)
would be an excellent and immediate first step solution.  (%HH escaped
UTF-8 would be allowed but would be translated immediately to double
hyphenated UTF-8 or to true UTF-8).

The length inefficiency of "double hyphenated UTF-8" (compared to CIDNUC
for example) would encourage the move to true UTF-8 (I hope!).

Example:
User types in www.gås.net or www.g%c3%a5s.net (or even
www.gc--3a--5s.net)
and browser displays www.gås.net
and then, translated by the browser, this becomes (while the world is
still not '8-bit safe') www.gC--3A--5s.net
or becomes (a few years from now (*)) www.gås.net

The (free) transliteration could also be entered by the surfer:
www.gaas.net (or whatever www.gås.net owner decides).

(*) But the WAP air interface could use this immediately, with the name
translated to www.gC--3A--5s.net at the WAP gateway before it enters
the Internet.
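
A minimal Python sketch of the example above, for illustration only.
The encoding rule (each non-ASCII UTF-8 byte written as its two hex
digits separated by "--") is inferred from the single www.gås.net
example; the names encode_label, decode_label, display_form and
wire_form are hypothetical, and the decoder's fallback for labels that
do not decode cleanly is an assumption, not part of any proposal.

```python
import re
from urllib.parse import unquote

# One encoded byte: high hex digit 8-F (i.e. a non-ASCII byte), "--", low hex digit.
_PAIR = re.compile(rb"([89A-Fa-f])--([0-9A-Fa-f])")


def encode_label(label: str) -> str:
    """Write each non-ASCII UTF-8 byte as HIGH--LOW hex digits (assumed rule)."""
    out = []
    for byte in label.encode("utf-8"):
        if byte < 0x80:
            out.append(chr(byte))  # ASCII passes through untouched
        else:
            out.append(f"{byte >> 4:X}--{byte & 0x0F:X}")
    return "".join(out)


def decode_label(label: str) -> str:
    """Reverse encode_label for an ASCII label; keep the label if that fails."""
    raw = _PAIR.sub(
        lambda m: bytes([int(m.group(1) + m.group(2), 16)]),
        label.encode("ascii"),
    )
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return label  # assumption: not an encoded label after all


def display_form(typed: str) -> str:
    """What the browser would display: %HH and double-hyphen forms undone."""
    host = unquote(typed)  # %HH escapes are read as UTF-8
    return ".".join(
        decode_label(part) if part.isascii() else part
        for part in host.split(".")
    )


def wire_form(typed: str) -> str:
    """What would be sent while the world is still not 8-bit safe."""
    return ".".join(encode_label(part) for part in display_form(typed).split("."))


for typed in ("www.gås.net", "www.g%c3%a5s.net", "www.gc--3a--5s.net"):
    print(typed, "->", display_form(typed), "->", wire_form(typed))
```

Note that the decoder is ambiguous as written: an ordinary ASCII label
that happens to contain "hex digit, two hyphens, hex digit" in a
pattern that forms valid UTF-8 would be misread, so a real scheme would
need a rule to tell the two cases apart.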



(BTW, are there any stats on how often hostnames change their IP
address?  If low, then IP addresses could be used instead of cost-free
transliterations???)


>
>
>      Will IDN allow I18N hostnames to have ASCII transliterated
>      equivalents (with no site owner extra cost)(worry: increased
>      DNS load)?  Should this universal typability be discussed
>      under requirements?
>
> A transition strategy will most likely require that there is an ASCII
> alias (among perhaps several other aliases), DECIDED BY A HUMAN. The
> last point is very important. No algorithm is going to be producing
> anything sensible *for the target audience*.
>
> Sorry for repeating ad nauseam, but CIDNUC and other TESes (like the
> misnamed UTF-5) are totally unacceptable.  Can we please concentrate
> on finding an actual solution?
>
>         Kind regards
>         /kent k
>
Regards
Aaron
--

-----------------------------------------------------
Aaron Irvine
  mailto:airvine@corp.phone.com
-----------------------------------------------------