
RE: [idn] host name vs. domain name




> ...  Rather, it is a user
> ability problem.  How many people do you think will be able to type
> Chinese ideographs they've never seen?  Even if they had a complete
> Unicode character map book, it is almost impossible to look through
> thousands of Chinese characters just to find one.

I'm well aware of that.  But these internationalised names would primarily be
intended for people who *can* read and write them.  For those who wish it, it
should be possible (and not too expensive) to set up aliases that are easier
to use for people who read and write some other language or script.

> One might eventually learn to use a Chinese input method.  But
> what about Korean characters, or Thai, or even Arabic characters?  The
> point is simple: it is uneconomical to learn foreign input methods just
> to enter some host names.

Yes, certainly.  I don't have any statistics, but out of all web site names (say),
how often are they typed vs. just obtained through a link in a web page?
Similarly for e-mail?  Similarly for, say, SNMP (of which I have no experience)?
If you HAVE to have an alias that everyone can type for a host/site/similar,
and assuming that everyone can type the letters A-Z, would it not be preferable
to use a name selected by SOMEONE, rather than a strange reencoding?

> At least, IMHO, some form of reencoding is necessary between the user
> and the client as a `universal opaque input method'.  It isn't, however,
> necessary that those reencoded hostnames actually be transferred on the
> wire, i.e. in the DNS client-server protocol (I also hate to see any
> TES actually being used between client and server);

Hmm, I thought the point of the TESes (like CIDNUC) was that they would
primarily be used between client and server, and preferably not be seen
by the "end user".

> the client must
> translate all reencoded names to their natural form (say, in UTF8)
> before transmitting over the network.
>
> I don't see why a TES would be bad if it is used strictly between
> the user and the client.  It DOES work as an input method, and that's
> all we want from using a TES in this manner.  Again, IMHO, as long as
> the client does not transmit undecoded TES on the wire, it's fine.
>
> Thus, the scenario would be something like this:
>
>       For more information, please visit our web site at:
>       www.삼성.com or www.-af2qv3g.com. (On a magazine ad)

You would probably be much better off with a transliteration than a "TES" in
such cases.  If no transliteration alias is provided (none registered), the IP
address used as a "name" would be just as good as the second name here (though
perhaps somewhat less permanent).

>       Now try visiting a must-see web site at: www.-af2qv3g.com
>       *chuckle* yes I know it sounds geeky, it's the website of
>       Samsung. (On the radio)
>
>       Dave, I think I found the site you've been looking for; check
>       www.-af2qv3g.com. (On ICQ)

Here the sender would type the name, and the recipient need only click on the
link (if the ICQ system does what most e-mail systems do these days).  The
recipient needn't be able to retype it.  Cut-and-paste is an alternative that is
also likely to work, and it does not require retyping either.
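The quoted proposal (the client accepts an ASCII reencoding typed by the user, decodes it, and sends only the natural form) can be sketched roughly as follows.  This is an illustration under assumptions, not any draft's actual scheme: Python's built-in "idna" codec, whose reencoded labels carry the "xn--" prefix, stands in for the hypothetical "-af2qv3g"-style TES, and the function names are made up for the example.  Following the quoted message, the wire form is UTF-8.

```python
# Sketch: decode user-typed reencoded labels in the client, send UTF-8.
# ASSUMPTION: Python's "idna" codec (prefix "xn--") stands in for the
# hypothetical TES discussed above; it is NOT UTF-5 or CIDNUC.

ACE_PREFIX = "xn--"  # marker the stand-in TES uses to flag a reencoded label


def decode_label(label: str) -> str:
    """Return the natural (Unicode) form of one typed label (hypothetical helper)."""
    if label.lower().startswith(ACE_PREFIX):
        # Undo the reencoding; only labels carrying the marker are touched.
        return label.lower().encode("ascii").decode("idna")
    return label  # plain labels pass through unchanged


def to_wire_utf8(typed_name: str) -> bytes:
    """Decode every reencoded label, then emit UTF-8 for transmission."""
    natural = ".".join(decode_label(part) for part in typed_name.split("."))
    return natural.encode("utf-8")


if __name__ == "__main__":
    ace = "삼성".encode("idna").decode("ascii")  # what an ad might print
    typed = f"www.{ace}.com"
    print(typed, "->", to_wire_utf8(typed))
```

Because decoding is triggered only by an explicit per-label marker, plain labels such as "www" are never touched.  Where to apply the decoding is exactly the question raised further down in this thread.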

> |
> | UTF-8 is *technically* on an equal footing with 8859-x, CP125x, and
> | CP9xx. "UTF-5", CIDNUC, QP, BASE64 are quite different: all of them
> | are *reencodings* (into ASCII); they are so-called TESes.  In addition,
> | all of them are applicable only in restricted contexts, and determining
> | when to apply the decoding and when not to is the main problem.  As
> | such, they are much *worse* than ISO 2022-based solutions (which, if
> | used, can at least be applied to the entire text).
>
> What if a host name standard specifies exactly where to apply those
> TESes, with ample examples (or test cases) that an implementor can use
> to verify his implementation?  There WILL be enough test cases once this
> standard has been established and widely used, BTW.

That, I'd say, was the case for QP and BASE64.  Still, they cause decoding
glitches even nearly 10 years later.  So I don't think a lack of examples
is the root of the problem.
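To illustrate why "when to decode" is the hard part, here is a small sketch of a hypothetical heuristic decoder (not taken from any real mail or DNS software) that treats any token that parses as BASE64, and whose bytes are valid UTF-8, as reencoded text.  The reencoded Korean name decodes correctly, but the ordinary label "Shop" also happens to be valid BASE64 and is silently turned into the garbage string 'J\x1a)'.

```python
import base64
import binascii


def naive_decode(token: str) -> str:
    """Hypothetical heuristic: decode any token that looks like BASE64
    and whose decoded bytes form valid UTF-8; otherwise keep it as-is."""
    try:
        raw = base64.b64decode(token, validate=True)
        return raw.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return token  # not decodable, so assume it was plain text


if __name__ == "__main__":
    encoded = base64.b64encode("삼성".encode("utf-8")).decode("ascii")
    for token in (encoded, "www", "Shop"):
        print(repr(token), "->", repr(naive_decode(token)))
```

No amount of test cases fixes this: the token alone does not say whether it was reencoded, so the context has to, and that context is what gets lost in practice.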

> |
> | You are also forgetting that UTF-8 will be very widely supported,
> | whereas CIDNUC, for example, even if accepted here, will have
> | exceedingly little support; where supported, it will apply only to some
> | very small portions of text, which will be *hard* to identify, so the
> | decoding will remain unreliable forever.
>
> Again, UTF-8 is NOT acceptable as an input method.


UTF-8 is not an input method by any count.  It's a character encoding.  CIDNUC
is a "transfer encoding syntax" (of highly limited applicability), and should not be
considered an input method either.  Nor is ISO 2022 an input method...
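The difference between a character encoding and a TES can be sketched in a few lines.  Assumptions: the sample text is the Korean name of Samsung, EUC-KR stands in for the legacy charsets (8859-x, CP125x, CP9xx) mentioned above, and ISO 2022 is not shown.  The character encodings map characters directly to 8-bit bytes; BASE64 and QP are a second layer that turns those already-encoded bytes into 7-bit ASCII and must be undone before the text is usable at all.  None of them involves typing anything.

```python
import base64
import quopri

name = "삼성"  # sample text: Korean name of Samsung

# Character encodings: each maps characters straight to (8-bit) bytes.
utf8 = name.encode("utf-8")     # 3 bytes per Hangul syllable
euc_kr = name.encode("euc_kr")  # legacy charset, comparable to 8859-x/CP125x

# TESes: a second layer that re-encodes already-encoded bytes as ASCII.
b64 = base64.b64encode(utf8)
qp = quopri.encodestring(utf8)


def is_ascii(data: bytes) -> bool:
    """True if every byte fits in 7 bits."""
    return all(b < 0x80 for b in data)


if __name__ == "__main__":
    for label, data in [("UTF-8", utf8), ("EUC-KR", euc_kr),
                        ("BASE64(UTF-8)", b64), ("QP(UTF-8)", qp)]:
        print(f"{label:14} {data!r} ascii={is_ascii(data)}")
    # Getting text back needs the TES undone AND the right charset known.
    assert base64.b64decode(b64).decode("utf-8") == name
```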

                Kind regards
                /kent k

PS
I will not be answering any e-mails (at all, whatever they are about) for the next two
weeks.  I will not even be reading them...  (So I will not tire you for a while... ;-)