
RE: [idn] host name vs. domain name



On Fri, 17 Mar 2000, Karlsson Kent - keka wrote:

| 
| 
| > -----Original Message-----
| > From: md@linux.it [mailto:md@linux.it]
| > Sent: Thursday, March 16, 2000 9:25 PM
| > Cc: idn@ops.ietf.org
| > Subject: Re: [idn] host name vs. domain name
| > 
| > 
| > On Mar 16, Karlsson Kent - keka <keka@im.se> wrote:
| > 
| >  >> If old software can't decode CIDNUC, it can't decode UTF-8 either.
| >  >Most software will be able to handle UTF-8 for any text.
| >  >Very little software will handle CIDNUC, and has to do
| > Please support your claims with a rationale.
| > 
| >  >It seems to me that you have not been so subjected to QP and
| >  >BASE64 during the last decade.  I have.  My colleagues have.
| > I have. When using non-MIME-aware software, I usually could not display
| > raw 8 bit characters either.
| 
| They usually can, for some 'charset's at least.
| 
| Even so, we still see undecoded QP every now and then.
| 
| >  >> Maybe a CIDNUC encoded domain is gibberish, but it's a kind 
| >  >> of gibberish
| >  >> I can easily type and display on a characters cell terminal.
| >  >Most people would consider it pure garbage, and never type it.
| > How do you think these lazy people would type undecoded UTF-8
| > characters (i.e., some 8 bit characters which may not be on their
| > keyboard)?
| 
| That does not make sense.  You seem to confuse character encoding
| with keyboard functionality.

This is not a matter of keyboard functionality.  Rather, it is a user
ability problem.  How many people do you think will be able to type
Chinese ideographs they've never seen?  Even if they had a complete
Unicode character map book, it is almost impossible to look through
thousands of Chinese characters just to find one.

One might eventually learn to use a Chinese input method.  But
what about Korean characters, or Thai, or even Arabic characters?  The
point is simple: it is uneconomical to learn foreign input methods just
to enter some host names.

At least, IMHO, some form of reencoding is necessary between the user
and the client as a `universal opaque input method'.  It isn't, however,
necessary that those reencoded hostnames actually be transferred on the
wire, i.e. in the DNS client-server protocol (I also hate to see any
TES actually being used between client and server); the client must
translate all reencoded names to their natural form (say, in UTF-8)
before transmitting over the network.

I don't see why a TES would be bad if it is to be used strictly between
the user and the client.  TESes DO work as an input method, and that's all
we want from using a TES in this manner.  Again, IMHO, as long as the
client does not transmit undecoded TES on the wire it's fine.
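
To make the client-side translation concrete, here is a rough sketch in
Python.  It does NOT implement CIDNUC or any other proposed TES; the
reencoding used here (a leading "-" followed by the hex digits of the
UTF-8 bytes) is a made-up stand-in, and the names TES_MARKER,
decode_label and to_wire_form are purely illustrative.  The only point
is that decoding happens label by label in the client, and that only
the natural form leaves the machine.

```python
TES_MARKER = "-"  # hypothetical marker; not the real syntax of any TES


def decode_label(label: str) -> str:
    """Decode one label typed by the user, if it carries the marker."""
    if not label.startswith(TES_MARKER):
        return label  # plain ASCII label, nothing to do
    hex_part = label[len(TES_MARKER):]
    if not hex_part:
        return label  # a bare marker is not a reencoded label
    try:
        # Assumed stand-in encoding: hex digits of the UTF-8 bytes.
        return bytes.fromhex(hex_part).decode("utf-8")
    except ValueError:
        # Not valid hex, or not valid UTF-8: leave the label untouched.
        return label


def to_wire_form(typed_name: str) -> bytes:
    """Translate a typed host name to the UTF-8 form sent on the wire."""
    natural = ".".join(decode_label(part) for part in typed_name.split("."))
    return natural.encode("utf-8")


if __name__ == "__main__":
    # "-ec82bcec84b1" is the stand-in reencoding of U+C0BC U+C131.
    print(to_wire_form("www.-ec82bcec84b1.com"))
    print(to_wire_form("www.example.com"))
```

A label that does not start with the marker, or that fails to decode,
is passed through unchanged, so ordinary ASCII host names are
unaffected.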

Thus, the scenario would be something like this:

	For more information, please visit our web site at:
	www.삼성.com or www.-af2qv3g.com. (On a magazine ad)

	Now try visiting a must-see web site at: www.-af2qv3g.com
	*chuckle* yes I know it sounds geeky, it's the website of
	Samsung. (On the radio)

	Dave, I think I found the site you've been looking for; check
	www.-af2qv3g.com. (On ICQ)

| 
| UTF-8 is *technically* on an equal footing with 8859-x, CP125x, and
| CP9xx. "UTF-5", CIDNUC, QP, BASE64 are quite different, all of them
| are *reencodings* (into ASCII); they are so-called TESes.  In addition
| all of them are applicable only in restricted contexts, and determining
| when to apply the decode and when not to is the main problem.  As such,
| they are much *worse* than ISO 2022-based solutions (that, if used,
| can at least be applied for the entire text).

What if a host name standard specifies exactly where to apply those
TESes, with ample examples (or test cases) that an implementor can use
to verify his implementation?  There WILL be enough test cases once this
standard has been established and widely used, BTW.

| 
| You are also forgetting that UTF-8 will be very widely supported.
| Whereas, even if accepted here, e.g. CIDNUC will have exceedingly
| little support, and where supported it will be so only for some
| very small portions of text, which ones will be *hard* to determine,
| and the decoding will thus remain unreliable forever.

Again, UTF-8 is NOT acceptable as an input method.

|
| 		Kind regards
| 		/kent k
| 

Regards,
Eugene Kim

-- 
Eugene M. Kim <ab@astralblue.com>

"Is your music unpopular?  Make it popular; make music
which people like, or make people who like your music."