
Re: [idn] Unicode tagging



Edmon wrote:
> 
> At 11:30 AM -0400 8/16/00, Edmon wrote:
> > >uniform byte-length characters are extremely beneficial to DNS and we
> > >should try to preserve it in the protocol.
> >
> > Could you elaborate on why this is true? In the applications that
> > mandate using UTF-8, no one has been unable to implement it.
> >
> 
> The reason as I have indicated is the need to have a character count.

Even UCS-2 is not a fixed-length encoding in the sense you need: it is
fixed-length per codepoint, not per character.  As long as there are
combining characters, there is no such thing as a true character count
based on the byte length of a string.  If you're counting Unicode
codepoints, that's another story.  But what purpose would it serve to
count Unicode codepoints?
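
To make that concrete, here is a small sketch in Python (my choice of
language, using only its standard unicodedata module; the helper name
describe is made up for illustration).  It takes one user-visible
character, "é", written once as a single precomposed codepoint and once
as "e" plus a combining acute accent, and counts codepoints and bytes
in three encodings:

```python
import unicodedata

# One user-visible character, "é", spelled two ways.
precomposed = "\u00e9"   # single codepoint U+00E9
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT U+0301

def describe(s):
    """Return (codepoints, UTF-8 bytes, UTF-16 bytes, UTF-32 bytes) for s."""
    return (
        len(s),                      # Python str length counts codepoints
        len(s.encode("utf-8")),
        len(s.encode("utf-16-be")),  # "-be" variants avoid a byte-order mark
        len(s.encode("utf-32-be")),
    )

print(describe(precomposed))  # (1, 2, 2, 4)
print(describe(decomposed))   # (2, 3, 4, 8)

# Both spellings are the same character after canonical composition.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Even in the fixed-width 16-bit and 32-bit forms the byte length doubles
for the second spelling, although a user sees the same single character.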

> 
> > >We don't know whether some day we might go beyond UCS-4.
> >
> > This is just plain silly. In fact, it is likely that we will only
> > need a tiny fraction of the space allowed by UCS-4, and there are
> > moves by ISO to make that clear.
> >
> 
> I don't think we want to be too sure.  We see today how difficult it is to go
> from ASCII to multilingual, it is our responsibility to create a system that
> will be useful and flexible for the future.

Here is a level of certainty:  it takes 2-3 years to approve a group of
characters for inclusion in ISO 10646.  That group can number from 1 to
a few thousand.  Let's say 3000 characters are added to ISO 10646 every
2 years.  Right now there are about 850,000 openings.  So the space
available now, as UTF-16 or UTF-32 (not even UCS-4), would take about
283 such rounds, or roughly 570 years, to fill up.
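
For anyone who wants to check the estimate, here is the arithmetic as a
few lines of Python (my choice of language; the constant names are made
up, the figures are the ones given above).  Note that 850,000 / 3000 is
about 283 approval rounds, and at 2 years per round that comes to
roughly 570 years:

```python
# Figures taken from the estimate above; all of them are rough guesses.
FREE_CODE_POINTS = 850_000   # "about 850,000 openings"
CHARS_PER_ROUND = 3_000      # characters added per approval round
YEARS_PER_ROUND = 2          # length of one approval round, in years

rounds = FREE_CODE_POINTS / CHARS_PER_ROUND
years = rounds * YEARS_PER_ROUND

print(round(rounds), round(years))  # 283 567
```

Whichever way you round it, the answer is measured in centuries.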

By that time, I think DNS will have gone through lots of other changes
which we are not planning for either.

I for one am sure that we don't have to worry about anything beyond the
scope of ISO 10646/Unicode.

> 
> > >But we also have to understand that domain names are no longer simple
> > >text commands over the internet... it is part of a company's brand and
> > >identity... We have the responsibility to let people have the names they
> > >really want...
> >
> > Nope, stop right there. We have a responsibility to keep the Internet
> > running well. Once we have fulfilled that responsibility, we have
> > another one to stretch out in ways that meet desires and still keep
> > the Internet running as well (or better) than it did before. If
> > someone wants to use a name that will hurt the running of the
> > Internet, we have a responsibility to tell them "no". If someone
> > wants us to stretch into regions we have not had time to analyze for
> > things like security implications, we have the responsibility to tell
> > them "no".
> 
> I totally agree with you.  Of course it must be a given that the technology
> can handle it... My major concern is arbitrary restrictions that create
> technically unnecessary constraints.  The worry that people might be
> confused about some similar characters should not be a concern for the WG.

I disagree.  I see compatibility characters as a security issue.  People
and companies registering their names haven't a clue about this.  To a
user, a compatibility character looks absolutely identical to its
ordinary counterpart.  This is a phenomenon engineering created in the
first place, and it is our responsibility to keep it in check.  Users
did not create compatibility issues.  They won't care if they can't use
compatibility characters in a domain name.

FWIW, I favor Normalization Form KC.  It makes the most sense to me to
normalize and canonicalize (with whatever spec is decided upon) at the
point of name entry, with possibilities for redundancy where folks think
it is prudent.  This comes from my experience working with the myriad
character sets but doing all internal processing in Unicode (in some
CES).  The sooner we get the data into Unicode, the easier it is for the
various modules to handle the data.
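
As an illustration of what normalizing at the point of name entry could
look like, here is a minimal sketch in Python using the standard
unicodedata module.  The function name canonical_label and the sample
labels are made up for this example, and this is only the NFKC step, not
a proposal for the full canonicalization spec:

```python
import unicodedata

def canonical_label(label):
    """Apply Normalization Form KC to a name as it is entered (sketch only)."""
    return unicodedata.normalize("NFKC", label)

# Hypothetical labels containing compatibility characters.
fullwidth = "\uff53\uff55\uff4e"  # fullwidth letters that look like "sun"
ligature = "\ufb01le"             # "file" typed with the single "fi" ligature U+FB01

print(canonical_label(fullwidth))  # sun
print(canonical_label(ligature))   # file

# Canonical normalization alone (NFC) leaves compatibility characters in place.
print(unicodedata.normalize("NFC", fullwidth) == "sun")  # False
```

Once the label has been folded this way at entry, every later module
sees only the ordinary characters.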

My USD 0.02,
Andrea
-- 
Andrea Vine, avine@eng.sun.com, iPlanet i18n architect
"In these Regulations any reference to a regulation is a 
reference to a regulation of these Regulations"
-- Education (UK Student Loans) Regulations 1997