
Re: Chinese folding (Re: My prod at IDN requirements)




On Wed, 5 Jan 2000, James Seng wrote:

> Harald Tveit Alvestrand wrote:
> > I still don't get it - is this mapping done as part of the unification that
> > was done when deciding which characters to put in ISO 10646, or are these
> > defined as "equivalences" under some normalization form, but still have
> > separate codepoints in the BMP, or do you mean something different?
> > If the first one - are those characters among the ones proposed for
> > addition to Plane 2?
> >
> > Sorry to be so stupid - if you just name a couple of examples and what the
> > Unicode databases say about them, I may have a better chance of getting it
> > right.
> 
> We are getting into specifics here. 

But that is important:
 
> But just for interest sake, these 2145 characters are equivalences under
> normalization but have separate codepoints in BMP. The question is how do we
> treat these characters. If we decided to fold them into one, just like 'A'
> matched 'a', then we will have a problem using Unicode.

Well, right now that is what is possible with the current set in Unicode
2.x. The next release of Unicode might well have more. As the
normalization is not that systematic, you would then need to update your
tables and add what is needed for the 5000 or so characters added during
the next update, and the next, and the next. Thus, effectively, such
mappings, and detailed knowledge of them, become part of the protocol.
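
To make the versioning problem concrete, here is a rough sketch. The two
fold tables, the fold() helper and the name are all hypothetical; the
entries are illustrative only and are not taken from any Unicode data file.

```python
# Hypothetical fold tables (illustrative entries, not real Unicode data).
FOLD_V1 = {"國": "国"}                # table built against an older release
FOLD_V2 = {"國": "国", "學": "学"}     # a later release adds another pair


def fold(name, table):
    """Map each character through the table; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in name)


name = "國學.example"
old_client = fold(name, FOLD_V1)   # only the first character is folded
new_server = fold(name, FOLD_V2)   # both characters are folded

# The two sides now disagree on the folded name, so an exact match fails
# until every party has upgraded to the same table.
print(old_client == new_server)
```

The point is only that both ends must hold the same table at the same
time, which is what makes the table part of the protocol.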

The devil certainly is in those details. So in a sense, having any
mappings done as just N->1, or purely as alternatives in the zone file,
might be better. Yes, it does mean that you have 'zillions' of 'cname'
alternative spellings in your zone file. But you can a) generate those to
your (local, coloured, political, cultural, ...) liking, b) adapt them by
looking at your log for misses, and c) never forget that all we do is a
NAME->IP 'pure' exact-spelling lookup in a hash table. I.e., that is the
only thing we should be trying to solve. Not real searching or anything.
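
A minimal sketch of that zone-file approach, assuming a toy in-memory zone.
The ADDRESSES and ALIASES tables, the resolve() helper and the names in them
are made up for illustration; 192.0.2.1 is a documentation address.

```python
from collections import Counter

# Hypothetical zone data: one canonical name with an address, plus
# operator-chosen alternative spellings that point at it (like CNAMEs).
ADDRESSES = {"国学.example": "192.0.2.1"}
ALIASES = {
    "國學.example": "国学.example",
    "國学.example": "国学.example",
}

# Names that were asked for but not found, so the operator can review the
# log and decide which spellings to add.
misses = Counter()


def resolve(name):
    """Exact-spelling lookup: follow at most one alias, then a plain dict get."""
    target = ALIASES.get(name, name)
    addr = ADDRESSES.get(target)
    if addr is None:
        misses[name] += 1
    return addr


print(resolve("國學.example"))   # listed alias, resolves
print(resolve("国學.example"))   # spelling nobody listed, logged as a miss
print(misses.most_common())
```

No folding knowledge sits in the lookup itself; whatever policy the zone
owner prefers lives entirely in how the alias entries get generated.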

Just my 0.02 euro.

Dw