
Re: [idn] Editorial comments on stringprep



--On 2002-05-04 10.26 -0700 Doug Ewell <dewell@adelphia.net> wrote:

> I couldn't remember UTC ever saying such a thing, so when Mark Davis
> <mark@macchiato.com> wrote:

I said _at_least_before_version_2_.

In 1995, when I worked at Bunyip Information Systems, I had a long
discussion with the Unicode Consortium about how to do normalization and
case folding together. Their response was to convert to lower case, and our
implementation of Whois++, named Digger, also ended up in a paper which was
presented at a Unicode Conference around 1995-1996.

I also saw Mark only referring to case folding, not lower case in
particular, and that's why I talk about handwaving, historical artifacts,
etc.

>> There are also a number of codepoints which are lowercase which
>> doesn't have uppercase versions.
> 
> Which ones?  I can think of a character that looks uppercase but has no
> lowercase form (U+04C0 CYRILLIC LETTER PALOCHKA).  But such letters,
> despite their appearance, are neither uppercase nor lowercase; they are
> caseless, and immune to the effects of any casing operation.

Quote from page 142 in Unicode version 3 book:

"Also, because many characters are really caseless (most of the IPA block,
for example), uppercasing a string does not mean that it will no longer
contain any lowercase letters."

I only quote the text.

Yes, when reading it, one might think it should have been written "...that
it will only contain uppercase letters", but it wasn't.
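
The point of the quoted sentence can be checked with a minimal sketch,
assuming Python 3 and the Unicode tables bundled with it. Those tables are
newer than the Unicode 3.0 book quoted above, so the exact set of
characters differs from what the book describes. The helper names
lowercase_survivors and ipa_letters_without_uppercase are made up for this
illustration.

```python
import unicodedata

# The IPA Extensions block spans U+0250..U+02AF.
IPA_BLOCK = range(0x0250, 0x02B0)


def lowercase_survivors(text):
    """Return the characters of text.upper() that are still category Ll."""
    return [ch for ch in text.upper() if unicodedata.category(ch) == "Ll"]


def ipa_letters_without_uppercase():
    """List IPA-block lowercase letters that str.upper() leaves unchanged."""
    return [
        chr(cp)
        for cp in IPA_BLOCK
        if unicodedata.category(chr(cp)) == "Ll" and chr(cp).upper() == chr(cp)
    ]


if __name__ == "__main__":
    unchanged = ipa_letters_without_uppercase()
    print(len(unchanged), "IPA letters have no uppercase mapping here")
    sample = "a" + unchanged[0]
    # The ASCII letter is uppercased; the IPA letter stays lowercase.
    print(repr(sample.upper()), lowercase_survivors(sample))
```

Any string containing one of the listed letters still holds a lowercase
letter after uppercasing, which is what the book's wording says.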

>> Last, some codepoints (like the german sharp-s, ß) turns to "SS" in
>> uppercase, and my guess is (with my limited knowledge of German,
>> only 2 years of studies) that one when comparing don't want that
>> similarities.
> 
> German speakers are forced to deal with that mapping every day.  It is a
> natural part of the language.

I know. I just gave it as one example where I _thought_ people from Germany
would rather have lower case than upper case. I will wait until I hear from
someone from Germany saying they prefer mapping to upper case before I say
something else.

We are talking about what is preferred, not what people are used to.
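
To make the sharp-s case concrete, here is a small sketch, assuming
Python 3. Its str.casefold() is Python's own implementation of Unicode case
folding and is not the stringprep mapping tables, so treat it only as an
illustration of the behaviour under discussion. The helper name
same_ignoring_case is made up for this example.

```python
def same_ignoring_case(a, b):
    """Compare two strings after Unicode case folding (not stringprep)."""
    return a.casefold() == b.casefold()


word = "Straße"

# Uppercasing expands the sharp s to two letters.
print(word.upper())            # STRASSE

# Lowercasing leaves the sharp s alone, so the round trip is lossy.
print(word.lower())            # straße
print(word.upper().lower())    # strasse

# Case folding maps both spellings to the same string.
print(word.casefold())                      # strasse
print(same_ignoring_case(word, "STRASSE"))  # True
```

So a comparison based on plain lowercasing keeps "straße" and "strasse"
distinct, while one based on uppercasing or case folding treats them as
the same name.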

>> And, personally, I rather see bq-asdqwe123 than BQ-ASDQWE the few
>> times I hope I see a domain name used in protocols natively in its
>> ACE encoding.
> 
> No argument there.  All-lowercase is widely recognized as being easier
> to read than all-uppercase, primarily because of the greater variation
> in letterforms.  But again, there doesn't seem to be any evidence that
> the Unicode Consortium has made any of the claimed statements about
> "preferring" lowercase or about the mapping to lowercase being more
> "consistent."  Please dig up the relevant references, if possible.

After some digging, I see that the paper Philippe and I wrote was presented
as A5 at Unicode Conference number 9, on September 5, 1996:

   Text Searching Across Multiple Character Sets in Unicode
   Philippe Boucher, Senior Programmer,
   Bunyip Information Systems, Montreal, Quebec, Canada

I cannot find the document, though.

   paf