Re: [idn] UTC feedback



> can we satisfy ourselves that we need only deal with
>
> a) the word spelled with all accents, and
> b) the word spelled without any accents
>
> for all languages, or do we also need to deal with every possible
> combination of a letter being accented or not?

> This is a problem.  If someone in an English-speaking company registers their
> cafe as "thecafe.com" and someone in France registers their Tea Café as
> "thécafé.com", and a user types in THECAFE.COM, well...but I can't believe that
> a French user would do this on a regular basis.  Then again, I'm not a French
> user, and I don't know what the usual train of thought is when handling accented
> characters.
>

To the best of our knowledge, the vast majority of cases would be all or
nothing. There might be a few edge cases that require more than two
spellings, but they should be a small percentage. Consider English
"resume", "resumé" (bogus, but used) and "résumé". (On the other hand, most
people are probably used to seeing and using URLs in lowercase, where the
accents are expected.)
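
To make the "all or nothing" versus "every combination" question concrete,
here is a rough Python sketch that enumerates the spellings of a word with
each accented letter either kept or stripped. The function name
accent_variants is made up for illustration, and removing Unicode combining
marks is only an assumed stand-in for "unaccented"; it is not a proposed
IDN algorithm.

```python
import itertools
import unicodedata


def accent_variants(word):
    """Return every spelling of `word` with each accented letter kept or stripped.

    Illustrative assumption: "stripping an accent" means removing Unicode
    combining marks from the letter's canonical decomposition.
    """
    options = []
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        # An unaccented letter has one form; an accented one has two.
        options.append((ch,) if base == ch else (ch, base))
    return {"".join(choice) for choice in itertools.product(*options)}


variants = accent_variants("résumé")
print(sorted(variants))
```

A word with n accented letters yields 2**n spellings, so "résumé" gives
four, of which the all-or-nothing assumption covers only "résumé" and
"resume".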

There is always a problem in trying to make accents ignorable (folded).
Suppose there were an accent that was used in only a single language and
was ignorable in that language. First, it is *very* difficult to determine
whether that is the case across all human languages. Second, maintaining
all accents, whatever the language, has the advantage of predictability.
If everyone knows that accents always make a difference, then they will
get used to typing them in the URL if they see them in a written form.
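
As a sketch of the two policies, the Python below compares names with and
without accent folding. The helpers fold_accents and same_name are
hypothetical, and dropping combining marks is a deliberately crude folding
rule assumed only for the example; it would be wrong for languages where
the accented letter is a distinct letter (Swedish "å", for instance).

```python
import unicodedata


def fold_accents(name):
    """Crude folding (an assumption for illustration): drop combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    kept = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", kept).lower()


def same_name(a, b, fold):
    """Compare two names case-insensitively, optionally folding accents."""
    if fold:
        return fold_accents(a) == fold_accents(b)
    # Accents significant: normalize the encoding and the case only.
    return (unicodedata.normalize("NFC", a).lower()
            == unicodedata.normalize("NFC", b).lower())


# With folding, the two registrations from the quoted example collide.
print(same_name("thecafe.com", "thécafé.com", fold=True))
# With accents significant, they stay distinct and predictable.
print(same_name("THECAFE.COM", "thécafé.com", fold=False))
```

With accents always significant, the comparison needs no per-language
knowledge, which is the predictability argument above.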

Mark