RE: [idn] case folding




> From: RJ Atkinson <rja@inet.org>

> In Unicode, there are pre-composed characters and also composed
> characters.  If there is no
> pre-composed form for a letter, but there might be (hypothetically) 
> multiple ways of composing that letter, then there needs to
> be normalisation to a single form for a given letter prior to
> comparison for DNS purposes.

If I remember correctly, Unicode has a block of "Combining Diacritical
Marks". Given that, normalization is simple: if there is no
precomposed character, you write the letter as its base character
followed by its combining marks (in our case, "o" + "^" + "`"), sort
the marks by Unicode code point, lowest to highest, and you're done.
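
A rough sketch of that idea in Python, to make it concrete. The function
name naive_normalize is made up for illustration, and for simplicity the
sketch decomposes every character (using the standard unicodedata module)
rather than only those lacking a precomposed form. Note that it sorts marks
purely by code point, as proposed above; the standard Unicode normalization
forms order marks by combining class instead, so this is not the same thing
as NFD or NFC.

```python
import unicodedata


def naive_normalize(text: str) -> str:
    """Decompose, then sort each run of combining marks by code point.

    Hypothetical helper: a sketch of the scheme described in this
    message, not one of the standard Unicode normalization forms.
    """
    # Split precomposed letters into base character + combining marks.
    decomposed = unicodedata.normalize("NFD", text)

    out = []
    marks = []  # combining marks attached to the current base character
    for ch in decomposed:
        if unicodedata.combining(ch):
            # Non-zero combining class: a combining mark.
            marks.append(ch)
        else:
            # New base character: flush the previous marks, sorted
            # lowest-to-highest by code point.
            out.extend(sorted(marks))
            marks = []
            out.append(ch)
    out.extend(sorted(marks))
    return "".join(out)


# "o" with circumflex and grave, written three different ways.
precomposed = "\u1ed3"            # single precomposed character
circumflex_first = "o\u0302\u0300"
grave_first = "o\u0300\u0302"

print(naive_normalize(precomposed) == naive_normalize(circumflex_first))
print(naive_normalize(circumflex_first) == naive_normalize(grave_first))
```

All three spellings end up as the same sequence, which is the point for
comparison purposes. The price is that two mark orders Unicode treats as
distinct are folded together.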

This probably won't work for ideograms, but we have already excluded
them from this discussion.

(btw: are Vietnamese characters found in row 1E, Latin Extended 
Additional?)


ciao, .mau.