
Re: [idn] Unicode tagging



Andrea Vine wrote:

> FWIW, I favor Normalization Form KC.  It makes the most sense to me to
> normalize and canonicalize (with whatever spec is decided upon) at the
> point of name entry, with possibilities for redundancy where folks think
> it is prudent.  This comes from my experience working with the myriad
> character sets but doing all internal processing in Unicode (in some
> CES).  The sooner we get the data into Unicode, the easier it is for the
> various modules to handle the data.

In an RFC822 parser, a mailbox can't be KC-canonicalized immediately,
because "_" is a compatibility character which expands to an
underlined space (entirely logical, IMO), and a space can't appear
inside an <atom>. Once you have an <atom>, you can apply
compatibility decomposition to that. Of course, no one sensible puts
an underscore in a domain name, or probably in a local-part either.
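
To make the ordering concrete, here is a rough sketch in Python of
"tokenize first, normalize each <atom> afterwards". The function names
(split_tokens, normalize_mailbox) and the reduced list of specials are
my own assumptions for illustration; quoted strings, comments and
domain literals are ignored, and whitespace between tokens is simply
dropped. The sketch does not depend on how any particular character
decomposes.

```python
import re
import unicodedata

# Simplified RFC 822 "specials" (assumption: quoted strings, comments
# and domain literals are not handled in this sketch).
SPECIALS = '()<>@,;:\\".[]'

_TOKEN = re.compile(
    "[" + re.escape(SPECIALS) + "]"            # one special character
    + "|[^" + re.escape(SPECIALS) + r"\s]+"    # or a run of atom characters
)

def split_tokens(mailbox):
    """Split a mailbox into atoms and single special characters."""
    return _TOKEN.findall(mailbox)

def normalize_mailbox(mailbox):
    """Tokenize first, then apply NFKC to the atoms only."""
    out = []
    for tok in split_tokens(mailbox):
        if len(tok) == 1 and tok in SPECIALS:
            out.append(tok)  # structural characters are left untouched
        else:
            out.append(unicodedata.normalize("NFKC", tok))
    return "".join(out)

# Fullwidth "j" (U+FF4A) and "e" (U+FF45) fold to their ASCII forms.
print(normalize_mailbox("\uff4aoe@\uff45xample.com"))
```

A real parser would presumably also have to check each normalized
atom again, and reject it if normalization produced a space or a
special.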