RE: [idn] Some new ideas in my updated draft





> -----Original Message-----
> From: Dan [mailto:Dan.Oscarsson@trab.se]
> Sent: Sunday, February 13, 2000 5:03 PM

Dan writes in his draft:
     - If a character can be represented in the local character set,
       map it from UCS to local character set.
     - If a character cannot be represented in the local character set,
       map the UTF-8 octet sequence for the character to a hyphen ("-")
       followed by the hex code of each octet as two characters per octet.
     - If it was needed to down code because not all characters could be
       represented in the local character set, all original hyphens
       must be replaced by two hyphens ("--") and the entire string
       MUST end with a single hyphen.
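
To make the quoted rules concrete, here is a minimal Python sketch of one
possible reading of them.  The function name `down_code` and its
`local_charset` parameter are hypothetical and do not come from the draft.
The sketch assumes that each unrepresentable character becomes ONE hyphen
followed by the hex of all its UTF-8 octets, and that lowercase hex digits
are acceptable; the quoted text does not settle either point.

```python
def down_code(name: str, local_charset: str) -> str:
    """Down-code `name` for `local_charset` (one reading of the quoted rules)."""

    def representable(ch: str) -> bool:
        # A character is "representable" if the chosen codec can encode it.
        try:
            ch.encode(local_charset)
            return True
        except UnicodeEncodeError:
            return False

    # No down-coding needed: the name passes through untouched,
    # including any hyphens it already contains.
    if all(representable(ch) for ch in name):
        return name

    out = []
    for ch in name:
        if ch == "-":
            out.append("--")  # original hyphens are doubled
        elif representable(ch):
            out.append(ch)
        else:
            # Assumption: one hyphen, then two hex digits per UTF-8 octet.
            out.append("-" + ch.encode("utf-8").hex())
    return "".join(out) + "-"  # a down-coded string ends with a single hyphen
```

Under these assumptions, down_code("blåbär-x", "ascii") gives
"bl-c3a5b-c3a4r--x-", while the same name with "latin-1" as the local
character set is returned unchanged.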



1) "The" local character set?  Whose local character set?  These days such
things are usually personal preferences, or rather just personal defaults
swiftly overridden by "charset=", heuristics, or a plain temporary
change.  Setting one non-UCS character encoding as "the" local one for
an entire organisation or the like is highly inappropriate.

2) Though I'm slightly, but only very slightly, more sympathetic to
"say the catalogue number (in hex)" types of fallback (than to CIDNUC-like
ones), it should really be the UCS "catalogue number" (as HTML/XML/modern
SGML do), and NOT be tied to any UTF or other encoding.
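
The difference between the two kinds of "catalogue number" can be shown in a
few lines of Python.  This is only an illustration: the helper names are
hypothetical, and the "&#x...;" form is borrowed from HTML/XML numeric
character references purely as an example of a code-point-based escape, not
as a syntax proposed for names.

```python
import html


def utf8_octet_escape(ch: str) -> str:
    # Escape tied to one encoding: the hex of the UTF-8 octets.
    return "-" + ch.encode("utf-8").hex()


def ucs_escape(ch: str) -> str:
    # Escape tied only to the UCS code point, as in HTML/XML.
    return "&#x{:X};".format(ord(ch))


a_ring = "\u00e5"  # LATIN SMALL LETTER A WITH RING ABOVE, U+00E5
octet_form = utf8_octet_escape(a_ring)  # "-c3a5": depends on UTF-8
ucs_form = ucs_escape(a_ring)           # "&#xE5;": the code point itself
round_trip = html.unescape(ucs_form)    # back to the original character
```

The code-point form stays the same whether the character is later carried as
UTF-8, UTF-16 or anything else; the octet form is meaningful only to a reader
who knows that UTF-8 was used.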

3) If any such fallback is to be used (occasionally), one needs to interpret
the "catalogue number" fallback before lowercasing+normalising, and thus
do "read-catalogue-number-fallback+lowercase+KC-normalise, then lookup".
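
A minimal sketch of that ordering in Python follows.  It is an illustration
under stated assumptions: `html.unescape` stands in for a hypothetical
"catalogue number" reader (it also expands named references, which a real
reader would not), and `lookup_key`, `wrong_order_key` and the `ZONE` table
are made-up names.

```python
import html
import unicodedata


def lookup_key(name: str) -> str:
    decoded = html.unescape(name)   # 1. read the code-point fallback
    lowered = decoded.lower()       # 2. lowercase
    return unicodedata.normalize("NFKC", lowered)  # 3. KC-normalise


def wrong_order_key(name: str) -> str:
    # Lowercasing and normalising BEFORE reading the fallback only touches
    # the escape text; the escaped characters themselves escape processing.
    normalised = unicodedata.normalize("NFKC", name.lower())
    return html.unescape(normalised)


# Hypothetical lookup table keyed by lowercased, normalised names.
ZONE = {"\u00e5ngstr\u00f6m": "192.0.2.1"}

query = "&#xC5;NGSTR&#xD6;M"             # capital A-ring and O-diaeresis, escaped
found = ZONE.get(lookup_key(query))       # matches the table entry
missed = ZONE.get(wrong_order_key(query)) # None: capitals survive in the key
```

With the fallback read first, the escaped capitals are lowercased like any
other character and the lookup succeeds; in the wrong order the key still
contains the capital letters and the lookup fails.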

		Kind regards
		/kent k