
[idn] Re: Fwd: Unicode letter ballot



Kenneth Whistler <kenw@sybase.com> writes:

>> The only proper solution I can see is to stop modifying
>> published decomposition tables.  When mistakes are discovered, new
>> character codes with proper decompositions should be added and the old
>> character codes declared obsolete -- which is option B in the vote,
>
> This will lead to other interoperability problems. The 542
> supplementary characters in question (and all of the ones involving
> the errors) are CNS compatibility characters. They are there to
> provide round-trip mappings to the CNS 11643 standard. If you
> "obsolete" 5 code points and then add 5 new ones, then it is
> inevitable that CNS mapping tables will get updated to use the
> new code points instead of the old ones (and there will be some
> inconsistency in the mappings, because of the duplications, during
> this transition) -- because the old code points get normalized
> away to nonsense characters.

Assuming CNS 11643 discusses decomposition, you will have this problem
with either option A or B anyway.

Naïvely, the conclusion then is that the entire 542-character block was
in error, and should be replaced by a new 542-character block that
provides the correct round-trip mapping with CNS 11643.

> This will undoubtedly lead to further problems, including for IDNA
> string matching, as one of the duplicated pair normalizes one way,
> and the other -- apparently identical -- normalizes another way. And
> you can't escape the problem by just adding the 5 obsolete code
> points to the stringprep prohibited list, because that, *too*, would
> have destabilized your specification: a string that was valid before
> you did that would be invalid after you did so.

It is not the stability I'm worried about here, but security issues.
I'm not convinced the approach you propose here has the same security
problems as having one Unicode string translate into different bit
strings depending on which Unicode version stringprep uses.  In
fact, I believe your proposed solution guarantees security at the
expense of breaking backwards compatibility, which sounds like good
engineering to me.
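
To make the concern concrete, here is a rough sketch of how one input
string can end up as two different bit strings when a decomposition
table is corrected in place.  The tables, the code points and the
normalize() helper are all placeholders I made up for illustration;
they are not the actual characters on the ballot, and this is not real
stringprep or NFKC.

```python
# Hypothetical decomposition tables. The code points are placeholders,
# not the characters covered by the ballot.
COMPAT = "\U000F0000"           # stands in for a compatibility character
TABLE_OLD = {COMPAT: "\u4E00"}  # mapping as first published (assumed wrong)
TABLE_NEW = {COMPAT: "\u4E01"}  # mapping after an in-place correction


def normalize(label, table):
    """Replace each character by its table mapping, if it has one."""
    return "".join(table.get(ch, ch) for ch in label)


label = "a" + COMPAT
old_bytes = normalize(label, TABLE_OLD).encode("utf-8")
new_bytes = normalize(label, TABLE_NEW).encode("utf-8")

# The same input yields different bit strings under the two tables.
print(old_bytes != new_bytes)
```

Two implementations that disagree on the table version would compare
these outputs as different names, even though the user typed the same
string.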

>> but unfortunately neither IDN nor IETF has any voting powers (which
>> suggest a methodological problem).
>
> Why would IDN have voting powers here? You don't expect the
> UTC or SC2/WG2 to have voting powers in an IDN working group, do you?
>
> As for the IETF, the UTC and the IETF have a *liaison* relationship.
> The UTC immediately informed the IETF liaison about this ballot, because it
> knew this was an important issue that IETF participants are
> concerned about. That is why this discussion has migrated over
> to interested parties on the IDN list who have worked on IDNA
> and stringprep.
>
> But the buck has to stop somewhere. Ultimately the UTC and WG2
> are responsible for the CJK compatibility character mapping
> tables. So those committees have to take the relevant votes,
> and if they end up standardizing errors, also have to take
> the relevant knocks when they go to fix the errors.
>
> To influence the actual *voting* on this (or other issues),
> one works through the UTC voting member representatives in
> the UTC case, or one works through the national bodies
> participating in SC2 in the SC2/WG2 case.

My point was that if IDN wanted to influence the character set, and
decomposition tables in particular, it should not have referenced UTC
standards; it was a remark targeted at IDN, not UTC.  But that
is an entirely different discussion.  UTC and SC2/WG2 do have the same
voting powers as all other IDN WG participants have, btw.