[idn] stringprep: PRI #29



Simon Josefsson wrote:
There appear to me to be a lot of decisions made out of subjective
opinions on how normalization "should" behave, or is "assumed" to
behave.

I don't think it's subjective. The concept of normalization requires that it be idempotent.
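
To spell it out: normalizing an already-normalized string must change nothing. Here is a quick sanity check, sketched in Python purely for illustration (note that Python's unicodedata tracks a newer Unicode version than the 3.2 tables stringprep points at, so this is an approximation, not a conformance test):

    import unicodedata

    def nfkc_is_idempotent(s):
        # A correct normalizer must return True for every input:
        # applying NFKC a second time may not change the result.
        # (Illustration only; Python's tables are newer than Unicode 3.2.)
        once = unicodedata.normalize("NFKC", s)
        return unicodedata.normalize("NFKC", once) == once

The sequences behind PR-29, such as U+1100 U+0300 U+1161, are exactly the inputs on which the two readings of the old text pull implementations in different directions.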


One way is to incorporate the PR-29 fix, declare the earlier attempt
as buggy, and re-cycle at PROPOSED.  I suspect you prefer that way?  I
am hesitant about that approach, because we have already deployed the
old RFC and it is not clear what problems there will be in mixing the
old and the new code.

We already have the situation where some implementations do it one way, and some do it the other way. It is quite clear what will happen when somebody uses a character sequence that these implementations interpret differently: they will produce different results for the same label and fail to interoperate. Keep in mind that Unicode may add new characters in the future that may also be affected.


Both Kerberos and SASL appear to be going to
use the old StringPrep as well, so we will be seeing security-critical
infrastructure based on the old interpretation.

You write "the old interpretation" as if there is only one interpretation of the old spec. That's not true. As we have seen, there are implementations that do it one way, and those that do it the other way.


Another way is to carry on with the Unicode 3.2 NFKC even though it
breaks some humans' assumptions about what "normalization" means in a
theoretical setting.

It's not just an "assumption", and it's not merely "theoretical". This is a very basic requirement for the normalization process.


Machines will cope; they compute an
algorithm; they don't care whether the output meets some unstated
invariant or not.

IDNA specifies that, to decide what to display (Unicode or Punycode), a Punycode label must be decoded, Nameprepped, and Punycode-encoded again to check that the same string comes back. This, in itself, should make you realize that the process is supposed to be idempotent. So we *do* care how the machines compute this algorithm.
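
For concreteness, the round trip is roughly this. A loose sketch in Python of that ToUnicode-style check, using the standard library's nameprep and punycode codec; it glosses over the length and STD3 checks:

    import encodings.idna as idna

    ACE_PREFIX = "xn--"

    def display_form(ace_label):
        # Decide whether to display the Unicode form of an ACE label.
        if not ace_label.lower().startswith(ACE_PREFIX):
            return ace_label
        try:
            # decode the Punycode part back to Unicode
            uni = ace_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
            # Nameprep it and encode it with Punycode again
            again = ACE_PREFIX + idna.nameprep(uni).encode("punycode").decode("ascii")
        except UnicodeError:
            return ace_label
        # only show Unicode if the round trip reproduces the input
        return uni if again.lower() == ace_label.lower() else ace_label

If two Nameprep implementations disagree on a label, as they can for the PR-29 sequences, that comparison fails and the user ends up looking at raw Punycode.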


A third way, which is what I am deploying, is to use the Unicode 3.2
NFKC together with a filter to reject the PR-29 problem sequences.
This is in line with the RFC's, it solves problems related to PR-29
problem sequences, and is simple to implement.

I don't think this is in line with the RFCs. You are rejecting sequences that are not rejected by the RFCs.
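
To be clear about what such a filter does, here is a rough, somewhat over-broad sketch of a check for the shape of the PR-29 problem sequences. This is only an illustration in Python, not your actual code, and it uses whatever Unicode version Python ships rather than the 3.2 tables:

    import unicodedata

    def composes(a, b):
        # True if the pair a+b normalizes to a single code point
        # (approximates "a and b form a primary composite").
        return len(unicodedata.normalize("NFC", a + b)) == 1

    def looks_like_pr29_sequence(s):
        # Scan for the shape of the published problem sequences:
        # a starter, one or more non-starters, then a combining-class-0
        # character that composes with that starter
        # (e.g. U+1100 U+0300 U+1161).
        for i, ch in enumerate(s):
            if unicodedata.combining(ch) != 0:
                continue
            j = i + 1
            while j < len(s) and unicodedata.combining(s[j]) != 0:
                j += 1
            if (j > i + 1 and j < len(s)
                    and unicodedata.combining(s[j]) == 0
                    and composes(ch, s[j])):
                return True
        return False

Note that the rejection is keyed on a sequence, not on any individual code point, while stringprep's prohibition tables only list individual code points and ranges. That is exactly why I say it is not what the RFCs specify.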


More importantly, as you continue to ship your implementation as is, more and more installations of your popular library will accumulate, making it harder for the world to adjust if and when the affected kinds of character sequences actually come into use, whether formed from current characters or from ones Unicode adds later.

You are in a position to make a difference. You already have. Please reconsider.

Erik