Re: [idn] Re: stringprep: PRI #29
Simon Josefsson <jas@extundo.com> wrote:
> Having a draft outlining the alternatives would be a useful
> contribution.
For an implementation, I see the following alternatives:
1) Reject the problem sequences (and no others) before normalizing (add
a step 0 to the stringprep algorithm; see the sketch after this list).
2) Reject some superset of the problem sequences that is easier to
compute and still rejects only implausible strings.
3) Reject nothing, but make sure the normalization algorithm follows the
proposed fixed normalization spec (and the original sample code).
4) Reject nothing, but make sure the normalization algorithm follows the
current broken normalization spec (not the original sample code).
5) Reject nothing, and follow whichever normalization spec is already
implemented (that is, ignore the problem).
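To make alternative 1 concrete, here is a rough sketch (in Python;
contains_problem_sequence() and stringprep_steps() are hypothetical
names standing in for the real problem-sequence test and the rest of
the stringprep processing):

    def stringprep_with_step0(s):
        # Step 0: reject the problem sequences before doing
        # anything else, so normalization never sees them.
        if contains_problem_sequence(s):
            raise ValueError("string contains a problem sequence")
        # Then the usual stringprep steps: map, normalize,
        # check prohibited output, check bidi.
        return stringprep_steps(s)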
I have listed these in order of decreasing quality (though the order of
2 and 3 is debatable), and in order of decreasing complexity (except
that 3 and 4 are equally complex).
I don't expect anyone to defend 5.
Given that there already exist implementations of both kinds of
normalization, and neither is more right or more wrong than the other
(in light of the internally inconsistent spec), it would be silly for
stringprep to standardize on the broken one, so we can discard 4.
I can imagine an implementor making a case for any of 1-3. The next
question is how much latitude the stringprep spec should give the
implementor.
We might consider allowing implementations to choose from among more
than one approved alternative. I lean against that idea, because
stringprep is used to define valid strings in protocols, and it's hard
to analyze what can happen when implementations disagree about the
validity of a string. (There is already one way this can happen in
stringprep: code points that are unassigned in one implementation's
Unicode version may be assigned in another's. The analysis was very
tricky, and I'd prefer to keep that sort of thing to a minimum.)
If stringprep were to pick alternative 3 as the one and only allowed
behavior, some implementations would consider that an unacceptable
security risk and implement 1 or 2 anyway, so in practice we'd still
have implementations choosing from multiple alternatives.
This argues for the stringprep spec to require the rejection of either
(1) the problem sequences or (2) some particular superset of the problem
sequences that is easier to compute. But the implementation simplicity
of (3) is very appealing, and part of me hopes to be persuaded to go
with that in spite of the analytical complexity.
If we choose 1 or 2, we still need to decide how to define the sequences
that must be rejected. For example, if we choose 1, I suggest something
like this:
  Definition: A string must be rejected if and only if oldNormalize
  and newNormalize return different results.

  Theorem: The rejected strings are exactly the ones containing
  code points with such-and-such properties in such-and-such
  combinations...

  Theorem: In Unicode version x.y, the previous theorem amounts to
  the following table of specific code point combinations...
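For example (a rough Python sketch; old_normalize and new_normalize
are hypothetical names for the original and corrected normalization
algorithms):

    def must_reject(s):
        # Reject exactly the strings on which the two
        # normalization algorithms disagree.
        return old_normalize(s) != new_normalize(s)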
Notice that if an implementation has access to both oldNormalize and
newNormalize, it can implement the definition directly without the
complexity of applying either theorem. It could also do a simple check
for the presence of any code points from the table, and optimize away
the second normalization if none are present.
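Roughly (again in Python; PROBLEM_CODE_POINTS is a hypothetical name
for the set of code points appearing in the table from the second
theorem):

    def normalize_checked(s):
        old = old_normalize(s)
        # Fast path: if none of the table's code points appear in
        # the input, the two algorithms are guaranteed to agree,
        # so the second normalization can be skipped.
        if all(ord(ch) not in PROBLEM_CODE_POINTS for ch in s):
            return old
        if old != new_normalize(s):
            raise ValueError("string rejected: normalizations disagree")
        return old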
AMC