
[idn] Editorial comments on stringprep



draft-hoffman-stringprep-02.txt
-------------------------------

I've reviewed the stringprep document to make sure it is
ready for IETF last call.

Here are some editorial comments that it would be useful to
fix before the last call.

The nameprep document contained some text which I think should belong
in stringprep. The text in nameprep (in two different contexts)
was:
	The lists in Appendix E MUST be used by
	implementations of this specification. If there are any discrepancies
	between the lists in Appendix E and subsections below, the lists in
	Appendix E always takes precedence.

	This profile lists the unassigned code points in the
	range 0 to 10FFFF for Unicode 3.1 in
	Appendix F. The list in Appendix F MUST be used by implementations of
	this specification. If there are any discrepancies between the list in
	Appendix F and the Unicode 3.1 specification, the list in Appendix F
	always takes precedence.

The document needs to make it clearer that profiles defining additional
tables will be the exception. The intent is that stringprep
defines the tables and that the stringprep profiles select among
those defined tables. Adding some text to section 1.2
about this would be helpful for folks defining profiles. E.g.
add "(in exceptional cases)" for those items in the bulleted list.

It isn't obvious that tables B.2 and B.3 derive directly
from the Unicode tables, when in fact they do.
I think it would be useful to state this.
In fact the text about "NormalizeWithKC(Fold...)" in section 3.2 describes
how table B.2 is created, but it needs to say this explicitly.
Likewise it would be helpful to say that B.3 is created from the Unicode
table for case folding (CaseFolding.txt?).
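To illustrate the kind of derivation I mean, here is a rough Python
sketch. It is not taken from the draft: it assumes the section 3.2 rule
is the two-pass form b = NormalizeWithKC(Fold(a)),
c = NormalizeWithKC(Fold(b)), with an extra mapping added when c differs
from b. It also uses whatever Unicode data the local Python ships, not
Unicode 3.1, so it will not reproduce the draft's tables exactly. The
helper names (fold, nfkc, extra_b2_mapping) are mine.

```python
import unicodedata


def fold(s: str) -> str:
    # str.casefold() applies Unicode full case folding with the Unicode
    # version bundled in this Python build (an assumption for illustration;
    # the draft is based on Unicode 3.1 and CaseFolding.txt).
    return s.casefold()


def nfkc(s: str) -> str:
    # Normalization form KC from the standard library.
    return unicodedata.normalize("NFKC", s)


def extra_b2_mapping(a: str):
    """Return the extra mapping target for `a`, or None if none is needed.

    Assumed rule: fold and normalize once (b), then again (c). If the
    second pass changes the result, a mapping from a to c is required.
    """
    b = nfkc(fold(a))
    c = nfkc(fold(b))
    return c if c != b else None


if __name__ == "__main__":
    # LATIN CAPITAL A, DOUBLE-STRUCK CAPITAL C, CIRCLED DIGIT ONE
    for ch in ("A", "\u2102", "\u2460"):
        print(f"U+{ord(ch):04X}", repr(fold(ch)), repr(extra_b2_mapping(ch)))
```

The interesting case is U+2102: plain case folding leaves it alone, but
it normalizes to "C", which then folds to "c", so it needs an entry
that a table built from case folding alone would not have. Stating
the derivation in the document would make this difference between B.2
and B.3 visible to readers.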

Given its length, the document needs a table of contents under
the RFC Editor's criteria.

Section 1: Is there a missing "would" before "cause an error"?

The RFC 2119 citing text does not include all the 2119 words.
The document uses "SHOULD NOT" and "MUST NOT" which are not included.
The safest thing is to use the boilerplate that lists all the 2119 words.

Section 1.2:
	- The tables
	froom this document of characters that are prohibited as
s/froom/from/

Section 3 talks about "mapped to nothing" but the tables refer to
this as "map out". Either use a single term or explicitly state, e.g.
in section 3, that they mean the same. The latter could perhaps be done
by adding "map out" in parentheses after "mapped to nothing".

3.1 says "because their presence or absence should not
make two strings different".
The statement makes sense in the context of protocol elements
(for the protocols whose stringprep profiles select this mapping)
but it might not make sense for text in general.
Thus I think it makes sense to add this context to the description.
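A small Python sketch of what "presence or absence should not make two
strings different" means for a protocol element. The two code points
below are only a hypothetical subset that I picked for illustration; the
authoritative list is the draft's own table, and the names
MAPPED_TO_NOTHING and map_to_nothing are mine.

```python
# Hypothetical subset of "map to nothing" code points, for illustration
# only; the real list is the table in the stringprep draft.
MAPPED_TO_NOTHING = {
    0x00AD,  # SOFT HYPHEN
    0x200D,  # ZERO WIDTH JOINER
}


def map_to_nothing(s: str) -> str:
    # str.translate() deletes every code point whose table value is None.
    return s.translate({cp: None for cp in MAPPED_TO_NOTHING})


if __name__ == "__main__":
    with_soft_hyphen = "exam\u00adple"
    plain = "example"
    # The raw strings differ, but they compare equal once the
    # mapped-out characters are removed.
    print(with_soft_hyphen == plain)
    print(map_to_nothing(with_soft_hyphen) == map_to_nothing(plain))
```

For a protocol identifier this is the desired outcome. For general text,
where a soft hyphen carries layout information, it may not be, which is
why the context should be stated.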

Section 3.2 talks about the tradition of using lowercase,
but my understanding is that there is also a technical argument,
in that downcasing is better defined in Unicode than upcasing.
Is this not the case? If so, it would make sense to add that to
the description of the motivation.

Section 3.2 is missing a closing parenthesis after "statuses C, F, and I."

Section 3.2 talks about "the table is stable from that point on."
But it doesn't actually state that this is how the table in B.2 is generated.
Perhaps this should be made explicit both here and at the top of
appendix B.2.

Section 3.2 and elsewhere: I don't think "unupdated" is a word.
How about 
	in both those systems which have been updated and those which have not.

Section 4 talks of "three options for Unicode normalization"
but there are only two bullets below that text.

Section 4 doesn't say which Unicode table to use (e.g. the filename).
In fact, partially because of this I originally read the document as
if table B.2 included the NFKC mapping, when in fact it does not.
This needs to be made clearer.

Section 5.2 says
	displayed. Note that additional 
	control characters (U+0000 through U+001F, and U+007F) are not listed
	below. They are listed in Appendix C-2.
The second sentence makes one believe the ASCII control characters are
in appendix C.2 when in fact they are not.

The references to the parts of appendix C use "C-<n>" but the
appendices are named "C.<n>".

Section 5.3. The argument for the replacement character is quite weak -
it does have some semantics.
Isn't there a stronger argument that one would never expect to use
this character as an *input*? The intent is to only use it for
*output*. As such it doesn't make sense to have it be used
in a protocol element.

Section 5.4 is missing "in" before "the Proplist.txt file ..."

Section 6 talks of "names of SNMP objects" in multiple places.
But SNMP objects are named by OIDs - not character strings. Drop it?

Section 6 says
	Using two different policies for where unassigned code points can
	appear prevents the need for versioning in protocols that use
I think "removes" is better than "prevents" here.

Section 6 seems to use the term "query" in some places
and "request" in others. Please make it use a consistent term.

Section 6.2 seems to use the term "correct" with two rather different
meanings:
1. The correct answer is returned (positive or negative answer)
2. In order to prevent false positives the "correct" approach is
   to return false negatives in some cases.

I think #2 should use a different term ("predictable"? "conservative"? I'm
open to suggestions).
This appears in two places: the XY vs. YX and the X vs. nX.

Section 7 reads as if the designers were just too lazy
to handle similar-looking characters, which clearly isn't the case.
Thus I think it makes sense to add some language saying that
handling similar-looking characters is a very hard problem that is
(believed to be) impossible to solve without some additional context,
and that this context is not available in DNS lookups.

Appendix E.
I think it makes sense to move this to a section, i.e. before the character
tables.

"MUST be an RFC" is not one of the choices in RFC 2434.
"IETF Consensus" seems closest to what the text says, but "IESG approval"
might also match. And add a citation to RFC 2434.


Split normative vs. non-normative references.

---