[idn] Editorial comments on idna



draft-ietf-idn-idna-07.txt
----------------------

I've reviewed the idna document to make sure it is
ready for IETF last call.

Here are some editorial comments that it would be useful to
fix before the last call.

I've read the document thinking about what an implementor of an
application needs to understand and know in order to enable IDN in the app.

The introduction makes sense for those who have participated on the IDN
mailing list. Unfortunately it doesn't do a good job of introducing
things for the first-time reader. Assume that Joe/Jane application
implementor is going to read this. What would they need to know up front
to help them understand what they need to do, and where, in their applications?
I think this can be done with an introduction that is 1-2 pages long.
But it needs to start stating the problem a bit.
And the dangling reference to "other proposals" doesn't make any sense
for such a reader.
Perhaps even the picture in section 6 can be moved to the introduction.
Even better to split that picture into two pictures - one "before" 
and one "after" adding IDN support to the application.

Section 4 and the ToASCII and ToUnicode algorithms in general
operate on labels but the application cares about domain names.
I don't think it is sufficient to tell the poor application programmer
"do this for each label" without saying how to extract the labels
from the domain name.
I assume this algorithm just consists of looking for U+002E in the input
strings. 
However, in order to be consistent with the text in section 5.1 in nameprep
I think IDNA should say that the labels should be separated at U+3002
as well as U+002E.
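
As a rough illustration of the label extraction I have in mind, here is a
minimal Python sketch. The name split_labels is made up for this example,
and it assumes that U+002E and U+3002 are the only separators that need to
be recognized; the document would have to say whether that is the full set.

```python
import re

# Assumption: only the two separators discussed above are recognized,
# U+002E (full stop) and U+3002 (ideographic full stop).
LABEL_SEPARATORS = re.compile("[\u002E\u3002]")


def split_labels(domain_name):
    """Split a domain name into labels (hypothetical helper)."""
    return LABEL_SEPARATORS.split(domain_name)
```

Each element of the returned list would then be passed to ToASCII or
ToUnicode individually. Note that a trailing separator yields an empty
final label, which the caller would need to handle.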

Section 6 doesn't mention that the application needs to set the
"host name syntax rule" flag.
Section 6 doesn't mention that the application needs to set the
"prohibit unassigned" flag.
What is needed here is just some text to make it clear that the
application needs to do *something*. Example text would be:
	When the application is handling a domain name for which
	the host name syntax checks should apply (which is the common case)
	the application needs to set the "host name syntax" flag when using
	ToASCII/ToUnicode.

	When the domain name is stored the application needs to set
	the "prohibit unassigned" flag when using ToASCII/ToUnicode.
	[and similarly for the inverse]

The abstract could be improved by adding the benefit of using a
representation into the same octets e.g. by adding
	This representation allows IDNs to be introduced with minimal
	changes to the existing DNS infrastructure.

Section 2: Is there a reference for US-ASCII that can be added?

Section 2: host name vs. domain name.
I think it makes sense to add "In this document we use the term "domain
name" in general, and when referring explicitly to the syntax restrictions
in [STD3] we use the term "host name syntax"."

Section 2: IDN definition.
I don't think the last sentence about "name handling bodies" 
is part of the definition.
Is there a better home for that statement somewhere else in the document?

Section 2: Internationalized Label
I think the definition for "ACE label" should be broken out as being its own
paragraph.
Also, the ACE definition should be explicit that the ACE label includes
the ACE prefix (which is what punycode states).

Section 2: I find it odd that the "generic domain name slot" is less
generic than the "internationalized domain name slot".
Is there a better name which is closer to the reality that such
slots are quite restrictive? (LDH slot? ASCII slot?)

Section 4.1 should say in its first sentence that the
input is a sequence of code points making up one label.

Section 4.1 and the definition of "internationalized label" talk
of "equivalence". I think this is a bit too subtle.
I think it might help to define "label equivalence" in section 2
explicitly (in its own paragraph) and point out that this
is not just a case of code point by code point comparison
(because under "label equivalence" an ACE label should match
the corresponding non-ACE label.)
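
To illustrate what I mean by "label equivalence", here is a minimal Python
sketch. It leans on the ToASCII function in Python's standard encodings.idna
module, which implements the algorithm as later published in RFC 3490 rather
than the text of this draft, so treat it as an approximation. The name
labels_equivalent is hypothetical.

```python
from encodings.idna import ToASCII


def labels_equivalent(a, b):
    """Compare two labels by their ToASCII forms (hypothetical helper).

    Assumption: two labels are equivalent when ToASCII yields the same
    result for both, ignoring ASCII case. ToASCII raises UnicodeError
    if either label cannot be converted.
    """
    return ToASCII(a).lower() == ToASCII(b).lower()
```

Under this sketch an ACE label and the corresponding non-ACE label compare
as equivalent, which a plain code point by code point comparison would not
give you.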

Section 4.1 talks of "host name syntax rules" but has no reference
to what this means. STD3?
Perhaps the above clarification to the definition takes care of this.

Section 4.1 (or section 6?) should say something about the meaning
of a failure indication of ToASCII.

Section 4.1 step 2: specify that the "prohibit unassigned" flag is
used by nameprep.

Section 4.1 step 3: "is a host name" isn't well defined.
It should say something like
	if the "host name syntax" flag is set.
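
For what it is worth, here is a minimal Python sketch of the kind of check
I assume the flag would control. The function name and the exact rules are
my assumptions based on the letter/digit/hyphen restrictions in [STD3]; the
document itself should state the authoritative rule.

```python
def violates_host_name_syntax(label):
    """Return True if the label breaks the assumed host name syntax rules.

    Assumptions: every ASCII code point must be a letter, digit or
    hyphen, and the label must not start or end with a hyphen.
    Non-ASCII code points are left for the later steps to deal with.
    """
    for ch in label:
        if ord(ch) < 0x80 and not (ch.isalnum() or ch == "-"):
            return True
    return label.startswith("-") or label.endswith("-")


def check_label(label, host_name_syntax):
    """Fail only when the flag is set and the label breaks the rules."""
    if host_name_syntax and violates_host_name_syntax(label):
        raise ValueError("label violates host name syntax")
    return label
```

With the flag clear, a label such as "_sip" would pass through unchanged;
with the flag set, the same label would cause a failure.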

Section 4.1. Step 5 should presumably be before step 4.
Or do you want to allow multiple applications of ToASCII i.e. being
able to apply it to a label which has already been "ACEd"?
If so, it makes sense to make the desire to be able to do this explicit.

Section 4.1 step 6 doesn't say "and fail if there is an error".

Section 4.2 should say in its first sentence that the
input is a sequence of code points making up one label.

Section 4.2 step 2. Why is this step needed?
This is likely to be another unstated desire/requirement.

Section 4.2 step 2. Should say that nameprep uses the "prohibit unassigned"
flag.

Section 4.2 step 6. What happens when ToASCII returns an error?

Section 4.2 step 7. Instead of "sequence" how about "result of
ToASCII"? "Sequence" reads a bit like "the saved copy 
of the sequence".

Section 5 After "IESG--" add: [IANA to assign]

The picture in section 6 doesn't actually show where ToASCII and
ToUnicode fit. How about putting a box between the resolver library and the
application conversion box?

Also, the text on the arrow going to "application servers" talks both
of the immediate case (when application protocols use ACE) and
a future where some application protocols might negotiate to use
a non-ACE encoding. Perhaps the latter should be a separate picture?
The text could be replaced by having two different paths. Roughly

	v	^			^
	|	|			|
	v	^			|
    ToASCII/ToUnicode			TBD
	    ^				|
	    | Existing application	| New, IDN-aware application
	    | protocol			| protocol
	    v				v
    Existing application	New, IDN-aware application servers
    servers

I think the text in the picture is trying to say the above.
The above "TBD" is "predefined by the protocol" in the document.
Perhaps the "ToASCII/ToUnicode" box should be named "IDNA" - but "IDNA"
is more the whole approach than the single functional unit doing
the conversions.

The last paragraph in section 6.2 can perhaps refer to the 
terminology in the picture. For instance saying that in terms of figure X
this means that the ToASCII/ToUnicode box would be available as part of
the Resolver library box.

Section 6.6 says
	It also SHOULD
	be used for other name comparisons, such as when a browser wants to
	indicate that a URL has been previously visited.
I don't understand what this has to do with DNSSEC. Does it belong in
a different section?
And if we allow well-defined comparisons without requiring ACE encoding
before the comparison then I don't see why it would be required.

Section 7 says:
	ACE is an encoding for domain name labels that use non-ASCII
	characters.
My brain tripped on this. Seemed to read that ACE uses non-ASCII characters.
And I suspect other non-native English speakers might have more problems
understanding what "that" refers to. Can it be reworded?

Split normative vs. non-normative references.

---