
[idn] Stringprep editorial comment



[Resending with different From: address.]

It would have saved me some confusion had it been stated early on that
stringprep only takes Unicode as input to the processing.

Right now it says "text strings" up until it starts to discuss the
output of the process, where it begins to talk about Unicode.  One
could get the impression that stringprep is a framework for preparing
all kinds of text strings into canonicalized Unicode strings, whereas it
is in fact only about preparing Unicode text strings.  Sentences like
the following support that view (from Introduction): "these profiles
will allow users to enter internationalized text strings in
applications and have the highest chance of getting the content of the
strings correct."  That sentence doesn't reflect what will happen in
typical internationalized systems (such as _my_ machine) -- many
systems enter internationalized text strings in charsets other than
Unicode, and must convert them into Unicode before stringprep is useful.
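
To make the point concrete, here is a minimal Python sketch of the
transcode-then-prepare order.  The helper names to_unicode() and
prepare() are my own, and prepare() is only a rough stand-in for a real
profile (it applies the B.1 and B.2 mappings and NFKC, and skips the
prohibited-character and bidi checks); it assumes Python's standard
stringprep and unicodedata modules.

```python
import stringprep
import unicodedata


def to_unicode(raw: bytes, charset: str) -> str:
    """Transcode input bytes from a named charset into a Unicode string.

    Hypothetical helper: stringprep itself says nothing about this step.
    """
    return raw.decode(charset)  # strict: raises on undecodable bytes


def prepare(text: str) -> str:
    """Simplified stand-in for a stringprep profile (an assumption).

    Drops characters "commonly mapped to nothing" (table B.1), applies
    the case-folding map (table B.2), then normalizes with NFKC.
    """
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in text
        if not stringprep.in_table_b1(ch)
    )
    return unicodedata.normalize("NFKC", mapped)


# The same user-visible string, entered through two different charsets.
latin1_bytes = "Caf\u00e9".encode("iso-8859-1")
utf8_bytes = "Caf\u00e9".encode("utf-8")

# Transcoded with the right charset, both inputs prepare to the same string.
from_latin1 = prepare(to_unicode(latin1_bytes, "iso-8859-1"))
from_utf8 = prepare(to_unicode(utf8_bytes, "utf-8"))

# Transcoded with the wrong charset, the prepared string no longer matches.
mislabelled = prepare(to_unicode(utf8_bytes, "iso-8859-1"))
```

The bytes differ before transcoding, so comparing them directly would
fail; only after both are converted to Unicode does the preparation step
make them comparable, and a wrong charset label defeats it.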

My $.02 solution:

--- draft-hoffman-stringprep-03.txt.orig	Mon May 27 21:38:42 2002
+++ draft-hoffman-stringprep-03.txt	Mon May 27 22:08:02 2002
@@ -25,7 +25,7 @@
 
 Abstract
 
-This document describes a framework for preparing text strings in order
+This document describes a framework for preparing Unicode text strings in order
 to increase the likelihood that string input and string comparison work
 in ways that make sense for typical users throughout the world. The
 stringprep protocol is useful for protocol identifier values, company
@@ -92,7 +92,7 @@
 behaviors that make it difficult to compare text in a consistent
 fashion.
 
-This document specifies a framework of text processing rules. Other
+This document specifies a framework of text processing rules for text in Unicode format. Other
 protocols can create profiles of these rules; these profiles will
 allow users to enter internationalized text strings in applications and
 have the highest chance of getting the content of the strings correct.
@@ -100,6 +100,13 @@
 they think is the same string into two different input mechanisms, the
 strings should match on a character-by-character basis.
 
+This framework does not describe how data is transcoded from other
+character sets into Unicode.  Systems that use non-Unicode
+input methods must use a consistent way to transcode data into Unicode
+before using this framework.  In such systems, the transcoding
+algorithm is a critical part of enabling secure and "correct"
+operation of internationalized text strings.
+
 In addition to helping string matching, profiles of stringprep can also
 exclude characters that should not normally appear in text that is used
 in the protocol. The profile can prevent such characters by changing the
@@ -753,7 +760,10 @@
 Because it is impossible to map similar-looking characters without a
 great deal of context such as knowing the fonts used,
 stringprep does nothing to map similar-looking characters together nor
-to prohibit some characters because they look like others.
+to prohibit some characters because they look like others.  Nor does it
+do anything to ensure that algorithms transcoding characters
+from non-Unicode charsets into Unicode produce the same output in all
+implementations.
 
 
 9. IANA Considerations