
Re: [idn] Punicode: Upper-case in example



Kent Karlsson <kentk@md.chalmers.se> wrote:

> In general, an example implementation (of any software specification)
> should be as pure as possible, no extra bells and whistles.

The sample implementation is divided into two parts: the implementation
itself (let's call it the core) and the test wrapper.  The core
is pure.  It implements the Punycode algorithm and the mixed-case
annotations described in appendix B, and provides a simple abstract
programming interface that makes it trivial to skip the annotation
support if you don't want it (you simply pass a null pointer instead of
a pointer to an array of flags).
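
To illustrate the shape of such an interface, here is a minimal sketch
in Python (not the C code from the draft).  The names encode and
case_flags are hypothetical; passing None plays the role of the null
pointer.  The constants are the Punycode parameter values; the case
handling is my reading of the mixed-case annotations and should be
treated as an assumption.

```python
from typing import Optional, Sequence

# Punycode parameter values.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N, DELIMITER = 72, 128, "-"


def _digit(d: int, upper: bool = False) -> str:
    """Map 0..25 to a..z (A..Z when upper is set) and 26..35 to 0..9."""
    if d < 26:
        return chr(d + (ord("A") if upper else ord("a")))
    return chr(d - 26 + ord("0"))


def _adapt(delta: int, numpoints: int, first: bool) -> int:
    """Bias adaptation step."""
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def encode(code_points: Sequence[int],
           case_flags: Optional[Sequence[bool]] = None) -> str:
    """Hypothetical interface: case_flags=None disables annotations."""
    out = []
    # Basic (ASCII) code points are copied first; with flags present,
    # the flag forces the case of each basic letter.
    for i, cp in enumerate(code_points):
        if cp < 0x80:
            ch = chr(cp)
            if case_flags is not None:
                ch = ch.upper() if case_flags[i] else ch.lower()
            out.append(ch)
    h = b = len(out)
    if b:
        out.append(DELIMITER)
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(code_points):
        m = min(cp for cp in code_points if cp >= n)
        delta += (m - n) * (h + 1)
        n = m
        for i, cp in enumerate(code_points):
            if cp < n:
                delta += 1
            elif cp == n:
                q, k = delta, BASE
                while True:
                    if k <= bias:
                        t = TMIN
                    elif k >= bias + TMAX:
                        t = TMAX
                    else:
                        t = k - bias
                    if q < t:
                        break
                    out.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                # The last digit of each delta is always a letter, so
                # its case can carry the flag of a non-basic code point.
                flag = case_flags is not None and bool(case_flags[i])
                out.append(_digit(q, flag))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(out)
```

A caller that doesn't care about annotations just calls
encode(code_points) and never builds a flag array at all.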

> So the "1-bit annotations" appear to not belong.

I think it's good to have examples of the annotations, if for no other
reason than to serve as examples of mixed-case encoded strings, which
all implementations are required to handle, even if they don't interpret
them as annotations.

> If they belong, they surely should not manifest themselves as
> "U+"/"u+",

I don't see what's wrong with that representation.  The nice thing
about it is that it's very easy to ignore if you don't care about the
annotations.

> which in addition is not explained.

I can see how it might be considered unfortunate that the examples
section does not call attention to the annotations.  It might be too
late to change that, and I don't know if there would be consensus that
it should be changed.

> Furthermore, in this case, there are example argument (to encode) and
> result (from decoding) strings that are not readily presentable in an
> ASCII document (as IETF documents apparently still have to be).  So
> there is an additional step of presenting characters that are not
> ASCII printable ones. Using "U+" notation for such code points in
> the example string presentation is fine, though.  This additional
> step does NOT belong in a sample implementation, but should be kept
> strictly outside of it.

I agree that it does not belong in the core, and it isn't there.  I don't see
why it shouldn't be in the test wrapper.  The only purpose of the test
wrapper is to provide a convenient command-line interface on top of the
programming interface, to make it easy to write scripts to run tests.  I
put the conversion to/from U+ notation into the test wrapper because it
was getting to be a pain to generate and test the examples without it.
I figured that other readers of the document would find it convenient
(like I did) to be able to use the sample implementation directly on the
example strings.
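
For readers who want a feel for what that conversion involves, here is
a rough Python sketch (again, not the wrapper's actual C code).  The
function names parse_token, format_token and parse_line are
hypothetical, and I'm assuming that an upper-case "U+" marks a set flag
and a lower-case "u+" a clear one.

```python
import string
from typing import List, Tuple


def parse_token(token: str) -> Tuple[int, bool]:
    """Parse 'u+XXXX' or 'U+XXXX' into (code point, flag)."""
    digits = token[2:]
    if (len(token) < 3 or token[0] not in "uU" or token[1] != "+"
            or any(c not in string.hexdigits for c in digits)):
        raise ValueError(f"not a U+ token: {token!r}")
    # Assumption: an upper-case 'U' means the 1-bit annotation is set.
    return int(digits, 16), token[0] == "U"


def format_token(cp: int, flag: bool = False) -> str:
    """Inverse of parse_token, using at least four hex digits."""
    return f"{'U' if flag else 'u'}+{cp:04X}"


def parse_line(line: str) -> Tuple[List[int], List[bool]]:
    """Split a whitespace-separated line into code points and flags."""
    pairs = [parse_token(t) for t in line.split()]
    return [cp for cp, _ in pairs], [flag for _, flag in pairs]
```

Someone who doesn't care about the annotations can simply discard the
second list returned by parse_line.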

If someone wants a different test wrapper that inputs and outputs
UTF-32, they can modify the test wrapper pretty easily.

> It would still be nice to have the example strings also in UTF-8 in
> the document, if possible, even if the example implementation does not
> work directly on UTF-8 strings.

It would, but I think RFCs are required to be ASCII.

AMC