
Re: [idn] Punycode: Upper-case in example



Hello Adam,

I understand that the documents have been approved by the IESG,
so changes are not appropriate at this stage. But maybe some
of them can be made at the next stage (Draft Standard, ...).

At 23:23 02/11/27 +0000, Adam M. Costello wrote:
> Martin Duerst <duerst@w3.org> wrote:
>
>> I don't think it is a very good idea to use U+ for the
>> distinction, for the following reasons:
>>
>> 1) The convention u+ -> lower case, U+ -> upper case is not
>>    documented anywhere in the punycode draft (or at least I didn't
>>    find it). If used at all, it should be documented right at the
>>    start of the examples.
>
> It is not documented in the spec because it is not a feature of
> Punycode.  The Punycode algorithm inputs and outputs code points, which
> are numbers.  It does not input or output "u+".
I agree that documenting it in the normative part of the spec
would be a bad idea. But what I was proposing was that it be
mentioned in the examples section.


> The sample implementation inputs and outputs "u+".  Therefore the use
> of the u as a 1-bit annotation is mentioned in the documentation of the
> sample implementation, which is embedded in the source code (you can
> either read the source code of the usage() function, or run the program
> with no arguments).
I agree with what Kent Karlsson said on this.
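To illustrate the convention under discussion, here is a minimal Python sketch of reading and writing the sample implementation's token format. It assumes whitespace-separated u+hex tokens in which the case of the leading u is the 1-bit flag (U sets it, u clears it). The names parse_tokens and format_tokens are hypothetical; this is not code from the sample implementation.

```python
import re

# Hypothetical helper names; only the "u+hex" / "U+hex" token shape follows
# the convention documented for the sample implementation.
_TOKEN = re.compile(r"([uU])\+([0-9A-Fa-f]{1,6})")


def parse_tokens(text: str) -> list[tuple[int, bool]]:
    """Turn whitespace-separated u+hex tokens into (code point, flag) pairs.

    The case of the leading letter is the 1-bit annotation: "U" sets the
    flag (suggest upper case), "u" clears it.  The hex digits carry no
    annotation and may be written in either case.
    """
    pairs = []
    for token in text.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"not a u+hex token: {token!r}")
        pairs.append((int(match.group(2), 16), match.group(1) == "U"))
    return pairs


def format_tokens(pairs: list[tuple[int, bool]]) -> str:
    """Inverse of parse_tokens: upper-case hex, padded to at least 4 digits."""
    return " ".join(f"{'U' if flag else 'u'}+{cp:04X}" for cp, flag in pairs)


example = "u+0033 u+5E74 U+0042 u+7D44"
print(parse_tokens(example))
# [(51, False), (24180, False), (66, True), (32068, False)]
print(format_tokens(parse_tokens(example)) == example)  # True
```

Since U+0042 and u+0042 name the same code point, everything that matters for the annotation sits in the case of a single letter, which is exactly what makes it easy to miss.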


> I tried to downplay mixed-case annotation as much as possible in the
> draft, because Paul and Patrik have never liked it.  I'll ask them if
> they think the Examples section should call attention to the mixed-case
> annotations.
I agree with downplaying it. But then the best thing would be
not to use it in the examples section at all. The alternative is
to mention it there.


>> 2) The above convention is very easy to overlook, in particular
>>    because u+ and U+ look so very similar.  It is close to a widely
>>    established convention, but differs slightly.
>
> I'm curious--can you explain or point me to that convention?
The convention is the very widely established convention of writing
Unicode/UCS code points as U+ followed by 4 to 6 hexadecimal digits.
In that convention, the U is always upper case. For many people, this
convention is so well established that it's very easy to overlook
the lower-case u+. That happened to two of us on Tuesday, and it took
us quite some time to figure out what was actually going on.


>> 3) Punycode can be used in different ways: on mixed-case strings, on
>>    lower-case strings that still carry the original case information,
>>    and on pure lower-case strings.  Maybe there should be separate
>>    examples for all three of these uses.
>
> Long ago some of my drafts included explanations of various scenarios,
> some of which are not applicable to domain names.  Since then, we
> agreed that the Punycode draft should present itself as a piece of IDNA
> which, by the way, could perhaps be useful outside IDNA; not as a
> general encoding that happens to be used in IDNA.  The goal was to avoid
> confusing implementors of IDNA.  So I cut out all but the essentials of
> mixed-case annotation, and confined its description to a single appendix
> (and the sample implementation).
I think that was basically the right thing to do.
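For comparison, here is a minimal Python sketch of what the mixed-case annotation appendix describes. It is built on the decoding procedure and parameters of the Punycode draft (later published as RFC 3492). Basic code points keep their own case, and each non-basic code point is flagged when the last digit of its delta is an upper-case letter. decode_with_flags and apply_annotation are hypothetical names. The sketch skips the overflow checks a C implementation needs and is not the sample implementation.

```python
# Punycode parameters from RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N, DELIMITER = 72, 0x80, "-"


def _digit_value(ch: str) -> int:
    """Map a basic code point to its digit value; letter case is ignored."""
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError(f"bad Punycode digit: {ch!r}")


def _adapt(delta: int, numpoints: int, firsttime: bool) -> int:
    """Bias adaptation function (RFC 3492, section 6.1)."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def decode_with_flags(encoded: str) -> list[tuple[int, bool]]:
    """Decode Punycode into (code point, case flag) pairs.

    Basic code points before the last delimiter are copied verbatim and
    flagged if upper case.  Each non-basic code point is flagged if the
    last digit of its delta is an upper-case letter.
    """
    b = encoded.rfind(DELIMITER)
    basic = encoded[:b] if b > 0 else ""
    if any(ord(ch) >= 0x80 for ch in basic):
        raise ValueError("non-basic code point before the delimiter")
    output = [(ord(ch), ch.isupper()) for ch in basic]
    pos = b + 1 if b > 0 else 0
    n, i, bias = INITIAL_N, 0, INITIAL_BIAS
    while pos < len(encoded):
        old_i, w, k = i, 1, BASE
        while True:  # read one generalized variable-length integer
            if pos >= len(encoded):
                raise ValueError("truncated delta")
            ch = encoded[pos]
            pos += 1
            digit = _digit_value(ch)
            i += digit * w
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if digit < t:
                break
            w *= BASE - t
            k += BASE
        # The case of the final digit ch is the annotation for this code point.
        flag = ch.isupper()
        bias = _adapt(i - old_i, len(output) + 1, old_i == 0)
        n += i // (len(output) + 1)
        i %= len(output) + 1
        output.insert(i, (n, flag))
        i += 1
    return output


def apply_annotation(pairs: list[tuple[int, bool]]) -> str:
    """Map each code point to upper or lower case where that is possible."""
    chars = []
    for cp, flag in pairs:
        ch = chr(cp)
        mapped = ch.upper() if flag else ch.lower()
        # Keep the original when the mapping is not one-to-one (e.g. U+00DF).
        chars.append(mapped if len(mapped) == 1 else ch)
    return "".join(chars)


print(apply_annotation(decode_with_flags("bcher-kva")))  # bücher
print(apply_annotation(decode_with_flags("bcher-kvA")))  # bÜcher
```

The two calls differ only in the case of the final digit, so the decoded code points are identical and only the suggested case of the ü changes. ToASCII and ToUnicode in IDNA do not use the flags, so an IDNA implementation can simply drop them.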


> Maybe, if/when there is real interest in using mixed-case annotations
> and/or using Punycode for things other than domain names, we could
> update the spec, or augment it with a separate document.
There is definitely no interest on my side in using Punycode
for things other than domain names, and therefore no interest
in mixed-case annotations.


Regards,    Martin.