
Re: [idn] Punicode: Upper-case in example



Martin Duerst <duerst@w3.org> wrote:

> > The sample Punycode implementation uses the case of the u as a 1-bit
> > annotation.
>
> I see. I don't think this is a very good idea to use the U+ for
> distinction, for the following reasons:
>
> 1) The u+ -> lower case, U+ -> upper case is not documented anywhere
>    in the punycode draft (or at least I didn't find it). If used at
>    all, it should be documented straight at the start of the examples.

It is not documented in the spec because it is not a feature of
Punycode.  The Punycode algorithm inputs and outputs code points, which
are numbers.  It does not input or output "u+".

The sample implementation inputs and outputs "u+".  Therefore the use
of the u as a 1-bit annotation is mentioned in the documentation of the
sample implementation, which is embedded in the source code (you can
either read the source code of the usage() function, or run the program
with no arguments).
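
To illustrate the distinction, here is a minimal sketch of how a tool
could read that notation: each whitespace-separated u+hex token yields
a code point (a number) plus a 1-bit flag taken from the case of the u.
This is not the sample implementation, which is written in C; the
function name parse_annotated is hypothetical, and the choice to reject
malformed tokens with an exception is an assumption of this sketch.

```python
import string
from typing import List, Tuple


def parse_annotated(text: str) -> List[Tuple[int, bool]]:
    """Parse whitespace-separated u+hex tokens.

    Returns (code_point, flag) pairs, where flag is True when the
    token was written with an uppercase "U+" and False for "u+".
    Hypothetical helper, for illustration only.
    """
    pairs = []
    for token in text.split():
        prefix, digits = token[:2], token[2:]
        # Only "u+" or "U+" followed by at least one hex digit is accepted.
        if prefix not in ("u+", "U+") or not digits:
            raise ValueError(f"not a u+hex token: {token!r}")
        if any(ch not in string.hexdigits for ch in digits):
            raise ValueError(f"bad hex digits in token: {token!r}")
        # The number is what the algorithm sees; the flag is the annotation.
        pairs.append((int(digits, 16), prefix == "U+"))
    return pairs
```

For example, parse_annotated("u+0061 U+0062") gives the code points
0x61 and 0x62, with the flag set only on the second one.  The "u+"
itself never reaches the encoder; only the numbers and flags do.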

I tried to downplay mixed-case annotation as much as possible in the
draft, because Paul and Patrik have never liked it.  I'll ask them if
they think the Examples section should call attention to the mixed-case
annotations.

> 2) The above convention is very easy to overlook, in particular
>    because u+ and U+ look so very similar.  It is close to a widely
>    established convention, but differs slightly.

I'm curious--can you explain or point me to that convention?

> 3) Punycode can be used in different ways, on mixed strings, on lc
>    strings that still contain the original casing info, and on pure
>    lc strings.  Maybe there should be separate examples for all these
>    three uses.

Long ago some of my drafts included explanations of various scenarios,
some of which are not applicable to domain names.  Since then, we
agreed that the Punycode draft should present itself as a piece of IDNA
which, by the way, could perhaps be useful outside IDNA; not as a
general encoding that happens to be used in IDNA.  The goal was to avoid
confusing implementors of IDNA.  So I cut out all but the essentials of
mixed-case annotation, and confined its description to a single appendix
(and the sample implementation).

Maybe, if/when there is real interest in using mixed-case annotations
and/or using Punycode for things other than domain names, we could
update the spec, or augment it with a separate document.

AMC