
Re: [idn] Punycode: Upper-case in example



Hello Adam,

Many thanks for your quick response.

At 22:50 02/11/26 +0000, Adam M. Costello wrote:
Martin Duerst <duerst@w3.org> wrote:

> In http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-03.txt,
> example (I) says:
>
>  (I) Russian (Cyrillic):
>         U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E
>         u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440
>         u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A
>         u+0438
>         Punycode: b1abfaaepdrnnbgefbaDotcwatmq2g4l
>
> The presence of the upper-case 'D' (not to say the string 'Dot' :-)
> is confusing, because it seems completely arbitrary.  There is no
> upper-case letter in the Cyrillic string.

> How did the upper-case D get in there?

It corresponds to the uppercase U in one of the code points in the u+
notation.  The sample Punycode implementation uses the case of the u
as a 1-bit annotation.

I see. I don't think it is a very good idea to use the case of the U+
prefix for this distinction, for the following reasons:

1) The convention (u+ -> lower case, U+ -> upper case) is not documented
   anywhere in the Punycode draft (or at least I didn't find it). If it is
   used at all, it should be documented right at the start of the examples.

2) The above convention is very easy to overlook, in particular because
   u+ and U+ look so very similar. It is close to the widely established
   U+ notation for code points, but differs slightly.

3) Punycode can be used in different ways: on mixed-case strings, on
   lower-cased strings that still carry the original casing information,
   and on purely lower-case strings. Maybe there should be separate
   examples for each of these three uses.
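
To make the u+/U+ convention concrete, here is a minimal Python sketch of
a Punycode encoder that carries one case flag per code point. It is not
Adam's sample implementation, only my reading of the algorithm with the
published Punycode parameter values; the names parse_u_notation, encode
and EXAMPLE_I are made up for illustration. The flag of a non-basic code
point is stored in the case of the last digit of its delta, which is where
the 'D' in example (I) comes from.

```python
# Sketch of Punycode encoding with per-code-point case flags.
# Assumption: parameter values as published for Punycode (base 36 etc.).
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128

EXAMPLE_I = """
U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E
u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440
u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A
u+0438
"""

def parse_u_notation(text):
    """Split 'u+XXXX' tokens; an upper-case 'U' sets the case flag."""
    points, flags = [], []
    for tok in text.split():
        flags.append(tok[0] == "U")
        points.append(int(tok[2:], 16))
    return points, flags

def digit(d, upper=False):
    """Map 0..25 to a..z (A..Z when flagged) and 26..35 to 0..9."""
    if d < 26:
        return chr(d + (ord("A") if upper else ord("a")))
    return chr(d - 26 + ord("0"))

def adapt(delta, numpoints, first):
    """Bias adaptation after each encoded delta."""
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)

def encode(points, flags):
    """Encode code points; flags[i] is the 1-bit annotation for points[i]."""
    out = []
    # Basic code points are copied, with their case forced by the flag.
    for c, f in zip(points, flags):
        if c < INITIAL_N:
            out.append(chr(c).upper() if f else chr(c).lower())
    b = h = len(out)
    if b:
        out.append("-")
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(points):
        m = min(c for c in points if c >= n)
        delta += (m - n) * (h + 1)
        n = m
        for c, f in zip(points, flags):
            if c < n:
                delta += 1
            elif c == n:
                # Emit delta as a variable-length integer.
                q, k = delta, BASE
                while True:
                    if k <= bias:
                        t = TMIN
                    elif k >= bias + TMAX:
                        t = TMAX
                    else:
                        t = k - bias
                    if q < t:
                        break
                    out.append(digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                # Only the final digit carries the case flag.
                out.append(digit(q, f))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(out)

points, flags = parse_u_notation(EXAMPLE_I)
print(encode(points, flags))                  # b1abfaaepdrnnbgefbaDotcwatmq2g4l
print(encode(points, [False] * len(points)))  # same string, all lower case
```

With every flag cleared the output differs only in the case of that one
letter, so a reader who misses the single U+ in the input has no way to
guess why the 'D' is there.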

Regards,   Martin.