Re: [idn] Document Status?

To: IETF idn working group <idn@ops.ietf.org>
Subject: Re: [idn] Document Status?
From: "JFC (Jefsey) Morfin" <jefsey@jefsey.com>
Date: Tue, 03 Sep 2002 12:22:48 +0200
In-reply-to: <20020903022433.GD4510@nicemice.net>
References: <5.1.0.14.0.20020902132259.0297a080@mail.jefsey.com><5.1.0.14.0.20020902132259.0297a080@mail.jefsey.com>

Dear Adam,
thank you for your response. I think you perfectly described the problem. I will then make a bore of myself and list my existing (increased) concerns.

After reading this mail and explanations which are not in the RFC, I still do not know for sure what is intended to be written. You also refer to external knowledge such as "Punycode".

Should we not have a clear terminology section defining every word or concept we are going to use? All I know is that semantically "international names" could only mean the names which are unaltered by the ACE process. Here I understand they mean the Unicode scripting which can be ACEd, so should it not at least be "internationalizable" (no action has occurred yet)?

Also, I understand that the RFC deals with the ACE process. The terminology section should therefore:

1. define the ACE process
2. define the group of the applications requiring ACE
3. define the international names: those left unchanged by the process
4. define the non ACEable names
5. define the ACE labels
6. define the class of all the labels (ACE and international) which can be processed by an application requiring ACE

On 04:24 03/09/02, Adam M. Costello said:

"JFC (Jefsey) Morfin" <jefsey@jefsey.com> wrote:

> I must say that with my limited French speaking IQ I tried to figure
> out the meaning of "ACE" in the proposed text: sorry, but I was
> totally unable to grasp it.

It is defined in the terminology section:

    An "internationalized label" is a label composed of characters from
    the Unicode character set; note, however, that not every string of
    Unicode characters can be an internationalized label.

To me this is a Unicode label/script? No action has occurred yet?

That much is clear, yes?

    To allow internationalized labels to be handled by existing
    applications, IDNA uses an "ACE label" (ACE stands for ASCII
    Compatible Encoding), which can be represented using only ASCII
    characters but is equivalent to a label containing non-ASCII
    characters.

IMHO, let us stop at "ACE label", and let us define what an ACE label is, without "can", which implies there would be other ways.

In other words, internationalized labels can contain non-ASCII
characters, which can't be handled directly by existing applications
that expect domain labels to be ASCII.  Therefore, we instead use an
"ACE label", which is an ASCII label that is equivalent to a non-ASCII
label.

All this added explanation is external and only adds to the text. Could we try to compact it into one single, crystal-clear initial definition?

    More rigorously, an ACE label is defined to be any label that the
    ToUnicode operation would alter.

So an ACE label is defined negatively here. My feeling is that it reads as "when ToUnicode will fail", when we mean that ToUnicode (successfully) transforms the ACE label.

That one sentence is the full and exact rigorous definition of the term
"ACE label".  The rest of the explanation is there only to provide
intuition.

    For every internationalized label that cannot be directly
    represented in ASCII, there is an equivalent ACE label.  An ACE
    label always begins with the ACE prefix defined in section 5.

My first reading was puzzling: "whenever the ACE process will not work, there will be a pre-existing equivalent ACE label". This obviously does not make any sense.

Those are clear, yes?

By the way, the notion of "equivalent label" is also defined in the
terminology section:

    In IDNA, equivalence of labels is defined in terms of the ToASCII
    operation, which constructs an ASCII form for a given label.

This means the ACE label.

    Labels
    are defined to be equivalent if and only if their ASCII forms
    produced by ToASCII match using a case-insensitive ASCII comparison.

Then international names, i.e. names not modified by the ACE process (now reduced to ToASCII only? Should it not also be reversible, with ToUnicode resulting in the original scripting?), cannot be compared. (I know this is wrong, but it is what I read here. Again, sorry for my Frenglish, but I think it helps here.)

    Traditional ASCII labels

What is "traditional"? Has it been defined?

    already have a notion of equivalence: upper
    case and lower case are considered equivalent.  The IDNA notion of
    equivalence is an extension of the old notion.

Old notion? Is that the correct wording? Is that not the current and stable DNS notion?

    Equivalent labels in IDNA

Unicode scriptings and ACE labels + unmodified scriptings.

    are treated as alternate forms of the same label, just as "foo"
    and "Foo" are treated as alternate forms of the same label.
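As a rough illustration of the equivalence rule quoted above, here is a minimal Python sketch. It assumes Python's standard `encodings.idna` module, which implements ToASCII with the ACE prefix "xn--" rather than the "IESG--" placeholder used in this thread; the helper name `labels_equivalent` is hypothetical.

```python
from encodings.idna import ToASCII  # stdlib ToASCII; uses the "xn--" prefix


def labels_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two labels are equivalent if and only if
    their ToASCII forms match under a case-insensitive ASCII comparison."""
    # ToASCII returns bytes; lower() gives the case-insensitive comparison.
    return ToASCII(a).lower() == ToASCII(b).lower()


# Traditional ASCII labels: case differences only.
print(labels_equivalent("foo", "Foo"))
# A non-ASCII label and its ACE form (prefix "xn--" is an assumption here).
print(labels_equivalent("b\u00fccher", "xn--bcher-kva"))
# Two unrelated labels.
print(labels_equivalent("foo", "bar"))
```

Under these assumptions the first two comparisons should hold and the third should not, which matches the "foo"/"Foo" reading of the quoted text.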

Is that clear enough?

Sorry to be boring. My point is not that I do not understand, but that the text reads confusingly. I am only trying to help make it clearer, based on my own reading difficulties.

Getting back to "ACE", maybe some examples would help:

The Japanese phrase <sono><supiido><de> (pretend I wrote it using kana,
which are non-ASCII characters) could be an internationalized label.  It
is not an ACE label, because it cannot be represented in ASCII.

Well, I thought that an ACE label resulted from the ACE process, and was not one of the labels left identical by the ACE process (IMHO both ways: "iesg---name.com" is perfect ASCII, but if passed through ToUnicode it will have no meaning).

If you feed it to ToUnicode, it will not be altered, because the check for the ACE prefix will fail.

Only if the registration process has prevented the creation of "iesg--" names. This is probably forbidden, but it should be mentioned, because when someone is going to use IDNA for a database entry, that filtering must be implemented in a consistent way.

There exists a label equivalent to <sono><supiido><de> that can be
represented in ASCII, namely IESG--d9juau41awczczp (where IESG-- means
the ACE prefix, whatever is eventually chosen).  This is an ACE label
because it can be represented in ASCII and it is equivalent to a label
containing non-ASCII characters.  If you feed IESG--d9juau41awczczp to
ToUnicode, it will be altered (it will become <sono><supiido><de>).

The label helloworld is not an ACE label, because it is not equivalent
to any non-ASCII label.  If you feed it to ToUnicode, it will not be
altered, because the check for the ACE prefix will fail.
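The three normal cases above can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the standard `encodings.idna` module, whose ACE prefix is "xn--" rather than the "IESG--" placeholder, and whose `ToUnicode` raises `UnicodeError` on failure instead of returning its input unaltered, so the hypothetical helper `would_be_altered` treats any failure as "not altered".

```python
from encodings.idna import ToUnicode  # stdlib ToUnicode; uses the "xn--" prefix


def would_be_altered(label: str) -> bool:
    """Hypothetical helper: True if ToUnicode would alter the label,
    i.e. the label is an ACE label in the sense of the quoted definition."""
    try:
        return ToUnicode(label) != label
    except UnicodeError:
        # Python raises where the quoted text says "will not be altered".
        return False


# <sono><supiido><de> written in kana, as escapes to keep this file ASCII.
kana = "\u305d\u306e\u30b9\u30d4\u30fc\u30c9\u3067"

print(would_be_altered(kana))                   # internationalized, not ACE
print(would_be_altered("xn--d9juau41awczczp"))  # ACE form of the kana label
print(would_be_altered("helloworld"))           # plain ASCII, no ACE prefix
```

Only the middle label should be reported as altered, since it is the only one that carries the prefix and decodes to a non-ASCII label.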

See above about the necessary filter reminder for iesg--helloworld (please remember the VRSN problems with ASCII preregistrations of IDNs).

Those are the three normal cases.  There are also a few corner cases,
labels that begin with the ACE prefix but are not ACE labels:

The label IESG--foo-bar-2 is not an ACE label, even though it begins
with the ACE prefix, because it is not equivalent to any non-ASCII label
(because the Punycode part is invalid).

Punycode is not part of the document and should be introduced.

If you feed it to ToUnicode, it will not be altered, because the Punycode decoding step will fail.

The label IESG--3ba is not an ACE label, even though it begins with the
ACE prefix and the Punycode part is valid, because it is not equivalent
to any non-ASCII label (because it is not nameprepped;

Is name preparation part of the process? If yes, it has to be included in the ACE or ToASCII conditions above. If not, this restriction does not apply as such; it can only be noted. There may be a lot of variations in the Unicode scripting that the users/developers may want to result in the same ACE label.

it decodes to a
capital A with grave accent).  If you feed it to ToUnicode, it will not
be altered, because the comparison in step 7 will fail.
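The two corner cases can be traced step by step with a short Python sketch. It assumes the standard `punycode` codec and the `nameprep` and `ToASCII` functions from `encodings.idna`, with "xn--" standing in for the "IESG--" placeholder; the variable names are illustrative only.

```python
from encodings.idna import ToASCII, nameprep

# Corner case 1: the part after the prefix in "foo-bar-2" is not valid
# Punycode, so the decoding step fails (Python signals this by raising).
try:
    b"foo-bar-2".decode("punycode")
    decoding_failed = False
except UnicodeError:
    decoding_failed = True
print("foo-bar-2 decoding failed:", decoding_failed)

# Corner case 2: "3ba" is valid Punycode and decodes to capital A with grave.
decoded = b"3ba".decode("punycode")
print(hex(ord(decoded)))

# Nameprep maps it to the lowercase letter, so the decoded label was not
# in nameprepped form.
prepped = nameprep(decoded)
print(hex(ord(prepped)))

# Re-encoding with ToASCII therefore gives a different ACE string, which is
# why the comparison in step 7 fails for the label with the "3ba" suffix.
round_trip = ToASCII(decoded)
print(round_trip, round_trip.lower() == b"xn--3ba")
```

This also bears on the question of whether name preparation is part of the process: in this implementation ToASCII applies nameprep before encoding, which is what makes the round trip differ.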

You understand that this holds only as long as DNS does not support à, but other applications may. The definition given above talks about "existing applications": there are a lot of existing applications supporting it, and at any given time in the future there will be more.

So it means that we also want to define the ACE character set.
jfc
