[idn] IDNA text presentation (was Document Status?)



Dear Patrik,
you say:
<quote>
Only on explicit request from the AD, which then have noticed IESG that "a
new version is coming".
I have, as document co-editor, not seen any such request from the AD.
paf
</quote>

I understand this, and I am sorry for pushing for this review so late. May I underline that this thread is "Document Status"? It was about sending the document to the IESG. It is not closed and is still very active on several points.

IMHO the text and concepts must be worded to support further developments in a stable direction, approved as a common draft by all the involved parties, with a working direction proposed for the pending points. This is to avoid a possible international split, the appearance of alternative proposals, and at the very least endless disputes and delays.

I am sorry to be so bluntly active. As long as you were sorting out the technical solution and the immediate process was in order, it seemed OK to trust the effort. Now there are two possibilities:
- either I am wrong and I am the only one who finds the text confused and complex, in which case I apologize;
- or I am right and the text should be recalled, worked out as a text, and approved by the Internet community (after all, this is telling how the US Internet is going to be everyone's Internet, and the compromises adopted as the solution cannot even fully support French).

Open question?

jfc

==================================

Dear Adam,
thank you for the reply. As James Seng says:

<quote>
Someone once told me "You know you are done when you couldn't find things to
take out". You dont complete document by adding more stuff. You complete it by taking
out the noise and amplify the signal.
</quote>

This is the primary target. The second one, IMHO, is to phrase the document in such a way that it is clearly understood that there are two layers:
- a first one to use an ASCII subset as universal (international) support;
- a second one to manage user/application requirements on top of it, and not through further additions.

This way users may decide on a case-by-case basis whether or not to use the ACE. We do not build a transition, and we do not prevent alternatives and innovations.


At 03:51 04/09/02, Adam M. Costello wrote:

Disclaimer:  In a few places in this message, when I ask if an alternate
phrasing would be less confusing, I am not offering to make changes to
the draft.  Paul and Patrik and I would have to discuss it, and I would
understand if they think it's too late for that.

"JFC (Jefsey) Morfin" <jefsey@jefsey.com> wrote:

> You also refers to external knowledge such as "punnycode".

Punycode (one n).  It is used only inside the ToASCII and ToUnicode
operations.  If you are implementing those operations, or trying to
understand their internals, you will need to refer to the Punycode spec
and the Nameprep spec.  Otherwise, you can regard ToASCII and ToUnicode
as abstract operations, and still understand IDNA.
I know. What I underline is that you have to define the word, at least by pointing to its definition in a related document, before using it.

> Should we not have a clear terminology section defining every word or
> concept we are going to use?

Yes.  That is what we intend section 2 to be.

> All I know is semantically "international names" could only means the
> names which are unaltered by the ACE process.

Which ACE process?  There are two operations, ToASCII and ToUnicode,
which do very different things.  What they do is stated in the first
paragraphs of the sections where they are defined (4.1 and 4.2):

    The ToASCII operation takes a sequence of Unicode code points that
    make up one label and transforms it into a sequence of code points
    in the ASCII range (0..7F). If ToASCII succeeds, the original
    sequence and the resulting sequence are equivalent labels.

    The ToUnicode operation takes a sequence of Unicode code points that
    make up one label and returns a sequence of Unicode code points. If
    the input sequence is a label in ACE form, then the result is
    an equivalent internationalized label that is not in ACE form,
    otherwise the original sequence is returned unaltered.
I read those definitions three times. I fail to see how they are not balanced from ASCII to Unicode and from Unicode to ASCII, except that "iesg--" is removed by ToUnicode. Or are there other hidden cases I fail to see? IMHO a simple, clear wording is prevented by the use of complex conditional concepts. If we only talked consistently of real things (scripts, ASCII scripts, ACEd ASCII scripts), it would be simpler and would remove the feeling that there are a lot of possible exceptions we have not thought of.

(I do not propose a wording here or further on; I just try to phrase, in poor English, the way we could KISS it.)

We want a clear, simple, terse text, written for people who use "Unicode scripts" all day long in their day-to-day life, not as if this were an exception. The exception (which permitted the development of the Internet) is the true "international names", i.e. the "late upper case roman character set plus figures": A to Z, 0 to 9, "-" and ".". That character set must first be given a simple name. As it is a Unicode subset, its specific use is flagged ("iesg--"). That is all there is to it. As it is the smallest common writing subset, as Vint noted, I will name it the International character set.

IMHO we do not need anything more complex.

> "Here I understand they mean the Unicode scripting which can be ACEd",
> so should it not be at least be "internationalizable" (no action
> occurred yet)?

Various arguments can be made that we are misusing the word
"internationalized".  For better or worse, we use it to mean roughly
"able to support the use of non-ASCII characters".  In any case, we give
definitions for "internationalized domain name" and "internationalized
label", so you can forget any preconceived ideas about what
"internationalized" ought to mean, and trust the definitions.
I am sorry, but in my day-to-day life, like 90% of people, I use Unicode words and a language. I have difficulty communicating and understanding texts where the meaning is the opposite of the commonly adopted meaning. We are not talking here about the concepts, which we all agree upon. We are talking about how they are described and understood.

"Internationalized" means that something occurred.
"able to" means that something could occur.

The two meanings are opposed. In one case it means that the transformation has occurred. In the other case it means that the transformation has not occurred.

> Also I understand that that the RFC deals with the ACE
> process. Terminology section should therefore
>
> 1. define the ACE process

I'm not sure what you mean by "the ACE process".  There are two
operations, ToASCII and ToUnicode, which are complex enough to warrant
their own section (section 4).
No. There is also the preparation of these two operations which makes them possible.

Globally that process must be reversible. So ToUnicode and ToASCII cannot be separated.
If these processes are complex and different, their description can be consistent (and must be consistent as they are to be reversible) if the inputs/outputs are consistently described as indicated above.

> 2. define the group of the applications requiring ACE

I think what is really needed is not the set of applications requiring
ACE, but rather the set of places where ACE is needed.  That is already
there in the terminology section, under "IDN-unaware domain name slot".
Requirement 2 of section 3 (3.1 in the forthcoming idna-11 draft) states
that only ASCII characters are permitted in IDN-unaware domain name
slots.
I am just listing what is needed and where, for ease of reading. I am not criticizing. The target is to review that text for brevity, clarity, logic and consistency with reality, for people who know that ASCII is the exception and that reality is much richer.

> 3. define the international names : left unchanged by the process

Which process?  The labels left unchanged by ToASCII are simply the
ASCII labels, so we don't need a special term for that.  The labels left
unchanged by ToUnicode are the non-ACE labels, so we already have a term
for that.
Again, I am not introducing new categories. I am trying to find a systematic way to present the concepts used in the document, so that the reader understands that when the document uses three different ways of describing the same thing, it is actually speaking of the same thing.

I am trying to help that document be fluid for readers from China, France, Pakistan, Korea, Mexico, etc., without them having the feeling that the author tried to complicate his solution rather than to simplify their usage.

> 4. define the non ACEable names

I'm not sure what you mean by ACEable.  Every ASCII label is non-ACEable
in the sense that ToASCII will not alter it.  Some ASCII labels are ACE
labels themselves, and some are not.  But all of them are left unchanged
by ToASCII.

For every valid non-ASCII label, there is an equivalent ACE label.

Maybe you are asking for a definition of valid IDNs?  That is given.
I am not asking. I am saying this is where the "internationalized name" definition, once properly reworded as "internationalisable", is to be placed, in one shot, and that all the other complementary explanations using different words are to be removed.

> 5. define the ACE labels

Done.

> 6. define the class of all the labels (ACE and International) which
> can be processed by an application requiring ACE

I don't know what you mean by "an application requiring ACE".  We do
give a definition of IDN.
This is only the place where I suggest putting the information. Applications requiring ACE are the applications we are developing this process for: the ones which do not support Unicode or which need simple scripts, such as the huge number of information and security databases several people have talked about here.

> >    An "internationalized label" is a label composed of characters from
> >    the Unicode character set; note, however, that not every string of
> >    Unicode characters can be an internationalized label.
>
> to me this is an Unicode label/script?  No action occurred yet?

An internationalized label is a sequence of Unicode characters.  Given
a Unicode string, can it be an internationalized label?  The answer is
provided in the definition of IDN:

    An "internationalized domain name" (IDN) is a domain name for which
    the ToASCII operation (see section 4) can be applied to each label
    without failing.

"can be applied", not "has been applied".  A Unicode string X can be an
internationalized label if and only if ToASCII(X) does not fail.

Hmmm, one fact about internationalized labels is that every ASCII label
that can be used in ASCII domain names
BTW, would "International label" instead of "ASCII label that can be used in ASCII domain names" not look more fluid, clearer, more general and more open to evolution? We could then scrap the rest of the explanation, which is not necessary.

is also an internationalized
label that can be used in internationalized domain names.  In other
words, the set of internationalized labels is an extension (superset) of
the set of valid ASCII labels.  Would it have been helpful if this fact
were stated more prominently?
I agree, on the understanding that you mean the opposite of what anyone not knowing you are discussing IDNA would understand.

I fully accept that living with this project makes you consider your wording suitable. My only target is to make you aware that this dialog of ours will be the "dialog" of most readers with that text. You will have IDNA "gurus" who have (often partly) understood the special meanings of this document, and others who will be confused and say that the solution is confused, and who will develop their own, and build on top of their own.

> >    To allow internationalized labels to be handled by existing
> >    applications, IDNA uses an "ACE label" (ACE stands for ASCII
> >    Compatible Encoding), which can be represented using only ASCII
> >    characters but is equivalent to a label containing non-ASCII
> >    characters.
>
> IMHO, let stop at "ACE label". And let define what an ACE label,
> without "can" which implies there would be other ways.

I think the word "represented" is confusing you.  In my mind, "Foo"
cannot be represented using only lowercase ASCII letters, but "foo" is
an equivalent label that can be represented using only lowercase ASCII
letters.  Similarly, <sono><supiido><de> cannot be represented using
only ASCII characters, but IESG--d9juau41awczczp is an equivalent label
that can be represented using only ASCII characters.
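
As a rough illustration of the example above (not part of the original exchange), the Punycode step alone can be tried with Python's built-in `punycode` codec. The kana code points chosen for <sono><supiido><de>, the helper name `ace_form`, and the spelling of the `IESG--` placeholder prefix are assumptions made for this sketch; it covers only the encoding used inside ToASCII, with no Nameprep and no prefix checks.

```python
# Sketch of the Punycode step only (no Nameprep, no ACE-prefix checks).
# Assumption: <sono><supiido><de> is spelled with these kana code points.
SONO_SUPIIDO_DE = "\u305d\u306e\u30b9\u30d4\u30fc\u30c9\u3067"

# Placeholder ACE prefix as written in this thread.
ACE_PREFIX = "IESG--"


def ace_form(label: str) -> str:
    """Return the prefix plus the Punycode encoding of a non-ASCII label."""
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    # The non-ASCII label and its ASCII-only equivalent.
    print(ace_form(SONO_SUPIIDO_DE))
```

The result is composed only of ASCII characters, which is the sense of "represented using only ASCII characters" in the paragraph above.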

Unfortunately, the document is not consistent in its usage of the word
"represent".  Would this sentence be less confusing if worded as:

    ...IDNA uses an "ACE label" (ACE stands for ASCII Compatible
    Encoding), which is composed of ASCII characters but is equivalent
    to a label containing non-ASCII characters.
Let me give my logical understanding of what was agreed here (the wording may oppose yours, but I do think we are in agreement):

1. there are standard names. They can only be written in the Unicode set.
2. most of them can also be written in the International set (0-9, A-Z, "-", ".") using ACE.
3. some are already International, because they only use characters from the International set.
4. there is a need to know which writing is used: the "iesg--" header tells that the writing is international.

The process that translates Unicode writing into International writing is ToASCII (it should be ToINTL), and the process that translates International writing into Unicode is ToUnicode. The ACE process consists in producing, simultaneously and both ways, the International and the Unicode writing of every name, with exceptions which have no International writing.

The names may include several labels. They have semantics which call for a preparation to make sure the ACE process will result in clear and reversible International writing.

In real life there are applications which need International writing (8 bits) and others which can use standard writing (16 bits or more). All this because human beings, current screens and keyboard systems, and current software applications do not support more than 8-bit keys as of today.

So it is convenient for some applications to use a common character system limited to 8 bits.

I will take an example. We started the namespace with the International character set. As Vint noted for regular mail, we used uppercase, which made clear that accents could not be used. TLDs for the USA were the IRCs (ITT, RCA, etc.) and names were such as RCAFORDSALES (no separation between rootname and name).

Then we switched to X.75 and X.121 to be compatible with ISO standards limited to telephone-like digital logic (as the DNS is International character set oriented). We did it the same way as IDNA, using numeric names. We had a name preparation and "ToISO" and "ToTymnet" functions to translate "FRAELYSEESECRETARY" into the "208075081231" name and then into 208007508123. In the same way, we transformed "ARPAUCLA" or "123.23.28.82" into "ucla.arpa", but accepted "90777" as a US X.25 call to Dialog. We had no "iesg--" flag; we used either the final "." from the Internet to indicate that the name was to be reversed, or the numeric structure.

This type of parallel encoding is very common. You can write Japanese words in two character sets. There is no logical difference between IDNA and Katakana. This process is basic in French, where upper-case letters may carry neither accent nor cedilla. So "Jean-François" (my forename) is Unicode and "JEAN-FRANCOIS" is International. So everyone thinks "IDNA" all day long. Hence the difficulty of even understanding that there is a (complex) problem. There is just a (complex) simplification.

But I accept that this is a big shift in thinking for people accustomed to using International names only. It is the same problem as translating the English "you" into other languages: does it mean "vous" or "tu"?

?
That's less precise (because there are non-ASCII characters that can
be represented by ASCII characters, like the fullwidth characters
FF01..FF5E), but I suppose the imprecision might be tolerable, since the
rigorous definition immediately follows.

> All this added explanation is external and only adds to the text.  Let
> us try to compact it into one single initial crystal clear definition?

The only crystal clear definition is this one:

    More rigorously, an ACE label is defined to be any label that the
    ToUnicode operation would alter.
This is a circular definition: I talk about what I talk about. It should be defined in terms of the assignment given to the WG. An ACE label is a label containing only 8-bit characters of the "Late Roman Character set", or "International" character set, as specified above. The WG has the mission of making it possible for every standard real name to be written with that character set.

However, here's another attempt at providing some intuition:  All
non-ASCII internationalized labels are intended to be displayed to
users without change.  The same is true of most ASCII labels.  But
there are some ASCII labels, called ACE labels, that are not intended
to be displayed directly to users.  ACE labels begin with the ACE
prefix and look like gibberish.
(except if they come from an International character set string).

Every ACE label is equivalent to a
non-ASCII label, which is what is intended to be displayed instead.
For every non-ASCII internationalized label there is an equivalent ACE
label.  [I'm being loose with the term "ASCII".  Unicode characters that
are compatibly equivalent to ASCII characters (like those fullwidth
characters) count as ASCII for the purposes of this paragraph.]

> >   More rigorously, an ACE label is defined to be any label that the
> >   ToUnicode operation would alter.
>
> So an ACE label is here defined negatively.  Feeling is that it means
> "when ToUnicode will fail".

I don't see what's negative about that definition.  I see no mention of
failure.  Indeed, the second paragraph of the ToUnicode section (4.2)
says "ToUnicode never fails."
I do not mean in technical terms (if we can discuss all this, it is because the technical issues are settled). I mean in terms of reading. When you say A != B, you do not really define A unless you add more information.

So when you read a definition such as "if (A != ToUnicode(A)) True(A)=1; else True(A)=0;",
you usually spend a few seconds trying to understand what it really means.

> When we mean that the ToUnicode (sucessfully) transform into an ACE
> label.

ToUnicode can never output an ACE label.  The first paragraph of the
ToUnicode section says:

    If the input sequence is a label in ACE form, then the result is
    an equivalent internationalized label that is not in ACE form,
    otherwise the original sequence is returned unaltered.

Here is a math-notation version of the definition of ACE label:

    { ACE labels } = { X : X != ToUnicode(X) }
Yep. What I meant was the part of the ACE label after the ACE prefix. This kind of confusion will recur over and over for decades with that wording. Saying that there is a Unicode writing and an International writing and that, because of the possible confusion when the Unicode writing uses only International characters, there is an International prefix, would probably be quicker, easier and clearer to understand. I suppose no French or Japanese child would even understand there might be a problem.
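
For readers who prefer running code to set notation, the definition { ACE labels } = { X : X != ToUnicode(X) } quoted above can be sketched as follows. This is a simplified illustration under stated assumptions, not the draft's algorithm: `nameprep_standin` (lowercase plus NFKC) is a crude stand-in for Nameprep, the `IESG--` placeholder prefix is taken from this thread, and length and STD3 checks are omitted.

```python
import unicodedata

# Placeholder ACE prefix from this thread, matched case-insensitively.
ACE_PREFIX = "iesg--"


def nameprep_standin(label: str) -> str:
    """Crude stand-in for Nameprep (an assumption): lowercase, then NFKC."""
    return unicodedata.normalize("NFKC", label.lower())


def to_ascii(label: str) -> str:
    """Simplified ToASCII: ASCII labels are returned unchanged."""
    if label.isascii():
        return label
    prepped = nameprep_standin(label)
    if prepped.isascii():
        return prepped
    if prepped.startswith(ACE_PREFIX):
        raise ValueError("non-ASCII label must not begin with the ACE prefix")
    return "IESG--" + prepped.encode("punycode").decode("ascii")


def to_unicode(label: str) -> str:
    """Simplified ToUnicode: never fails, returns the input on any problem."""
    if not label.lower().startswith(ACE_PREFIX):
        return label
    try:
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        round_trip = to_ascii(decoded)
    except ValueError:  # UnicodeError is a subclass of ValueError
        return label
    # The re-encoded form must match the input, ignoring ASCII case.
    if round_trip.lower() != label.lower():
        return label
    return decoded


def is_ace_label(label: str) -> bool:
    """A label is an ACE label exactly when ToUnicode would alter it."""
    return to_unicode(label) != label
```

With this sketch, `is_ace_label("IESG--0ca")` is true, while `is_ace_label("IESG--3ba")` and `is_ace_label("helloworld")` are false, which matches the cases discussed elsewhere in this message.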

> >    For every internationalized label that cannot be directly
> >    represented in ASCII, there is an equivalent ACE label.
>
> My first reading was puzzling: "whenever the ACE process will not
> work, there will be an pre-existing equivalent ACE label".  This
> obviously does not make any sense.

Again, I think the word "represented" is confusing you, although I
thought the word "directly" would help.  Would it be less confusing if
reworded the same way as before [again using the loose sense of ASCII]:

    For every internationalized label containing non-ASCII characters,
    there is an equivalent ACE label.
The phrase is simpler but the wording is complex and needs a ToIDNA / ToEnglish function.

My wording would be:

For every international writing there is a standard writing.
There is not necessarily an international writing for every standard writing.

?

> >    In IDNA, equivalence of labels is defined in terms of the ToASCII
> >    operation, which constructs an ASCII form for a given label.
>
> this means ACE label.

No, ToASCII(helloworld) == helloworld, which is not an ACE label.  The
output of ToASCII is always an ASCII label, but not always an ACE label.
The output is an ACE label only if the input was non-ASCII, or the input
was an ACE label.  [Here again I'm using the loose sense of ASCII.]
Here again you consider the result of the process. What matters is the reality:

1. is it standard writing (Unicode, 16 bits),
2. is it international writing (International, 8 bits),
3. or is it a Unicode writing using International characters only: the prefix tells us.

> >   Labels are defined to be equivalent if and only if their ASCII
> >   forms produced by ToASCII match using a case-insensitive ASCII
> >   comparison.
>
> Then International names - ie non modified names by the ACE process
> (now reduced to ToASCII only(?), should it not also be reversible and
> the ToUNICODE results in the original scripting?) - cannot be compared
> (I know it is wrong, but this is what I read here.

That does not follow.  The definition of equivalence says:

    For all X,Y  define  X ~ Y  to mean  ToASCII(X) ~ ToASCII(Y)

You are wondering about "non modified names", that is, labels Z for
which Z == ToASCII(Z).  That does not impede us from performing the
comparison.  We compute ToASCII(X) (which returns X), and we compute
ToASCII(Y) (which returns Y), and we compare the results using a
case-insensitive ASCII comparison (which we can do because the output of
ToASCII is an ASCII string).
I agree. What I said is what I tend to read.
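
The comparison described above (compute ToASCII of each label, then compare the results case-insensitively) can be sketched as below. This is a minimal illustration with assumptions: lowercase plus NFKC stands in for Nameprep, the `IESG--` placeholder prefix is the one used in this thread, and the function names are chosen here for illustration.

```python
import unicodedata


def to_ascii(label: str) -> str:
    """Simplified ToASCII (assumption: lowercase + NFKC replaces Nameprep)."""
    if label.isascii():
        return label  # ASCII labels are left unchanged
    prepped = unicodedata.normalize("NFKC", label.lower())
    if prepped.isascii():
        return prepped
    return "IESG--" + prepped.encode("punycode").decode("ascii")


def equivalent(x: str, y: str) -> bool:
    """X ~ Y when ToASCII(X) and ToASCII(Y) match, ignoring ASCII case."""
    return to_ascii(x).lower() == to_ascii(y).lower()


if __name__ == "__main__":
    # Labels that ToASCII leaves unchanged can still be compared.
    print(equivalent("example", "EXAMPLE"))
    # A non-ASCII label and its ACE form compare as equivalent.
    print(equivalent("\u305d\u306e\u30b9\u30d4\u30fc\u30c9\u3067",
                     "IESG--d9juau41awczczp"))
```

Labels for which Z == ToASCII(Z) go through the same path as any other label, so nothing prevents comparing them.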

> >    Traditional ASCII labels
>
> What is "traditional".  Has it been defined?

Just a regular English word, not a term.  "Traditional ASCII labels"
means "the ASCII labels that we've all grown familiar with over the
years".
I am sorry, 90% of mankind has never grown familiar with any ASCII labels over the years. 90% of mankind executes a double IDNO process to relate with you, through a ToEnglish and a ToASCII operation.

Anyway, you cannot use the word "traditional" without naming the RFC where the table can be found. When it comes to writing a program, there is no C traditional(char *) routine.

> >    already have a notion of equivalence: upper case and lower case
> >    are considered equivalent.  The IDNA notion of equivalence is an
> >    extension of the old notion.
>
> Old notion?  Is that the correct wording?  Is that not the DNS current
> and stable notion?

Yes, the current notion is old, having been with us for many years.
Would "older notion" be less alarming?
My objection was to "old" not being defined. For example, there is a much older Roman notion of J and I being the same, and an old military and Minitel user notion of a 34-character set where 0 and O, and 1 and I, are used interchangeably.

The simplest is to define the character set (0-9, A-Z, "-", "."), maybe adding "@" and "_" (?).

> > The Japanese phrase <sono><supiido><de> (pretend I wrote it using
> > kana, which are non-ASCII characters) could be an internationalized
> > label.  It is not an ACE label, because it cannot be represented in
> > ASCII.
>
> Well I thought that ACE label resulted from the ACE process and
> not the labels left identical by the ACE process (IMHO both ways:
> "iesg---name.com" is perfect ASCII, but if ToUNICODEd it will have no
> meaning).

ToUnicode will not alter the label IESG---name (remember that ToUnicode
takes a single label as input).  The output will be IESG---name.  In
this particular case, the Punycode decoding step will fail.  But see
below for a more interesting example, IESG--world.
Again, this is the difference between the label, the ACE label and the after-the-prefix label. There should be a word for the last meaning.

But you are right: as noted below, "iesg--world" is not an internationalisable name. This should be mentioned in the RFC.

> > If you feed it to ToUnicode, it will not be altered, because the
> > check for the ACE prefix will fail.
>
> If the registering process has prevented the creation of the "iesg--"
> names.

I was speaking of <sono><supiido><de>, which does not begin with the ACE
prefix.  It's true that if someone registers IESG--world, it will be
displayed by IDNA-conformant applications as three Chinese characters
(U+53DF U+53E0 U+53D9), because IESG--world is an ACE label.  Maybe
the user really wants a label that will display as "IESG--world", in
which case the user will be dismayed at the behavior of IDNA-conformant
applications.  Or maybe the user really intends to register the Chinese
label, but used the ACE form in the registration process because the
browser wouldn't let them enter Chinese characters, or because a clever
clipboard decided to paste the ACE form, or because the registrar has
not yet upgraded to support IDNA.  Those are all legitimate scenarios;
people who compute the ACE form themselves and register it directly
rather than relying on the registrar to compute it might in fact know
what they're doing.  Registrars that support IDNA and receive ACE forms
as input would probably do well to display them back in both ACE and
non-ACE forms and ask for confirmation.
You may also think about the thousands of key business names registered that way at Verisign. Unless a provision is made declaring such names illegal; but on what grounds is the IETF going to choose a 2-letter label (which can be confused with an existing or future ccTLD) and make its use illegal? This is why the whole story should be wrapped up ASAP with the support of the Minc, the Ainc and Eurolinc, main language authorities, so that we can proceed before UNESCO and ITU put it on the WSIS agenda.

When several labels are equivalent, you can register any one of them
and you own them all.  If I register "example.com" or "Example.com"
or "EXAMPLE.com", no one else can register any of them, they're all
mine.  Similarly, no matter whether I register <sono><supiido><de> or
IESG--d9juau41awczczp, no one else can register either one, they're both
mine.
That is the business fun!
and the International cybersquatting!!!
ask Verisign.

> The punnycode is no part of the document and should be introduced.

It is mentioned at the end of section 1, and in the steps of ToASCII,
and in the steps of ToUnicode.  Nameprep is mentioned in exactly the same
places, and nowhere else.  You don't need to understand them to
understand IDNA, you can think of ToASCII and ToUnicode as abstract
operations.
True, but when you peruse a document and find a word you do not know, you look at the glossary. This is simple document presentation, but it improves readability a lot and avoids endless questions and disputes.

> > The label IESG--3ba is not an ACE label, even though it begins with
> > the ACE prefix and the Punycode part is valid, because it is not
> > equivalent to any non-ASCII label (because it is not nameprepped;
>
> is namepreparation part of the process. If yes it has to be included
> in the ACE or ToASCII conditions above. If not this restriction
> does not apply as such, it can only be noted. There may be a lot of
> variations in the Unicode scripting the users/developers may want to
> result in the same ACE_label.
>
> > it decodes to a capital A with grave accent).  If you feed it to
> > ToUnicode, it will not be altered, because the comparison in step 7
> > will fail.
>
> You understand that this is as long as DNS does not support &agrave;
> but that other applications may. The definition given above talking
> about "existing applications" - there are a lot of existing
> application supporting it, and at any given time in the future there
> will be more.
>
> So it means that we also want to define the ACE character set.

I don't understand what your concern is here.

IESG--3ba is not an ACE label because it is not equivalent to
any non-ASCII label.  We have defined equivalence.  IESG--3ba is
equivalent to X if and only if ToASCII(IESG--3ba) matches ToASCII(X).
ToASCII(IESG--3ba) is IESG--3ba, because ToASCII does not alter ASCII
labels.  So we're looking for a non-ASCII label X such that ToASCII(X)
is IESG--3ba.  There is no such X.  If you want to understand why there
is no such X, you'll need to examine the internal details of ToASCII,
and then you'll discover that ToASCII performs Nameprep, and Nameprep
can never output a capital A with grave (and to understand why that is
so, you'll need to examine the internals of Nameprep, and then you'll
discover that Nameprep performs case-folding).
But you will have to explain to many French people why, after three years' work and with the ability to support Chinese, their standard typing cannot be used, and to some why they could not register their name. The Marquis d'Ö will be upset at that. Many keyboards support accents separately from the letter and can put a grave on any character, as required by the proper scripting of the language.

By the way, even though there is no ACE label that decodes to a capital
A with grave, that doesn't mean ToASCII can't accept a capital A with
grave as input.  It can, and Nameprep will map it to small a with
grave.
This is a wrong preparation process.

The output of ToASCII will be IESG--0ca, which is an ACE label,
which ToUnicode will transform into small a with grave.  Notice that
ToUnicode(ToASCII(X)) is not always X, it is Nameprep(X) (which is
equivalent to X).
Absolutely true. This is why the ACE process cannot be defined in relation to the ToASCII/ToUnicode functions only.
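
The round trip mentioned above, ToUnicode(ToASCII(X)) giving Nameprep(X) rather than X, can be observed with Python's built-in `idna` codec. Note the assumption: that codec implements the operations as later published, with the prefix `xn--` instead of the `IESG--` placeholder used in this thread, so only the shape of the result carries over. The helper name `round_trip` is chosen here for illustration.

```python
# Sketch using the standard-library "idna" codec, whose ACE prefix is "xn--"
# (an assumption relative to this thread, which uses the placeholder "IESG--").


def round_trip(label: str) -> str:
    """Apply ToASCII, then ToUnicode, to a single label via the codec."""
    ace = label.encode("idna")   # ToASCII: yields ASCII bytes
    return ace.decode("idna")    # ToUnicode: yields the prepared label


if __name__ == "__main__":
    original = "\u00c0"          # capital A with grave
    result = round_trip(original)
    # The result is the case-folded form, not the original input.
    print(original == result, result == "\u00e0")
```

The capital A with grave is accepted as input but comes back as small a with grave, which is the asymmetry under discussion here.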

And here is the entire problem: in the case of the DNS, ToASCII/ToUnicode should be DNS.2 integrated tools at the inter-application level, while nameprep is an extended service above the DNS layer (DNS+). The reason is that nameprep may include many more functions related to input/output massaging and can be customized (like accepting abbreviated entries) at the application level.

Another way to see that IESG--3ba is not an ACE label is to observe
that ToUnicode does not alter it.  If you want to understand why
ToUnicode does not alter it, you'll need to examine the internal details
of ToUnicode, and then you'll discover that ToUnicode decodes it to
capital A with grave, then applies ToASCII, which produces IESG--0ca (as
mentioned above), and then ToUnicode notices that IESG--0ca does not
match IESG--3ba, and so ToUnicode returns the original input.
True, but this is the same as my acceptance of "90877" as the US abbreviation for the X.121 address of Dialog instead of requiring "310690877", where "310" is the USA international prefix and 3106 the Tymnet DNIC. Is the ACE process global or not (including only ToUnicode/ToASCII)?

National consistency makes me feel that it should be semi-global ie including customizable nameprep. a) If you look at you word; you will see that you different blend of English, French. b) we will probably support national root views soon enough and national DNS options will probably develop. They might also be supported there.

IMHO the whole problem is to make sure that the wording is clear enough to protect the ToASCII and ToUnicode processes as universal and permanent (otherwise we cannot industrialize and export), to be sure they can be adapted at an upper level to respect real-life needs (like grave capital A?), and to support extended functions in a way that does not endanger the stability of the process or of the applications (abbreviations, synonyms, etc.).

Also, let us remember this is no temporary patch. This is a way of writing real words using a minimum character set. That character set and the ToASCII and ToUnicode functions will most probably be engraved in silicon chips and used worldwide in many applications, from keyboards to telephones etc., as providing a larger number of alternatives than digital (0-1) and numeric (0-9) encoding. We must make sure we protect that universal industrial solution, but do not limit our capacity for innovation.

Thank you for your time.
jfc