Re: [idn] IDNA text presentation (was Document Status?)

Subject: Re: [idn] IDNA text presentation (was Document Status?)
From: Soobok Lee <lsb@postel.co.kr>
Date: Thu, 5 Sep 2002 01:52:38 +0900
Cc: IETF idn working group <idn@ops.ietf.org>, "JFC (Jefsey) Morfin" <jefsey@jefsey.com>
In-reply-to: <019801c25422$1c2a9850$721a0bca@JAMESSONYVAIO>
References: <5.1.0.14.0.20020903113948.02433ec0@mail.jefsey.com> <5.1.0.14.0.20020902132259.0297a080@mail.jefsey.com> <5.1.0.14.0.20020902132259.0297a080@mail.jefsey.com> <5.1.0.14.0.20020903113948.02433ec0@mail.jefsey.com> <5.1.0.14.0.20020904104152.02978ec0@mail.jefsey.com> <019801c25422$1c2a9850$721a0bca@JAMESSONYVAIO>
User-agent: Mutt/1.4i
Personally, I appreciate that Jefsey, as a regular @LARGE member, is showing
interest in the tough technical details of IDN and their implications for the
end-user experience.  I hope more @LARGE members will come here and look into
the various aspects of IDN deployment, and the cost and social-effect issues
that are relevant to consumer interests.

As far as I know, this WG (and its chairs) originated from the IDN WG of
MINC, which was a consortium of commercial IDN vendors. This WG, as I have
felt over one year of regular participation, has been biased in favor of
commercial IDN vendors and their agendas. Of course, some participants have
stood on the consumer/end-user side and tried to neutralize this WG.

Don't be disappointed at sometimes hearing "out of scope!" from someone. When
DNS and the Internet came out of the ISI labs and later beyond US boundaries,
DNS became an internationally shared, unique and precious resource. Every
Internet user in the world has the right to see mature and long-term safe
solutions for IDN, because they paid for the Internet and its standards
technologies.

Fortunately, this WG discussion is open, and I believe the IESG and IAB will
oversee IDN standardization in the interest of the public, not for some
commercial driving forces behind IDN.

Soobok Lee


On Wed, Sep 04, 2002 at 10:48:13PM +0800, James Seng wrote:
> Most of the threads here are over (a) the content (b) the direction (c) the
> abstraction (d) the technical specification etc. Many of them are religious
> in nature, an
> 
> Your thread, however, is the only one I have seen (or heard) from all who have
> read the draft to say they don't understand it.
>
> So either the draft is really hard to understand and all those who have read
> it (including non-English speakers) have either somehow been able to
> understand it or kept silent, or there might be some other problem with the
> only one who fails to understand it.
>
> Therefore, I think we should drop this thread, as I do not see any rough
> consensus that there is anything unclear about the specification.
> 
> -James Seng
> 
> ----- Original Message -----
> From: "JFC (Jefsey) Morfin" <jefsey@jefsey.com>
> To: "IETF idn working group" <idn@ops.ietf.org>
> Sent: Wednesday, September 04, 2002 7:13 PM
> Subject: [idn] IDNA text presentation (was Document Status?)
> 
> 
> > Dear Patrick,
> > you say:
> > <quote>
> > Only on explicit request from the AD, which then have noticed IESG that "a
> > new version is coming".
> > I have, as document co-editor, not seen any such request from the AD.
> > paf
> > </quote>
> >
> > I understand this and I am sorry for pushing for this review so late.
> > May I underline that this thread is "Document Status"? It was about
> > sending it to the IESG. It is not closed and is still very active on
> > several points.
> >
> > IMHO the text and concepts must be worded to support further developments
> > in a stable direction, approved as a common draft by all the involved
> > parties, with a working direction proposed for the pending points. This is
> > to avoid a possible international split, the appearance of alternative
> > propositions, and at the least endless disputes and delays.
> >
> > I am sorry to be bluntly active. As long as you sorted the technical
> > solution and the immediate process was in order it seemed OK to trust the
> > effort. Now there are two possibilities:
> > - either I am wrong, I am the only one thinking the text confused and
> > complex? Then I apologize.
> > - or I am right and the text should be recalled, worked out as a text, get
> > approved by the Internet community (after all this is telling how the US
> > Internet is going to be everyone's Internet and the adopted compromises
> > accepted as the solution cannot even fully support French).
> >
> > Open question?
> >
> > jfc
> >
> > ==================================
> >
> > Dear Adam,
> > thank you for the reply. As James Seng says:
> >
> > <quote>
> > Someone once told me "You know you are done when you couldn't find things
> > to take out". You dont complete document by adding more stuff. You complete
> > it by taking out the noise and amplify the signal.
> > </quote>
> >
> > This is the primary target. The second one IMHO is to phrase the document
> > in such a way it is clearly understood that there are two layers.
> > - to use an ASCII subset as universal (international) support
> > - a second to manage users/applications requirements on top of it and not
> > through further additions.
> >
> > This way the users may decide on a case-by-case basis whether or not to
> > use ACE. We do not build a transition, and we do not prevent alternatives
> > and innovations.
> >
> >
> > At 03:51 04/09/02, Adam M. Costello wrote:
> >
> > >Disclaimer:  In a few places in this message, when I ask if an alternate
> > >phrasing would be less confusing, I am not offering to make changes to
> > >the draft.  Paul and Patrik and I would have to discuss it, and I would
> > >understand if they think it's too late for that.
> > >
> > >"JFC (Jefsey) Morfin" <jefsey@jefsey.com> wrote:
> > >
> > > > You also refers to external knowledge such as "punnycode".
> > >
> > >Punycode (one n).  It is used only inside the ToASCII and ToUnicode
> > >operations.  If you are implementing those operations, or trying to
> > >understand their internals, you will need to refer to the Punycode spec
> > >and the Nameprep spec.  Otherwise, you can regard ToASCII and ToUnicode
> > >as abstract operations, and still understand IDNA.
> >
> > I know. What I underline is that you have to define the word - at least as
> > in a related document - before using it.
> >
> > > > Should we not have a clear terminology section defining every word or
> > > > concept we are going to use?
> > >
> > >Yes.  That is what we intend section 2 to be.
> > >
> > > > All I know is semantically "international names" could only means the
> > > > names which are unaltered by the ACE process.
> > >
> > >Which ACE process?  There are two operations, ToASCII and ToUnicode,
> > >which do very different things.  What they do is stated in the first
> > >paragraphs of the sections where they are defined (4.1 and 4.2):
> > >
> > >     The ToASCII operation takes a sequence of Unicode code points that
> > >     make up one label and transforms it into a sequence of code points
> > >     in the ASCII range (0..7F). If ToASCII succeeds, the original
> > >     sequence and the resulting sequence are equivalent labels.
> > >
> > >     The ToUnicode operation takes a sequence of Unicode code points that
> > >     make up one label and returns a sequence of Unicode code points. If
> > >     the input sequence is a label in ACE form, then the result is
> > >     an equivalent internationalized label that is not in ACE form,
> > >     otherwise the original sequence is returned unaltered.
> >
> > I read those definitions three times. I fail to see how they are not
> > balanced from ASCII to Unicode and from Unicode to ASCII, except that
> > "iesg--" is removed by ToUnicode. Or are there other hidden cases I fail to
> > see? IMHO a simple, clear wording is prevented by the use of complex
> > conditional concepts. If we only talked consistently of real things
> > (scripts, ASCII scripts, ACEd ASCII scripts) it would be simpler and would
> > remove the feeling that there are a lot of possible exceptions we do not
> > think of.
> >
> > (I do not propose a wording here, and further on, I just try to phrase in
> > poor English, the way we could KISS it).
> >
> > We want a clear, simple, terse text, written for people using "Unicode
> > scripts" all day long in their day-to-day life, not as if this were an
> > exception. The exception (which permitted the development of the Internet)
> > is the true "international names", i.e. the "late upper case roman
> > character set plus figures": A to Z, 0 to 9, "-" and ".". That character
> > set must first be given a simple name. As it is a Unicode subset, its
> > specific use is flagged ("iesg--"). All is here. As it is the smallest
> > common writing subset, as Vint noted, I will name it the International
> > character set.
> >
> > IMHO we do not need anything more complex.
> >
> > > > "Here I understand they mean the Unicode scripting which can be ACEd",
> > > > so should it not be at least be "internationalizable" (no action
> > > > occurred yet)?
> > >
> > >Various arguments can be made that we are misusing the word
> > >"internationalized".  For better or worse, we use it to mean roughly
> > >"able to support the use of non-ASCII characters".  In any case, we give
> > >definitions for "internationalized domain name" and "internationalized
> > >label", so you can forget any preconceived ideas about what
> > >"internationalized" ought to mean, and trust the definitions.
> >
> > I am sorry, but in my day-to-day life, like 90% of people, I use Unicode
> > words and a language. I have difficulties communicating and understanding
> > texts where the meaning is opposite to the commonly adopted meaning. We
> > are not talking here about the concepts, which we all agree upon. We are
> > talking about them being described and understood.
> >
> > "Internationalized" means that something occurred.
> > "able to" means that something could occur.
> >
> > The results are opposed. In one case it means that the
> > transformation occurred. In the other case it means that the
> > transformation has not occurred.
> >
> > > > Also I understand that that the RFC deals with the ACE
> > > > process. Terminology section should therefore
> > > >
> > > > 1. define the ACE process
> > >
> > >I'm not sure what you mean by "the ACE process".  There are two
> > >operations, ToASCII and ToUnicode, which are complex enough to warrant
> > >their own section (section 4).
> >
> > No. There is also the preparation of these two operations which makes them
> > possible.
> >
> > Globally that process must be reversible. So ToUnicode and ToASCII cannot
> > be separated.
> > If these processes are complex and different, their description can be
> > consistent (and must be consistent as they are to be reversible) if the
> > inputs/outputs are consistently described as indicated above.
> >
> > > > 2. define the group of the applications requiring ACE
> > >
> > >I think what is really needed is not the set of applications requiring
> > >ACE, but rather the set of places where ACE is needed.  That is already
> > >there in the terminology section, under "IDN-unaware domain name slot".
> > >Requirement 2 of section 3 (3.1 in the forthcoming idna-11 draft) states
> > >that only ASCII characters are permitted in IDN-unaware domain name
> > >slots.
> >
> > I am just listing what is needed and where, for reading simplicity. I am
> > not criticizing. The target is to review that text for brevity, clarity,
> > logic and consistency with reality, for people who know that ASCII is the
> > exception and that reality is much richer.
> >
> > > > 3. define the international names : left unchanged by the process
> > >
> > >Which process?  The labels left unchanged by ToASCII are simply the
> > >ASCII labels, so we don't need a special term for that.  The labels left
> > >unchanged by ToUnicode are the non-ACE labels, so we already have a term
> > >for that.
> >
> > Again, I am not including new categories. I am trying to have a systematic
> > way to present the concepts used in the document, so the reader
> > understands that when the document uses three different ways of describing
> > the same thing, it actually speaks of the same thing.
> >
> > I am trying to help that document be fluid for readers from China,
> > France, Pakistan, Korea, Mexico, etc., without them having the feeling
> > that the author tried to complicate his solution rather than to simplify
> > their usage.
> >
> > > > 4. define the non ACEable names
> > >
> > >I'm not sure what you mean by ACEable.  Every ASCII label is non-ACEable
> > >in the sense that ToASCII will not alter it.  Some ASCII labels are ACE
> > >labels themselves, and some are not.  But all of them are left unchanged
> > >by ToASCII.
> > >
> > >For every valid non-ASCII label, there is an equivalent ACE label.
> > >
> > >Maybe you are asking for a definition of valid IDNs?  That is given.
> >
> > I am not asking. I say this is where the "internationalized name"
> > definition - once properly worded into "internationalisable" is to be
> > placed. One shot. And all the other complementary explanations using
> > different words to be removed.
> >
> > > > 5. define the ACE labels
> > >
> > >Done.
> > >
> > > > 6. define the class of all the labels (ACE and International) which
> > > > can be processed by an application requiring ACE
> > >
> > >I don't know what you mean by "an application requiring ACE".  We do
> > >give a definition of IDN.
> >
> > This is only the place where I suggest putting the information.
> > Applications requiring ACE are the applications we develop this process
> > for: the ones which do not support Unicode or which need simple scripts,
> > such as the huge number of information and security databases that several
> > people talked about here.
> >
> > > > >    An "internationalized label" is a label composed of characters
> > > > >    from the Unicode character set; note, however, that not every
> > > > >    string of Unicode characters can be an internationalized label.
> > > >
> > > > to me this is an Unicode label/script?  No action occurred yet?
> > >
> > >An internationalized label is a sequence of Unicode characters.  Given
> > >a Unicode string, can it be an internationalized label?  The answer is
> > >provided in the definition of IDN:
> > >
> > >     An "internationalized domain name" (IDN) is a domain name for which
> > >     the ToASCII operation (see section 4) can be applied to each label
> > >     without failing.
> > >
> > >"can be applied", not "has been applied".  A Unicode string X can be an
> > >internationalized label if and only if ToASCII(X) does not fail.
> > >
> > >Hmmm, one fact about internationalized labels is that every ASCII label
> > >that can be used in ASCII domain names
> >
> > BTW, would "International label" instead of "ASCII label that can be used
> > in ASCII domain names" not look more fluid, clearer, more general and open
> > to evolution? We could then scrap the rest of the explanation, which is not
> > necessary.
> >
> > >is also an internationalized
> > >label that can be used in internationalized domain names.  In other
> > >words, the set of internationalized labels is an extension (superset) of
> > >the set of valid ASCII labels.  Would it have been helpful if this fact
> > >were stated more prominently?
> >
> > I agree, understanding that you mean the opposite to what anyone not
> > knowing you are discussing IDNA would understand?
> >
> > I fully accept that living with this project makes you consider your
> > wording as adapted. My only target is to make you aware that this dialog
> > of ours will be the "dialog" of most of the readers with that text. You
> > will have IDNA "gurus" having (often partly) understood the special
> > meanings of this document, and others who will be confused and say that
> > the solution is confused ... developing their own, and on top of their own.
> >
> > > > >    To allow internationalized labels to be handled by existing
> > > > >    applications, IDNA uses an "ACE label" (ACE stands for ASCII
> > > > >    Compatible Encoding), which can be represented using only ASCII
> > > > >    characters but is equivalent to a label containing non-ASCII
> > > > >    characters.
> > > >
> > > > IMHO, let stop at "ACE label". And let define what an ACE label,
> > > > without "can" which implies there would be other ways.
> > >
> > >I think the word "represented" is confusing you.  In my mind, "Foo"
> > >cannot be represented using only lowercase ASCII letters, but "foo" is
> > >an equivalent label that can be represented using only lowercase ASCII
> > >letters.  Similarly, <sono><supiido><de> cannot be represented using
> > >only ASCII characters, but IESG--d9juau41awczczp is an equivalent label
> > >that can be represented using only ASCII characters.
> > >
> > >Unfortunately, the document is not consistent in its usage of the word
> > >"represent".  Would this sentence be less confusing if worded as:
> > >
> > >     ...IDNA uses an "ACE label" (ACE stands for ASCII Compatible
> > >     Encoding), which is composed of ASCII characters but is equivalent
> > >     to a label containing non-ASCII characters.
> >
> > Let me give my logical understanding of what was agreed here (my wording
> > may oppose yours, but I do think we are in agreement):
> >
> > 1. there are standard names. They can only be written in the Unicode set.
> > 2. most of them can also be written in the International set (0-9, A-Z,
> >    "-", ".") using ACE.
> > 3. some are already International, because they only use characters from
> >    the International set.
> > 4. there is a need to know which writing is used: the "iesg--" header
> >    tells that the writing is international.
> >
> > The process that translates Unicode writing into International
> > writing is ToASCII (should be ToINTL) and the process that translates
> > International writing into Unicode is ToUnicode. The ACE process consists
> > in producing, simultaneously and both ways, the International and the
> > Unicode writing of every name, with exceptions which have no International
> > writing.
> >
> > The names may include several labels. They have semantics which call for
> > a preparation to make sure the ACE process will result in a clear and
> > reversible International writing.
> >
> > In real life there are applications which need International writing
> > (8 bits) and others which can use standard writing (16 bits or more). All
> > this because the human being, the current screen and keyboard systems and
> > current software applications do not support more than 8-bit keys as of
> > today.
> >
> > So it is convenient for some applications to use an 8-bit limited common
> > character system.
> >
> > I will take an example. We started the namespace with the International
> > character set. As Vint quoted it for regular mail, we used uppercase, which
> > made clear that accents could not be used. TLDs for the USA were the IRCs
> > (ITT, RCA etc.) and names were such as RCAFORDSALES (no separation between
> > rootname and name).
> >
> > Then we switched to X.75 and X.121 to be compatible with ISO standards
> > limited to telephone-like digital logic (as the DNS is International
> > Character set oriented).  We did it the same way as IDNA, using numeric
> > names. We had a name preparation and "ToISO" and "ToTymnet" functions to
> > translate "FRAELYSEESECRETARY" into the "208075081231" name and then
> > into 208007508123. In the same way, we transformed "ARPAUCLA" or
> > "123.23.28.82" into "ucla.arpa", but accepted "90777" as a US X.25 call to
> > Dialog. We had no "iesg--" flag; we used either the final "." from
> > Internet to indicate it was to be reversed, or the numeric structure.
> >
> > This type of parallel encoding is very common. You can write Japanese
> > words in two character sets. There is no logical difference between IDNA
> > and Katakana. This process is basic in French, where upper case letters
> > may carry neither accent nor cedilla. So "Jean-François" (my forename) is
> > Unicode and "JEAN-FRANCOIS" is international. So everyone thinks "IDNA" all
> > day long. Hence the difficulty to even understand there is a (complex) There
> > is just a (complex) simplification.
> >
> > But I accept that this is a big thinking for people accustomed to use
> > International names only. The same problem as translating English "you" in
> > other languages: does it means "vous" or "tu".
> >
> > >?
> > >That's less precise (because there are non-ASCII characters that can
> > >be represented by ASCII characters, like the fullwidth characters
> > >FF01..FF5E), but I suppose the imprecision might be tolerable, since the
> > >rigorous definition immediately follows.
> > >
> > > > All this added explanation is external and only adds to the text.  Let
> > > > us try to compact it into one single initial crystal clear definition?
> > >
> > >The only crystal clear definition is this one:
> > >
> > >     More rigorously, an ACE label is defined to be any label that the
> > >     ToUnicode operation would alter.
> >
> > This is a circular definition: "I talk about what I talk about". It should
> > be defined as the assignment given to the WG. An ACE label is a label only
> > containing 8-bit characters of the "Late Roman Character set" or
> > "International" character set as specified above. The WG has the mission
> > to make it possible for every standard real name to be written with that
> > character set.
> >
> > >However, here's another attempt at providing some intuition:  All
> > >non-ASCII internationalized labels are intended to be displayed to
> > >users without change.  The same is true of most ASCII labels.  But
> > >there are some ASCII labels, called ACE labels, that are not intended
> > >to be displayed directly to users.  ACE labels begin with the ACE
> > >prefix and look like gibberish.
> >
> > (except if they come from an International character set string).
> >
> > >Every ACE label is equivalent to a
> > >non-ASCII label, which is what is intended to be displayed instead.
> > >For every non-ASCII internationalized label there is an equivalent ACE
> > >label.  [I'm being loose with the term "ASCII".  Unicode characters that
> > >are compatibly equivalent to ASCII characters (like those fullwidth
> > >characters) count as ASCII for the purposes of this paragraph.]
> > >
> > > > >   More rigorously, an ACE label is defined to be any label that the
> > > > >   ToUnicode operation would alter.
> > > >
> > > > So an ACE label is here defined negatively.  Feeling is that it means
> > > > "when ToUnicode will fail".
> > >
> > >I don't see what's negative about that definition.  I see no mention of
> > >failure.  Indeed, the second paragraph of the ToUnicode section (4.2)
> > >says "ToUnicode never fails."
> >
> > I do not mean in technical terms (if we can discuss all this it is because
> > the technical issues are settled). I mean in term of reading. When you say
> > A != B you do not really define A. unless you add more information.
> >
> > So when you read such a definition as "if (A != ToUnicode(A)) True(A)=1,
> > else True(A)=0;
> > you usually spend a few seconds trying to understand what it really means.
> >
> > > > When we mean that the ToUnicode (successfully) transforms into an ACE
> > > > label.
> > >
> > >ToUnicode can never output an ACE label.  The first paragraph of the
> > >ToUnicode section says:
> > >
> > >     If the input sequence is a label in ACE form, then the result is
> > >     an equivalent internationalized label that is not in ACE form,
> > >     otherwise the original sequence is returned unaltered.
> > >
> > >Here is a math-notation version of the definition of ACE label:
> > >
> > >     { ACE labels } = { X : X != ToUnicode(X) }
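[Editorial illustration] The set definition quoted above can be tried out with a small Python sketch. It is only an approximation under stated assumptions: the prefix "iesg--" is the one used in this thread, the function names are hypothetical, and Nameprep is replaced by a crude stand-in (NFKC normalization plus lowercasing) that is not the real Nameprep profile. Only Python's built-in "punycode" codec is relied on.

```python
import unicodedata

# Assumption: the ACE prefix used in this thread; not a registered value.
ACE_PREFIX = "iesg--"


def nameprep_stand_in(label: str) -> str:
    """Crude stand-in for Nameprep (NOT the real profile): NFKC + lowercase."""
    return unicodedata.normalize("NFKC", label).lower()


def to_ascii(label: str) -> str:
    """Simplified ToASCII: ASCII labels pass through unchanged."""
    if label.isascii():
        return label
    prepped = nameprep_stand_in(label)
    if prepped.isascii():
        return prepped
    return ACE_PREFIX + prepped.encode("punycode").decode("ascii")


def to_unicode(label: str) -> str:
    """Simplified ToUnicode: never fails, returns the input on any problem."""
    candidate = label if label.isascii() else nameprep_stand_in(label)
    if not candidate.lower().startswith(ACE_PREFIX):
        return label
    try:
        decoded = candidate[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except ValueError:  # covers UnicodeError from a failed Punycode decode
        return label
    # Round-trip check: the decoded form must map back to the same ACE form.
    if to_ascii(decoded).lower() != candidate.lower():
        return label
    return decoded


def is_ace_label(label: str) -> bool:
    """{ ACE labels } = { X : X != ToUnicode(X) }"""
    return to_unicode(label) != label
```

Under these assumptions, is_ace_label("iesg--bcher-kva") holds, while "helloworld" and "iesg--3ba" are not ACE labels: the latter decodes to a capital A with grave accent, which the stand-in preparation lowercases, so the round-trip comparison fails and the input is returned unaltered.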
> >
> > Yep. What I meant was: the part of the ACE label after the ACE prefix.
> > This kind of confusion will recur over and over for decades with that
> > wording. Saying that there is a Unicode and an International writing, and
> > that, due to the possible confusion when the Unicode writing only uses
> > International characters, there is an International prefix, would probably
> > be quicker, easier and clearer to understand. I suppose every little
> > French or Japanese child would not even understand there might be a problem.
> >
> > > > >    For every internationalized label that cannot be directly
> > > > >    represented in ASCII, there is an equivalent ACE label.
> > > >
> > > > My first reading was puzzling: "whenever the ACE process will not
> > > > work, there will be an pre-existing equivalent ACE label".  This
> > > > obviously does not make any sense.
> > >
> > >Again, I think the word "represented" is confusing you, although I
> > >thought the word "directly" would help.  Would it be less confusing if
> > >reworded the same way as before [again using the loose sense of ASCII]:
> > >
> > >     For every internationalized label containing non-ASCII characters,
> > >     there is an equivalent ACE label.
> >
> > The phrase is simpler but the wording is complex and needs a ToIDNA /
> > ToEnglish function.
> >
> > My wording would be:
> >
> > For every international writing there is a standard writing.
> > There is not necessarily an international writing for every standard
> > writing.
> >
> > >?
> > >
> > > > >    In IDNA, equivalence of labels is defined in terms of the ToASCII
> > > > >    operation, which constructs an ASCII form for a given label.
> > > >
> > > > this means ACE label.
> > >
> > >No, ToASCII(helloworld) == helloworld, which is not an ACE label.  The
> > >output of ToASCII is always an ASCII label, but not always an ACE label.
> > >The output is an ACE label only if the input was non-ASCII, or the input
> > >was an ACE label.  [Here again I'm using the loose sense of ASCII.]
> >
> > Here again you consider the result of the process. What matters is the
> > reality:
> >
> > 1. is it standard writing (Unicode, 16 bits)
> > 2. is it international writing (International, 8 bits)
> > 3. or a Unicode writing using International characters only: the prefix
> > tells it.
> >
> > > > >   Labels are defined to be equivalent if and only if their ASCII
> > > > >   forms produced by ToASCII match using a case-insensitive ASCII
> > > > >   comparison.
> > > >
> > > > Then International names - ie non modified names by the ACE process
> > > > (now reduced to ToASCII only(?), should it not also be reversible and
> > > > the ToUNICODE results in the original scripting?) - cannot be compared
> > > > (I know it is wrong, but this is what I read here).
> > >
> > >That does not follow.  The definition of equivalence says:
> > >
> > >     For all X,Y  define  X ~ Y  to mean  ToASCII(X) ~ ToASCII(Y)
> > >
> > >You are wondering about "non modified names", that is, labels Z for
> > >which Z == ToASCII(Z).  That does not impede us from performing the
> > >comparison.  We compute ToASCII(X) (which returns X), and we compute
> > >ToASCII(Y) (which returns Y), and we compare the results using a
> > >case-insensitive ASCII comparison (which we can do because the output of
> > >ToASCII is an ASCII string).
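[Editorial illustration] The equivalence rule described above, X ~ Y if and only if ToASCII(X) and ToASCII(Y) match case-insensitively, can be sketched in Python as follows. This is a simplification under stated assumptions: the prefix "iesg--" is the one used in this thread, the function names are hypothetical, and Nameprep is approximated by NFKC normalization plus lowercasing, which is not the real Nameprep profile.

```python
import unicodedata

# Assumption: the ACE prefix used in this thread; not a registered value.
ACE_PREFIX = "iesg--"


def nameprep_stand_in(label: str) -> str:
    """Crude stand-in for Nameprep (NOT the real profile): NFKC + lowercase."""
    return unicodedata.normalize("NFKC", label).lower()


def to_ascii(label: str) -> str:
    """Simplified ToASCII: ASCII labels are returned unchanged."""
    if label.isascii():
        return label
    prepped = nameprep_stand_in(label)
    if prepped.isascii():
        return prepped
    return ACE_PREFIX + prepped.encode("punycode").decode("ascii")


def equivalent(x: str, y: str) -> bool:
    """X ~ Y: compare the ToASCII outputs with case-insensitive ASCII matching."""
    return to_ascii(x).lower() == to_ascii(y).lower()
```

With this sketch, plain ASCII labels are compared exactly as before (for them ToASCII is the identity), so equivalent("Example", "EXAMPLE") holds, and a non-ASCII label is equivalent to its own ACE form.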
> >
> > I agree. What I said is what I tend to read.
> >
> > > > >    Traditional ASCII labels
> > > >
> > > > What is "traditional".  Has it been defined?
> > >
> > >Just a regular English word, not a term.  "Traditional ASCII labels"
> > >means "the ASCII labels that we've all grown familiar with over the
> > >years".
> >
> > I am sorry, 90% of mankind never grew familiar with any ASCII labels
> > over the years. 90% of mankind executes a double IDNO process to
> > relate with you through a ToEnglish and a ToASCII operation.
> >
> > Anyway, you cannot use the word "traditional" without naming the RFC
> > where the table can be found. When it comes to writing a program there is
> > no C traditional(char *) routine.
> >
> > > > >    already have a notion of equivalence: upper case and lower case
> > > > >    are considered equivalent.  The IDNA notion of equivalence is an
> > > > >    extension of the old notion.
> > > >
> > > > Old notion?  Is that the correct wording?  Is that not the DNS current
> > > > and stable notion?
> > >
> > >Yes, the current notion is old, having been with us for many years.
> > >Would "older notion" be less alarming?
> >
> > My objection was to "old" not being defined. For example there is a
> > much older Roman notion about J and I being the same, and an old military
> > and Minitel user notion of a 34-character set where 0 and O, and 1 and I,
> > are used the same.
> >
> > The simplest is to define the character set (0-9, A-Z, "-", "."). Maybe
> > adding "@" and "_" (?).
> >
> > > > > The Japanese phrase <sono><supiido><de> (pretend I wrote it using
> > > > > kana, which are non-ASCII characters) could be an internationalized
> > > > > label.  It is not an ACE label, because it cannot be represented in
> > > > > ASCII.
> > > >
> > > > Well I thought that ACE label resulted from the ACE process and
> > > > not the labels left identical by the ACE process (IMHO both ways:
> > > > "iesg---name.com" is perfect ASCII, but if ToUNICODEd it will have no
> > > > meaning).
> > >
> > >ToUnicode will not alter the label IESG---name (remember that ToUnicode
> > >takes a single label as input).  The output will be IESG---name.  In
> > >this particular case, the Punycode decoding step will fail.  But see
> > >below for a more interesting example, IESG--world.
> >
> > Again, the difference between the label, the ACE label and the
> > after-the-prefix label. There should be a word for the last meaning.
> >
> > But you are right - as noted below, the "iesg-world" is not an
> > internationalisable name. It should be mentioned in the RFC.
> >
> > > > > If you feed it to ToUnicode, it will not be altered, because the
> > > > > check for the ACE prefix will fail.
> > > >
> > > > If the registering process has prevented the creation of the "iesg--"
> > > > names.
> > >
> > >I was speaking of <sono><supiido><de>, which does not begin with the ACE
> > >prefix.  It's true that if someone registers IESG--world, it will be
> > >displayed by IDNA-conformant applications as three Chinese characters
> > >(U+53DF U+53E0 U+53D9), because IESG--world is an ACE label.  Maybe
> > >the user really wants a label that will display as "IESG--world", in
> > >which case the user will be dismayed at the behavior of IDNA-conformant
> > >applications.  Or maybe the user really intends to register the Chinese
> > >label, but used the ACE form in the registration process because the
> > >browser wouldn't let them enter Chinese characters, or because a clever
> > >clipboard decided to paste the ACE form, or because the registrar has
> > >not yet upgraded to support IDNA.  Those are all legitimate scenarios;
> > >people who compute the ACE form themselves and register it directly
> > >rather than relying on the registrar to compute it might in fact know
> > >what they're doing.  Registrars that support IDNA and receive ACE forms
> > >as input would probably do well to display them back in both ACE and
> > >non-ACE forms and ask for confirmation.
> >
> > You may also think about the thousands of key business names registered
> > that way at Verisign. Unless a provision is made that makes such names
> > illegal - but on what grounds is the IETF going to choose a 2-letter label
> > (which can be confused with an existing or future ccTLD) to make its use
> > illegal? This is why the whole story should be wrapped up ASAP with the
> > support of the Minc, the Ainc and Eurolinc, main language authorities, and
> > we can proceed before UNESCO and ITU put it on the WSIS agenda.
> >
> > >When several labels are equivalent, you can register any one of them
> > >and you own them all.  If I register "example.com" or "Example.com"
> > >or "EXAMPLE.com", no one else can register any of them, they're all
> > >mine.  Similarly, no matter whether I register <sono><supiido><de> or
> > >IESG--d9juau41awczczp, no one else can register either one, they're both
> > >mine.
> >
> > That is the business fun!
> > and the International cybersquatting!!!
> > ask Verisign.
> >
> > > > > Punycode is not part of the document and should be introduced.
> > >
> > >It is mentioned at the end of section 1, and in the steps of ToASCII,
> > > >and in the steps of ToUnicode.  Nameprep is mentioned in exactly the
> > > >same places, and nowhere else.  You don't need to understand them to
> > > >understand IDNA; you can think of ToASCII and ToUnicode as abstract
> > > >operations.
> >
> > > True, but when you peruse a document and you find a word you do not know,
> > > you look at the glossary. This is simple document presentation, but it
> > > improves readability a lot and avoids endless questions/disputes.
> >
> > > > > The label IESG--3ba is not an ACE label, even though it begins with
> > > > > the ACE prefix and the Punycode part is valid, because it is not
> > > > > equivalent to any non-ASCII label (because it is not nameprepped;
> > > >
> > > > > Is name preparation part of the process? If yes, it has to be included
> > > > > in the ACE or ToASCII conditions above. If not, this restriction
> > > > > does not apply as such; it can only be noted. There may be a lot of
> > > > > variations in the Unicode scripting that the users/developers may want
> > > > > to result in the same ACE_label.
> > > >
> > > > > it decodes to a capital A with grave accent).  If you feed it to
> > > > > ToUnicode, it will not be altered, because the comparison in step 7
> > > > > will fail.
> > > >
> > > > > You understand that this holds only as long as DNS does not support à,
> > > > > but that other applications may. The definition given above talks
> > > > > about "existing applications" - there are a lot of existing
> > > > > applications supporting it, and at any given time in the future there
> > > > > will be more.
> > > >
> > > > So it means that we also want to define the ACE character set.
> > >
> > >I don't understand what your concern is here.
> > >
> > >IESG--3ba is not an ACE label because it is not equivalent to
> > >any non-ASCII label.  We have defined equivalence.  IESG--3ba is
> > >equivalent to X if and only if ToASCII(IESG--3ba) matches ToASCII(X).
> > >ToASCII(IESG--3ba) is IESG--3ba, because ToASCII does not alter ASCII
> > >labels.  So we're looking for a non-ASCII label X such that ToASCII(X)
> > >is IESG--3ba.  There is no such X.  If you want to understand why there
> > >is no such X, you'll need to examine the internal details of ToASCII,
> > >and then you'll discover that ToASCII performs Nameprep, and Nameprep
> > >can never output a capital A with grave (and to understand why that is
> > >so, you'll need to examine the internals of Nameprep, and then you'll
> > >discover that Nameprep performs case-folding).
> >
> > > But you will have to explain to many French people why, after three years
> > > of work and the ability to support Chinese, their standard typing cannot
> > > be used, and to some why they could not register their name. The Marquis
> > > d'? will be upset at that. Many keyboards support accents separately from
> > > the letter and can put a grave on any character as required by the proper
> > > scripting of the language.
> >
> > >By the way, even though there is no ACE label that decodes to a capital
> > >A with grave, that doesn't mean ToASCII can't accept a capital A with
> > >grave as input.  It can, and Nameprep will map it to small a with
> > >grave.
> >
> > This is a wrong preparation process.
> >
> > >The output of ToASCII will be IESG--0ca, which is an ACE label,
> > >which ToUnicode will transform into small a with grave.  Notice that
> > >ToUnicode(ToASCII(X)) is not always X, it is Nameprep(X) (which is
> > >equivalent to X).
> >
> > Absolutely true. This is why the ACE process cannot be defined in relation
> > to the ToASCII/ToUnicode functions only.
> >
> > > And here is the entire problem: ToASCII/ToUnicode in the case of the DNS
> > > should be DNS.2 integrated tools at inter-application level, while
> > > nameprep is an extended service above the DNS layer (DNS+). The reason is
> > > that nameprep may include many more functions related to input/output
> > > massaging and can be customized (like accepting abbreviated entries) at
> > > application level.
> >
> > >Another way to see that IESG--3ba is not an ACE label is to observe
> > >that ToUnicode does not alter it.  If you want to understand why
> > >ToUnicode does not alter it, you'll need to examine the internal details
> > >of ToUnicode, and then you'll discover that ToUnicode decodes it to
> > >capital A with grave, then applies ToASCII, which produces IESG--0ca (as
> > >mentioned above), and then ToUnicode notices that IESG--0ca does not
> > >match IESG--3ba, and so ToUnicode returns the original input.
> >
> > > True, but this is the same as my acceptance of "90877" as the US
> > > abbreviation for the X.121 address of Dialog instead of requiring
> > > "310690877", where "310" is the USA international prefix and 3106 the
> > > Tymnet DNIC. Is the ACE process global or not (including only
> > > ToUnicode/ToASCII)?
> >
> > > National consistency makes me feel that it should be semi-global, i.e.
> > > including customizable nameprep.  a) If you look at your Word, you will
> > > see that you have different blends of English, French. b) We will probably
> > > support national root views soon enough and national DNS options will
> > > probably develop. They might also be supported there.
> >
> > > IMHO the whole problem is to make sure that the wording is clear enough to
> > > protect the ToASCII and ToUnicode processes as universal and permanent
> > > (otherwise we cannot industrialize and export), to be sure they can be
> > > adapted at upper level to respect real-life needs (like grave capital
> > > A?), and to support extended functions in a way not endangering the
> > > stability of the process and the stability of the applications
> > > (abbreviations, synonyms, etc.).
> >
> > > Also, let us remember this is no temporary patch. This is a way of writing
> > > real words using a minimum character set. That character set and the
> > > ToASCII and ToUnicode functions will most probably be engraved in silicon
> > > chips and used worldwide in many applications, from keyboards to
> > > telephones etc., as providing a larger number of alternatives than digital
> > > (0-1) and numeric (0-9) encoding. We must make sure we protect that
> > > universal industrial solution, but do not limit our capacity for innovation.
> >
> > > Thank you for your time.
> > jfc
> >
>
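
For readers who want to follow the IESG--3ba versus IESG--0ca argument quoted
above, here is a minimal sketch of the round trip. It is not the IDNA
algorithm itself: nameprep_sketch is a hypothetical stand-in (lowercasing plus
NFKC) for the real Nameprep, the helper names are made up for illustration,
and "IESG--" is only the placeholder prefix used in this thread. It relies on
Python's built-in punycode codec.

```python
import unicodedata

ACE_PREFIX = "IESG--"  # placeholder prefix from this thread, an assumption


def nameprep_sketch(label: str) -> str:
    # Hypothetical stand-in for Nameprep: lowercase, then NFKC-normalize.
    # The real Nameprep profile does considerably more than this.
    return unicodedata.normalize("NFKC", label.lower())


def to_ascii(label: str) -> str:
    # ASCII labels pass through unaltered, as described in the thread.
    if label.isascii():
        return label
    prepped = nameprep_sketch(label)
    if prepped.isascii():
        return prepped
    return ACE_PREFIX + prepped.encode("punycode").decode("ascii")


def to_unicode(label: str) -> str:
    # Labels without the ACE prefix are returned as they are.
    if not label.lower().startswith(ACE_PREFIX.lower()):
        return label
    try:
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except ValueError:
        return label  # not decodable as Punycode: return the input
    # Round-trip check: re-encode and compare with the input, ignoring case.
    if to_ascii(decoded).lower() != label.lower():
        return label
    return decoded


def equivalent(a: str, b: str) -> bool:
    # Two labels are equivalent when their ToASCII forms match.
    return to_ascii(a).lower() == to_ascii(b).lower()


print(to_ascii("\u00c0"))       # capital A with grave as input
print(to_unicode("IESG--0ca"))  # an ACE label under this sketch
print(to_unicode("IESG--3ba"))  # fails the round-trip check
```

Under these assumptions, IESG--3ba decodes to capital A with grave, re-encodes
to IESG--0ca, fails the comparison and comes back unchanged, which is the
behaviour described in the quoted text.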
Follow-Ups:
- Re: [idn] IDNA text presentation (was Document Status?)
  - From: John C Klensin <klensin@jck.com>
- Re: [idn] IDNA text presentation (was Document Status?)
  - From: "James Seng" <jseng@pobox.org.sg>
References:
- Re: [idn] Document Status?
  - From: "JFC (Jefsey) Morfin" <jefsey@jefsey.com>
- Re: [idn] Document Status?
  - From: "JFC (Jefsey) Morfin" <jefsey@jefsey.com>
- [idn] IDNA text presentation (was Document Status?)
  - From: "JFC (Jefsey) Morfin" <jefsey@jefsey.com>
- Re: [idn] IDNA text presentation (was Document Status?)
  - From: "James Seng" <jseng@pobox.org.sg>