
RE: [idn] Preparation of Internationalized Host Names - Arabic



I did not mention the Arabic vowels and Shadda because I don't feel
qualified to comment on them.

Jony

> -----Original Message-----
> From: Paul Hoffman / IMC [mailto:phoffman@imc.org]
> Sent: Saturday, July 08, 2000 8:06 PM
> To: Jonathan Rosenne; idn@ops.ietf.org
> Subject: RE: [idn] Preparation of Internationalized Host Names - Hebrew
>
>
> At 12:43 PM +0300 7/8/00, Jonathan Rosenne wrote:
> >> Please note that not all punctuation is prohibited. The rules for the
> >> specific kinds of punctuation that are prohibited are in the document.
> >> U+05C0, which looks just like the ASCII "vertical bar", is probably
> >> acceptable (since vertical bar is acceptable). U+05C3 looks just like
> >> a colon and is therefore not acceptable; thanks for pointing this
> >> out. (And I have noted it to the Unicode folks for when they update
> >> the standard.)
> >
> >Its meaning is punctuation, like comma or full stop, never mind its
> >shape.
>
> Exactly my point. At present, we do *not* prohibit all punctuation.
> The only prohibited punctuation characters are those that are reserved
> or delimiters in URLs [RFC2396] and [RFC2732]. If this group decides
> to prohibit all punctuation, certainly we would then prohibit U+05C0.
> Or, we might prohibit all punctuation other than a certain small
> group of characters (which would be pretty difficult to choose
> correctly...). But, for now, we only prohibit a small set.
>
> >> >2. Cantillation Marks
> >> >0591 to 05af
> >> >
> >> >These should be either prohibited or ignored since they do not affect
> >> >pronunciation, similar to ignoring case differences.
> >> >
> >> >Personally, I would rather prohibit them since their presence is most
> >> >likely to be an error.
> >>
> >>  If they never appear in personal names, company names, or spoken
> >>  phrases, then they can safely be prohibited. Is that true for all of
> >>  them?
> >
> >They never appear in common use, they are only used in biblical texts.
>
> Thanks, that's what I wanted to hear. I'll prohibit them in the
> next draft.
>
> >> >2. Points
> >> >05b0 to 05c4
> >> >
> >> >These should be either prohibited or ignored since they are optional. In
> >> >modern Hebrew they are seldom used, not all systems support them, and it
> >> >is valid to omit them.
> >> >
> >> >Personally, I would rather ignore them because a user may enter them, and
> >> >why not let him.
> >>
> >>  This is much more problematic. We do not currently have any "ignored"
> >>  characters. If I understand this correctly, the host name <HEBREW
> >>  LETTER HE><HEBREW POINT SEGOL>.com looks and sounds different than
> >>  <HEBREW LETTER HE><HEBREW POINT TSERE>.com, but could be considered
> >>  the same for a host name. If so, I think we would have to prohibit
> >>  them, not ignore them. Does that sound correct?
> >
> >They do sound different, but do not necessarily look different because it
> >is not mandatory to display points.
> >
> >Just like you ignore case in English, in Hebrew you should ignore points.
>
> From my (very limited) understanding of Hebrew, this makes sense.
> However, it means that we will have to make similar "ignoring"
> rules for a variety of scripts. I'm happy to do that if the group
> wants, but it certainly makes the name preparation harder. (Just to
> be clear: my personal preference would have been not to ignore case,
> but that decision was made *long* ago and cannot be reversed.) Doing
> so would require an extra step, probably between checking for
> prohibited characters and folding case, that says "look for any
> characters on this list and throw them away".
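The extra step can be sketched as follows, assuming the ranges proposed in this thread (U+0591 to U+05AF prohibited, U+05B0 to U+05C4 ignored) and using Python's `str.lower()` as a stand-in for the draft's own case-folding table. `prepare_label` is a hypothetical name. The sketch leaves U+05BE, U+05C0 and U+05C3 out of the ignored set because, although they fall inside that range, they are punctuation and are handled by the punctuation rules instead.

```python
# Hypothetical sketch of the preparation order discussed in this thread:
# 1. reject prohibited characters, 2. drop "ignored" characters, 3. fold case.

# Hebrew cantillation marks, proposed for prohibition (U+0591..U+05AF).
PROHIBITED = {chr(cp) for cp in range(0x0591, 0x05B0)}

# Hebrew points, proposed to be ignored (U+05B0..U+05C4). U+05BE, U+05C0 and
# U+05C3 lie in this range but are punctuation, so they are excluded here.
IGNORED = {chr(cp) for cp in range(0x05B0, 0x05C5)} - {"\u05be", "\u05c0", "\u05c3"}


def prepare_label(label: str) -> str:
    """Prepare one host-name label; raise ValueError on a prohibited character."""
    # Step 1: check for prohibited characters.
    for ch in label:
        if ch in PROHIBITED:
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    # Step 2 (the proposed new step): throw away any ignored characters.
    stripped = "".join(ch for ch in label if ch not in IGNORED)
    # Step 3: fold case; str.lower() stands in for the draft's case-folding table.
    return stripped.lower()


he_segol = "\u05d4\u05b6"  # HEBREW LETTER HE + HEBREW POINT SEGOL
he_tsere = "\u05d4\u05b5"  # HEBREW LETTER HE + HEBREW POINT TSERE
print(prepare_label(he_segol) == prepare_label(he_tsere))  # True
```

With this order, the two labels Paul asked about prepare to the same string, in the same way that "EXAMPLE" and "example" do after case folding.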
>
> How does the group feel about this? What other characters in scripts
> other than Hebrew would go here?
>
> --Paul Hoffman, Director
> --Internet Mail Consortium