
Re: [idn] last call reminder - Hebrew (and Arabic)



-----BEGIN PGP SIGNED MESSAGE-----

Jonathan Rosenne wrote:
> The proposals are inadequate for Hebrew domain names as follows:
> 
> Hebrew vowel signs etc. (points) should be stripped (mapped out) by
> nameprep. Unicodes 05B0 to 05B9, 05BB to 05BD, 05BF, 05C1, 05C2, 05C4.
> 
> Explanation: These marks in Hebrew are optional, one may or may not use
> them, for example to clarify a possible ambiguity. They should be
> allowed but they should be ignored. Their presence or absence should not
> make two host names different.
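
As a rough illustration of the mapping requested above, here is a minimal
Python sketch that removes exactly the code points listed in the quoted
text. The names HEBREW_POINTS and strip_hebrew_points are hypothetical and
chosen for this example only; this is not how nameprep itself is specified.

```python
# Code points listed in the quoted proposal: 05B0 to 05B9, 05BB to 05BD,
# 05BF, 05C1, 05C2, 05C4. The set name is an assumption for this sketch.
HEBREW_POINTS = (
    set(range(0x05B0, 0x05B9 + 1))
    | set(range(0x05BB, 0x05BD + 1))
    | {0x05BF, 0x05C1, 0x05C2, 0x05C4}
)


def strip_hebrew_points(label: str) -> str:
    """Return label with the listed Hebrew points mapped out (deleted)."""
    return "".join(ch for ch in label if ord(ch) not in HEBREW_POINTS)


# A pointed and an unpointed spelling of the same word compare equal
# once the points have been mapped out.
pointed = "\u05E9\u05B8\u05DC\u05D5\u05B9\u05DD"
unpointed = "\u05E9\u05DC\u05D5\u05DD"
same_name = strip_hebrew_points(pointed) == strip_hebrew_points(unpointed)
```

Code points outside the list, such as U+05BE, pass through unchanged.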

The following paper:

  Steven Atkin, Ryan Stansifer, Mohsen Alsharif,
  Bidirectional Domain Names,
  19th International Unicode Conference.

takes the position that the Arabic and Hebrew points should be
disallowed. I'm not familiar with Arabic or Hebrew, but from a
technical point of view nothing rules out either solution, and
mapping the points out poses no additional difficulty. The question
is simply whether we want to allow them in encoded domain names, for
example in URI links in HTML, email, etc.

I can't find the above paper on the web any more, so it is temporarily
at <http://www.users.zetnet.co.uk/hopwood/unicode/idnbidi.pdf>.
Here is the relevant section (I'm not necessarily agreeing with it;
just quoting it for discussion):

# There are a number of Arabic characters that can be safely
# excluded from domain names. Specifically, these include the
# Arabic presentation forms, U+FB50-U+FDFF and U+FE70-U+FEFC.
# It is safe to exclude these characters, as they only represent
# ligatures and glyph variants of the base nominal Arabic
# characters. Additionally, the Arabic points U+064B-U+0652,
# U+0653-U+0655, and U+0670 should also be excluded. In
# most cases the Arabic points are only used as pronunciation
# guides. If the points were to be included, then names that
# differed only in their use of points would be treated as if
# they were distinct and different names. This is like the
# English homograph "bow" (the arrow) and "bow" (the ship) which
# are ambiguous. Removing the Arabic points eliminates such
# problems, with the understanding that not every Arabic word
# would be able to be represented. The Koranic annotation signs
# U+06D6-U+06ED can also be eliminated from domain names, as
# they are not used to distinguish one name from another.
# In Hebrew the cantillation marks U+0591-U+05AF and Hebrew
# points U+05B0-U+05C4 can be excluded as they are predominantly
# used as pronunciation guides and for indicating the underlying
# structure of text. Additionally, the Arabic and Hebrew
# punctuation characters are also excluded from domain names as
# they are currently not permitted. The acceptable Arabic and
# Hebrew characters are listed in Table 2.
#
# Table 2: Acceptable Arabic and Hebrew characters
#
# Unicode Range         Script   Notes
# U+05D0-U+05F4         Hebrew   ISO8859-8
# U+0621-U+064A         Arabic   ISO8859-6
# U+0660-U+0669         Arabic   Arabic-Indic digits
# U+0671-U+06D3,U+06D5  Arabic   Extended Arabic letters
# U+06F0-U+06FE         Arabic   Persian, Urdu, and Sindhi
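
For discussion, the ranges in Table 2 can be turned into a simple
membership test. The following Python sketch does that; the names
TABLE2_RANGES, is_table2_char and disallowed_chars are hypothetical, and
the sketch covers only the Arabic and Hebrew ranges from the table, so
ASCII letters, digits and hyphen would also be reported as disallowed.

```python
# Inclusive code point ranges copied from Table 2 of the quoted paper.
TABLE2_RANGES = [
    (0x05D0, 0x05F4),  # Hebrew (ISO8859-8)
    (0x0621, 0x064A),  # Arabic (ISO8859-6)
    (0x0660, 0x0669),  # Arabic-Indic digits
    (0x0671, 0x06D3),  # Extended Arabic letters
    (0x06D5, 0x06D5),  # Extended Arabic letters (single code point)
    (0x06F0, 0x06FE),  # Persian, Urdu, and Sindhi
]


def is_table2_char(ch: str) -> bool:
    """True if ch falls inside one of the Table 2 ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in TABLE2_RANGES)


def disallowed_chars(label: str) -> list:
    """Return the characters of label that Table 2 would not accept."""
    return [ch for ch in label if not is_table2_char(ch)]
```

Under this scheme a label containing a point is rejected rather than
silently mapped to its unpointed form, which is the practical difference
between the two positions.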


> Note: This implies that FB2A to FB4E and FB1D should be mapped to their
> base letters in nameprep.

With nameprep as it is now, yes. However, if NFC normalisation were done
before mapping (there are other good reasons to do that), then this extra
mapping wouldn't be necessary, because these characters are canonically
equivalent to the base character followed by the point character(s). Since
they also have a composition exclusion, they will never occur in NFC (or
NFKC) text, so mapping out the points after normalising leaves just the
base letter.
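
The effect of normalising first can be checked with Python's standard
unicodedata module. This is only a sketch of the ordering argument:
nfc_then_strip and HEBREW_POINTS are hypothetical names, and the set of
points is the one from the quoted proposal, not anything nameprep
currently defines.

```python
import unicodedata

# Points from the quoted proposal (assumed here for illustration).
HEBREW_POINTS = (
    set(range(0x05B0, 0x05B9 + 1))
    | set(range(0x05BB, 0x05BD + 1))
    | {0x05BF, 0x05C1, 0x05C2, 0x05C4}
)


def nfc_then_strip(label: str) -> str:
    """Normalise to NFC first, then map out the Hebrew points."""
    normalized = unicodedata.normalize("NFC", label)
    return "".join(ch for ch in normalized if ord(ch) not in HEBREW_POINTS)


# U+FB2A (shin with shin dot) is a composition exclusion, so NFC leaves
# it decomposed as the base letter U+05E9 followed by the point U+05C1.
decomposed = unicodedata.normalize("NFC", "\uFB2A")

# After stripping, only the base letter remains; no separate mapping
# entry for U+FB2A is needed.
base_only = nfc_then_strip("\uFB2A")
```

The same holds for U+FB1D, which normalises to U+05D9 followed by U+05B4
and therefore reduces to U+05D9 once the points are removed.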

- -- 
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBPGdENTkCAxeYt5gVAQH8dwgAsuXM8mNz/pa5Wb93CKbzLW0ezYXaAiwI
DgP3TpZdFlArNvV2LS7ITkZ9/TMBluP8HikgLR4mmvpuoptzt142OksAIDh9K7n5
v+VwxdwqXGXS3Xoxl/4M9kUJC9qXeN/BwGla8J7rL+tIsuthyTuaRHnEQI2h92iX
pgCf3I7YCsePr9jY5lE5rVCaQjOhGWP5sRasaMKiFWagfEv+X0kxhQVuGFmv6YTZ
D5qmFqPDIXqdFZEOXmF/EI4GJM5f1qVDm76YFw2ivhuIN6BNnjUTG6z6DZ9UxKny
VQnc/zKXxBaWXR7REWScIeHklGL64q65YlLWoLBaprb8Fl3PCcb8+g==
=CRxe
-----END PGP SIGNATURE-----