Re: [idn] IDN WG Last Call on two major changes to Stringprep



Edmon Chung <edmon@neteka.com> wrote:

> -----
> 2) If a string contains any Right-to-Left character (defined as
> belonging to Unicode bidirectional categories "R" and "AL"), the string
> MUST NOT contain any Left-to-Right character (defined as belonging to
> Unicode bidirectional category "L").
> 
> 3)  If a string contains any Right-to-Left character (as defined above),
> a Right-to-Left character MUST be the first character of the string, and
> a Right-to-Left character MUST be the last character of the string.
> -----
>
> I don't quite understand why we need to have 3.
> Isn't 3 a subset of 2?

No, because there are characters that are neither Right-to-Left nor
Left-to-Right.
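
As an illustration only (this is not text from the draft), the two rules
can be sketched in Python using the standard unicodedata module. The
function names are hypothetical, and the sketch assumes Python's Unicode
tables are an acceptable stand-in for the ones stringprep would use. A
digit has bidirectional category "EN", which is neither Right-to-Left
nor Left-to-Right, so a string ending in a digit can pass rule 2 and
still fail rule 3:

```python
import unicodedata

RTL = {"R", "AL"}  # rule 2's definition of a Right-to-Left character
LTR = {"L"}        # rule 2's definition of a Left-to-Right character


def satisfies_rule_2(s):
    """True unless s contains both an RTL and an LTR character."""
    cats = {unicodedata.bidirectional(c) for c in s}
    return not (cats & RTL and cats & LTR)


def satisfies_rule_3(s):
    """If s contains any RTL character, its first and last characters must be RTL."""
    cats = [unicodedata.bidirectional(c) for c in s]
    if not any(c in RTL for c in cats):
        return True  # rule 3 only constrains strings containing RTL characters
    return cats[0] in RTL and cats[-1] in RTL


# Two Hebrew letters followed by a digit: no "L" character anywhere,
# but the last character is not Right-to-Left.
example = "\u05d0\u05d1" + "1"
print(satisfies_rule_2(example), satisfies_rule_3(example))
```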

> Also this will mean that there cannot be a mixture between RTL and LTR
> characters.

Correct.  That is exactly rule 2 above.  "If X appears then Y must not
appear" is exactly equivalent to "X and Y must not both appear".
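
For anyone who wants to check that equivalence mechanically, here is a
small Python sketch that compares the two phrasings over all four truth
assignments. The function names are made up for this illustration; x
stands for "an RTL character appears" and y for "an LTR character
appears".

```python
from itertools import product


def rule_as_implication(x, y):
    # "If X appears then Y must not appear": X implies not Y
    return (not x) or (not y)


def rule_as_exclusion(x, y):
    # "X and Y must not both appear"
    return not (x and y)


# Compare the two formulations on every combination of truth values.
equivalent = all(
    rule_as_implication(x, y) == rule_as_exclusion(x, y)
    for x, y in product([False, True], repeat=2)
)
print(equivalent)
```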

> While I am not familiar with Arabic, I sure have seen English words
> mixed with Arabic in phrases, albeit rare.

I don't doubt that.  I know almost nothing about the bidi algorithm, but
the bidi experts concluded that this was the price that needed to be
paid to prevent distinct labels from being displayed identically.

> I didn't see much discussion on the list before on bidi issues, but I
> did see an example used:
>
> > Assume there were two labels inside the DNS, one reading ABCdef and
> > the other reading defABC, and both would be displayed CBAdef. Who
> > would consider that usable for the DNS?
>
> Why would both be displayed the same?

I don't know. :) The answer can presumably be inferred from the bidi
tech report.

> a given "string" can be a "part" of a label, so there could be two
> "strings", one containing LTR one RTL in the same label.

Nameprep doesn't need to know whether its input string is a label or
part of a label or whatever.  Nameprep could operate on any string.

But the IDNA spec specifies that nameprep is applied to whole labels,
not to substrings of labels.  Therefore a label cannot contain both LTR
and RTL characters.
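
A rough Python sketch of that per-label consequence follows. It checks
only the rule 2 restriction, is not an implementation of nameprep or
IDNA, and assumes labels are separated by "." purely for illustration;
both function names are hypothetical.

```python
import unicodedata


def mixes_rtl_and_ltr(label):
    """True if the label has both an RTL ("R"/"AL") and an LTR ("L") character."""
    cats = {unicodedata.bidirectional(c) for c in label}
    return bool(cats & {"R", "AL"}) and "L" in cats


def labels_mixing_directions(name):
    """Return the labels of a dotted name that mix RTL and LTR characters."""
    return [label for label in name.split(".") if mixes_rtl_and_ltr(label)]


# Each label is checked on its own, so an all-Latin label next to an
# all-Hebrew label is fine, while one label containing both is flagged.
print(labels_mixing_directions("abc.\u05d0\u05d1"))
print(labels_mixing_directions("abc\u05d0.com"))
```

The check runs label by label, so a name may still contain an LTR label
and an RTL label side by side; only mixing within one label is flagged.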

> Please clarify two simple things:
> a. Are mixed RTL and LTR characters allowed within a label?

No.

> b. If there are more categories than R, AL and L that we are discussing,
> then in point 3 it should not say "As defined above":

Yes it should.  Rule 3 is using the same definition of Right-to-Left
character as stated in rule 2: 'belonging to Unicode bidirectional
categories "R" and "AL"'.

AMC