
Re: [idn] Last call comments to nameprep/stringprep: BIDI



Hello Paul,

At 15:29 02/02/11 -0800, Paul Hoffman / IMC wrote:
>I am replying to all minus the IDN mailing list. I have added Patrik to 
>the Cc list.

I have added the IDN mailing list back in. Please keep it.
I have also included the other relevant mailing lists in
the cc, to bring things to a quick conclusion.


>At 8:45 PM +0900 2/11/02, Martin Duerst wrote:
>>Currently, neither draft-ietf-idn-nameprep-07.txt nor
>>draft-hoffman-stringprep-00.txt deal with bidirectionality
>>(mixing right-to-left (Arabic/Hebrew) and left-to-right
>>writing directions) issues. This should be changed as soon
>>as possible.
>
>You didn't say why this should be done as part of nameprep or stringprep. 
>What about bidi makes the issue a preparation issue, particularly for 
>prohibition?

See below.


>>If a label can contain both right-to-left and left-to-right
>>characters, how it will be displayed, and how displayed
>>labels will be entered and looked up in the DNS, is highly
>>context-dependent. This is obviously very undesirable.
>
>This is a display issue; it has absolutely nothing to do with how names 
>are entered or looked up in the DNS.

It is not a display issue, it is an issue of conversion
between display and backing store.

Assume there were two labels in the DNS, one reading ABCdef
and the other reading defABC, and both were displayed as
CBAdef (the first in a left-to-right context, the second in a
right-to-left one). Who would consider that usable for the DNS?

With the current nameprep, this and similar situations will happen,
e.g. if the upper-case letters above are in a right-to-left script and
the lower-case ones in a left-to-right script.
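To make the display/backing-store problem concrete, here is a toy sketch of bidi reordering. It assumes the convention of these examples (upper case = right-to-left, lower case = left-to-right) and treats every other character as a neutral. It is a heavily simplified approximation of the Unicode Bidirectional Algorithm (no digits, no mirroring, no explicit embeddings), not an implementation of it, and the name `display` is hypothetical.

```python
def char_type(ch: str) -> str:
    """Toy classification: upper case = R (right-to-left), lower case = L, rest = N."""
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"


def display(logical: str, rtl_paragraph: bool = False) -> str:
    """Return the visual (left-to-right) order of a logically ordered string."""
    base = "R" if rtl_paragraph else "L"
    types = [char_type(c) for c in logical]
    n = len(types)
    resolved = types[:]
    # Resolve neutrals: a maximal run of neutrals takes the direction of the
    # strong characters on both sides if they agree, else the paragraph direction.
    i = 0
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        before = types[i - 1] if i > 0 else base
        after = types[j] if j < n else base
        direction = before if before == after else base
        for k in range(i, j):
            resolved[k] = direction
        i = j
    # Embedding levels: LTR paragraph -> L=0, R=1; RTL paragraph -> R=1, L=2.
    if rtl_paragraph:
        levels = [1 if t == "R" else 2 for t in resolved]
    else:
        levels = [1 if t == "R" else 0 for t in resolved]
    # Reorder: from the highest level down to 1, reverse every maximal run
    # of characters at that level or higher.
    chars = list(logical)
    for level in range(max(levels, default=0), 0, -1):
        k = 0
        while k < n:
            if levels[k] < level:
                k += 1
                continue
            m = k
            while m < n and levels[m] >= level:
                m += 1
            chars[k:m] = chars[k:m][::-1]
            levels[k:m] = levels[k:m][::-1]
            k = m
    return "".join(chars)
```

For example, `display("ABCdef")` and `display("defABC", rtl_paragraph=True)` both return `"CBAdef"`: two different stored labels, one visual form.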

Many superficially similar issues have been discussed on this list. But
please note that this is not an issue of different Unicode characters
looking the same or similar. The characters are exactly the same!

Also, please note that while there was no really workable solution
for those other issues discussed on this list, this is not the case
here. Two main solutions have been proposed.


>>The following is a proposal written up by Mark Davis,
>>based on input from others:
>
>This has barely been discussed in the BIDI community; there has been 
>almost no review of it. Further, there was disagreement on it when Mark 
>presented it.

There is wide agreement among all the BIDI experts who have considered
this problem that restricting the combinations of characters allowed
in each label is unavoidable, independent of any other aspects of the
solution.

And you are right that there were two different main solutions,
and nobody was really sure which one to choose. But there was wide
agreement that a solution is needed.

The two main solutions differ in how they handle sequences of labels.

The solution described in draft-duerst-iri-bidi-00.txt tries to make
sure that the sequence of labels always goes the same way, left to right.
So something that is logically FTP.HEBREW.COM will show up as
PTF.WERBEH.MOC. For this to happen, the special character LRM
(LEFT-TO-RIGHT MARK, U+200E) has to be inserted around each '.'. It
would be no problem to check for its presence and strip it in nameprep,
but it is difficult to have it inserted when typing a domain name in or
when getting a domain name back from a protocol.
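A minimal sketch of the LRM handling described above, assuming LRM means U+200E LEFT-TO-RIGHT MARK placed directly on both sides of each '.', and that no label legitimately starts or ends with one. `add_lrm_around_dots` and `strip_lrm_around_dots` are hypothetical helpers, not part of nameprep. The stripping side is trivial; the hard part remains getting the marks inserted when a name is typed or taken from a protocol.

```python
from __future__ import annotations

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def add_lrm_around_dots(name: str) -> str:
    """Display side: surround every label separator with LRMs so that,
    in a left-to-right context, the labels are laid out left to right."""
    return (LRM + "." + LRM).join(name.split("."))


def strip_lrm_around_dots(name: str) -> tuple[str, bool]:
    """Preparation side: remove LRMs at label boundaries.
    Returns the cleaned name and whether anything was removed."""
    cleaned = [label.strip(LRM) for label in name.split(".")]
    result = ".".join(cleaned)
    return result, result != name
```

Stripping only at label boundaries leaves any LRM inside a label untouched, so a real preparation step would still have to decide whether to prohibit those.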

The other solution, which Mati Allouche came up with after
reading my draft in detail, is to give the sequence of labels
its 'natural' order. The above example would turn into
MOC.WERBEH.PTF. This may look very strange, but is actually
quite natural for native Arabic or Hebrew users, because that's
the way they read text. Whenever I have seen examples of Arabic
domain names, they were displayed that way.
It leads to a few strange effects, such as an inversion of
components of different kinds in URIs
(example: http://ftp.HEBREW.COM/PATH/file.html turns into
http://ftp.HTAP/MOC.WERBEH/file.html), but these can be
read naturally as well. The main advantage is that it requires
less intervention/magic, both for input and for taking domain names
from a protocol and putting them into a textual context. This
is a big advantage.

Incidentally, it is also how at least some operating systems handle
directory paths. The only case I was able to confirm is (Japanese)
Windows 2000, where a folder ABCD (shown as DCBA) containing a folder
EFGH (shown as HGFE) is displayed as follows in the top
bar of Explorer:

D:\
D:\temp
D:\temp\DCBA
D:\temp\HGFE\DCBA

Looks really weird when you see it the first time, but bidirectional
writing comes with quite a few surprises.

Mark's proposal (quoted below) worked out the details of the
restrictions necessary for individual labels in order to work
with Mati's proposal. It is the best proposal I know of, and it is an
enormous improvement over doing nothing at all and regretting
it later.


>>B. In any field that contains any RTL characters:
>>B0. no LTR characters can occur.
>>C1. a sequence of characters of type DIG can only occur at the end.
>>C2. a sequence of characters of type OTHER can occur only between 
>>characters of type RTL.
>>
>>I propose that this be added as an additional step after the current
>>'prohibition' step.
>
>Why would this be considered part of nameprep? That is, why are you 
>prohibiting the strings on the user side instead of on the server side?

As far as I understand, nameprep is applied on both sides. There is a
single difference, namely the treatment of unassigned code points, which
are let through on the client side but are checked on the server side
(at the time of registration). Except for this difference, both sides
are the same.

The reason for this, as far as I understand, is to make doubly sure
that nothing illegal gets into the DNS: if illegal names are not
reachable from the client, registrars have no incentive to cheat.
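The single client/server difference described above can be sketched as one preparation function with a flag. This is an illustration under stated assumptions only: `prepare_label` and its `stored_string` parameter are hypothetical, the shared mapping/normalization/prohibition steps are stood in for by casefolding plus NFKC, and "unassigned" is judged by Python's own Unicode database, which need not match the Unicode version nameprep refers to.

```python
import unicodedata


def is_unassigned(ch: str) -> bool:
    # General category "Cn" (not assigned) in Python's Unicode database.
    return unicodedata.category(ch) == "Cn"


def prepare_label(label: str, stored_string: bool) -> str:
    """Hypothetical preparation step shared by client and server.

    Both sides run the same steps; unassigned code points are let through
    for lookups (client side) but rejected when registering (server side)."""
    # Rough stand-in for the real mapping and normalization steps.
    prepared = unicodedata.normalize("NFKC", label.casefold())
    if stored_string and any(is_unassigned(c) for c in prepared):
        raise ValueError("unassigned code point in a name being registered")
    return prepared
```

Because a registration-time check is at least as strict as the lookup-time one, a name that could not be registered legitimately also gains nothing from slipping past a lax registrar.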

Also, if this is handled as a 'policy' issue, different
registries may choose different restriction policies
(if they become aware of the problem at all). This will
not work, because it then becomes impossible to provide appropriate
support on the client side (browsers, mailers, ...).
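The label restrictions B0, C1 and C2 in the proposal quoted above could be checked roughly as follows. This is a sketch under assumptions: the character types RTL, LTR, DIG and OTHER are defined in a part of the proposal not quoted here, so they are approximated with Python's `unicodedata.bidirectional` (R and AL as RTL, L as LTR, EN and AN as DIG, everything else as OTHER). `check_bidi_label` is a hypothetical name.

```python
import unicodedata
from itertools import groupby


def char_type(ch: str) -> str:
    # Assumed mapping from Unicode bidi classes to the proposal's types.
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):
        return "RTL"
    if bidi == "L":
        return "LTR"
    if bidi in ("EN", "AN"):
        return "DIG"
    return "OTHER"


def check_bidi_label(label: str) -> bool:
    """Return True if the label satisfies B0, C1 and C2 (sketch)."""
    types = [char_type(c) for c in label]
    if "RTL" not in types:
        return True  # the restrictions only apply to labels containing RTL
    if "LTR" in types:
        return False  # B0: no LTR characters alongside RTL ones
    runs = [t for t, _ in groupby(types)]  # one entry per maximal run
    for i, t in enumerate(runs):
        if t == "DIG" and i != len(runs) - 1:
            return False  # C1: a digit sequence may only occur at the end
        if t == "OTHER":
            inside = 0 < i < len(runs) - 1
            if not (inside and runs[i - 1] == "RTL" and runs[i + 1] == "RTL"):
                return False  # C2: OTHER only between RTL characters
    return True
```

With Hebrew letters, for instance, `check_bidi_label("\u05d0-\u05d1")` passes, while `check_bidi_label("\u05d0\u05d1a")` fails under B0.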


>This is the first time that the proposal has been seen by the WG.
>There is no Internet Draft

There is
http://www.ietf.org/internet-drafts/draft-duerst-iri-bidi-00.txt,
for the more general problem of bidi in URIs. The WG has been told
about it at http://www.imc.org/idn/mail-archive/msg03271.html.

And there is no need for a separate Internet Draft for DNS.
It should go into nameprep.


>and no examples of its use.

What kinds of examples are you looking for? More examples to explain
the issues? A list of examples that could be used for testing?
Examples for the draft itself? How many other examples are there
in the nameprep draft?


Regards,    Martin.