
Re: [idn] Requirements I-D





There are some significant problems with the locale approach.

1. It's more complicated
2. It requires that locale information always be sent, and that info is not
always available. Remember, a lot of URLs are just copied visually, with
no indication of locale.
3. You get problems when the locale info is not sent, or is sent
incorrectly.
4. Data on the precise casing behavior is sometimes contentious (e.g. some
French writers strip accents from uppercase letters, some don't. Some strip
them only on fully uppercase words, and leave them on initial-cap words).
5. Data on the precise casing behavior is sometimes unknown (it may not be
clear what is the expected behavior in Turkmenistan).
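
To make points 2 and 3 concrete, here is a minimal sketch of locale-dependent
lowercasing. The function lower_for_locale and the locale tags "en" and "tr"
are hypothetical names chosen for illustration, not part of any proposal on
this list; the Turkish rule is hand-coded because it is only an assumption
about what a resolver would do.

```python
DOTLESS_SMALL_I = "\u0131"   # Turkish lowercase dotless i
DOTTED_CAPITAL_I = "\u0130"  # Turkish uppercase dotted I


def lower_for_locale(name, locale=None):
    """Hypothetical locale-dependent lowercasing (illustration only)."""
    if locale == "tr":
        # Assumed Turkish rule: I lowers to dotless i, dotted capital I to i.
        name = name.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i")
    # Every other locale, or a missing one, gets the default mapping.
    return name.lower()


english = lower_for_locale("SING.COM", "en")
turkish = lower_for_locale("SING.COM", "tr")
missing = lower_for_locale("SING.COM")  # locale info was not sent
```

The same typed name yields two different lowercase names depending on the
locale tag, and when the tag is absent the sketch silently picks the
non-Turkish result.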

Locale-independent casing, on the other hand, will produce reasonable
results except in a few cases, and even those cases are not so bad. The
only real issue is where a domain name is *only* distinguished by the
dotted/undotted versions of I. The system would simply not allow both
versions to be registered (assuming Unicode case folding). So I couldn't
register both

turkish_bath.com, and
turk*sh_bath.com

(where * represents a dotless i, and & a dotted capital I); I can only
register one of them. Since there is only one name registered, if a user
types any of:

turkish_bath.com
turk*sh_bath.com
Turkish_Bath.com
Turk&sh_Bath.com
TURKISH_BATH.COM
TURK&SH_BATH.COM

or any other case variation of these, s/he will get to the same place.
However, when I put "turk*sh_bath.com" on a billboard or in a banner ad, or
wherever, I can, of course, spell it correctly. When users type it in, they
go to the right place.
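
The registration rule described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not a proposed implementation:
fold_name and Registry are hypothetical names, and the fold is assumed to
merge both the dotless i (shown as * above) and the dotted capital I (shown
as & above) with plain "i". Python's built-in str.casefold() does not merge
the dotless i by itself, so that mapping is applied explicitly.

```python
DOTLESS_SMALL_I = "\u0131"   # shown as * in the message
DOTTED_CAPITAL_I = "\u0130"  # shown as & in the message


def fold_name(name):
    """Hypothetical locale-independent fold (illustration only)."""
    # Assumption: every i variant is merged with plain "i" before the
    # general, locale-independent case folding is applied.
    merged = name.replace(DOTTED_CAPITAL_I, "i").replace(DOTLESS_SMALL_I, "i")
    return merged.casefold()


class Registry:
    """Toy registry keyed by folded name."""

    def __init__(self):
        self._owners = {}

    def register(self, name, owner):
        key = fold_name(name)
        if key in self._owners:
            return False  # a name with the same fold is already taken
        self._owners[key] = owner
        return True

    def lookup(self, typed):
        # Returns the owner, or None if nothing is registered under this fold.
        return self._owners.get(fold_name(typed))


registry = Registry()
first = registry.register("turk\u0131sh_bath.com", "mark")
second = registry.register("turkish_bath.com", "someone else")

typed_variants = [
    "turkish_bath.com",
    "turk\u0131sh_bath.com",
    "Turkish_Bath.com",
    "Turk\u0130sh_Bath.com",
    "TURKISH_BATH.COM",
    "TURK\u0130SH_BATH.COM",
]
owners = {registry.lookup(t) for t in typed_variants}
```

Under these assumptions the second registration is refused, and all six typed
variants resolve to the single registered name.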

While this approach is not perfect, it is also not terrible:

* it shouldn't represent an enormous burden on users
* it is easy to explain precisely what the limitations are
* in practice it shouldn't cause that many problems

Mark

-------------------------------------

James Seng wrote:

Let's think about this from the basics.

English sing.com [a] is not necessarily equivalent to Turkish sing.com [b]
(with dotless i), so they are individually "unique" on their own and thus we
have no problem here. However, the problem occurs when we have SING.COM [c]:
does that match [a] or [b]?

So, one way we can see this is to define not just [c] but [c-E] and [c-T].
And that locale info can be transmitted in some manner, say by OS locale or
by encoding locale info.

In one example, if you key in [c] on an English OS, it will fold into [a]. If
you key in [c] on a Turkish OS, it will fold into [b]. This folding could, for
example, occur on the resolver or client side before it hits the wire. And
since the server only receives either [a] or [b] directly, it meets
requirement [1], "resolve any domain name anywhere". But is this "unique"?
Well, by looking at [c-E] and [c-T] alone, it is difficult to judge. But we
can tell them apart because of the locale context, and that context makes
both unique.

On the other hand, this does not work well when, for example, you send [c-T]
to someone else with an English OS, and the resolver there treats [c] as
[c-E] instead.

So another solution is probably to encode locale info into the name, e.g.
SI[T]NG.COM or SI[E]NG.COM (or some such form). This means that SI[T]NG.COM
and SI[E]NG.COM are always uniquely different and thus there is no confusion.
And it meets R[1] perfectly too. It is possible that the client will strip
away the [E] and [T] and display only SING.COM. But that is an issue for
the client, not for this WG. (Another way to look at this is to presume we
have two different code points for I, one for English and one for Turkish.)

Personally, I don't like either of these solutions, but my point is that
there *is* a solution. We should not give up without trying first. Maybe
someone can come up with a better one?

But I totally agree with Harald when he says "if the choice is between
having most of the world irritated and nobody in the world having a working
IDN system, I'll vote for the working system".

-James Seng

Paul Hoffman / IMC wrote:
>
> At 10:38 PM +0800 5/16/00, James Seng wrote:
> >To be exact, I want to change it from locale independent to locale
> >dependent.
> >Or at least, drop it to make it an open case and see how the proposal
> >protocols deal with it.
>
> There appears to be general agreement among the vocal people on the
> list that locale-independent is the only technically-possible
> solution we have seen. Could you outline a locale-dependent
> solution that would retain uniqueness of names?
>
> --Paul Hoffman, Director
> --Internet Mail Consortium