
Re: Character equivalence mapping (was: Re: [idn] SLC minutes)



In addition,

1. This issue was debated at length some time ago. I suggest that the people
arguing for visual confusability as a criterion for matching look at that
discussion in detail before proceeding.

2. Moreover, stop and think about the implications; using both case folding
and visual confusability would have some very unpleasant consequences. For
example, it would force the ASCII letters N and V to be in the same
equivalence class:

- N is in the same class as GREEK NU, by visual confusability.
- GREEK NU is in the same class as greek nu, by casefolding.
- greek nu is in the same class as v, by visual confusability.
- v is in the same class as V, by casefolding.
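
The chain above can be checked mechanically. What follows is only a rough
sketch in Python, not a proposal for how matching should be implemented: the
CONFUSABLE pairs are an assumed two-entry table made up for this example, and
find, union, build_classes and same_class are hypothetical helper names. Case
folding uses Python's built-in str.casefold(). The point is that an
equivalence relation has to be transitive, so merging both kinds of pairs
puts N and V in one class.

```python
# Sketch: equivalence classes built from case folding plus an ASSUMED
# visual-confusability table, using a small union-find structure.

def find(parent, x):
    """Return the representative of x's class, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    """Merge the classes containing a and b."""
    parent.setdefault(a, a)
    parent.setdefault(b, b)
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

# Assumed confusable pairs, for illustration only (not a real table):
# Latin N ~ GREEK CAPITAL NU, greek small nu ~ Latin v.
CONFUSABLE = [("N", "\u039d"), ("\u03bd", "v")]

CHARS = ["N", "n", "V", "v", "\u039d", "\u03bd"]

def build_classes(chars, confusable):
    parent = {}
    for c in chars:
        # Each character is equivalent to its case-folded form.
        union(parent, c, c.casefold())
    for a, b in confusable:
        # Each assumed confusable pair is merged as well.
        union(parent, a, b)
    return parent

def same_class(parent, a, b):
    return find(parent, a) == find(parent, b)

both = build_classes(CHARS, CONFUSABLE)
case_only = build_classes(CHARS, [])
print(same_class(both, "N", "V"))       # True
print(same_class(case_only, "N", "V"))  # False
```

With case folding alone, N and V stay apart; adding just the two confusable
pairs is enough to merge them.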

Mark
—————

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Kenneth Whistler" <kenw@sybase.com>
To: <edmon@neteka.com>
Cc: <idn@ops.ietf.org>
Sent: Wednesday, January 02, 2002 17:51
Subject: Character equivalence mapping (was: Re: [idn] SLC minutes)


> Edmon suggested:
>
> > Character Equivalence mapping is to deal with this issue:
> >
> > A registrant registers a domain <ALPHA><BETA>.example
> > Advertises it to other people as their capital form AB.example
> > An end user will not know whether it was Greek or English and attempts to
> > access the site with ab.example and does not get to it.
> >
> > With Character Equivalence mapping, this situation would not occur.  No
> > matter how a domain name is represented, it is always unique.
>
> I think this example nicely points up the contrary problem that
> cross-script mapping has. If you start doing cross-script equivalence
> mapping to eliminate differences between (to Latin-trained eyes) confusable
> letters, you violate the integrity of other scripts and start mapping
> the set of possible strings in those scripts even more confusably into
> the already crowded domain namespace of Latin strings.
>
> In this particular example, suppose I was a Greek and actually wanted
> to register <ALPHA><BETA>.com, in addition to <ALPHA><BETA>.gr for
> the <ALPHA><BETA> construction company in Athens. Whoops! I'd be
> out of luck since ab.com already exists and is registered to
> Allen-Bradley. (See www.ab.com ) Why should I, as a Greek, find my
> own Greek namespace unpredictably polluted by some arbitrary list
> of equivalences between Greek letters and Latin letters?
>
> And exactly what equivalences would you suggest? Greek uppercase
> eta is basically indistinguishable in shape from a Latin uppercase "H".
> So do I equivalence map it to Latin "H", which would make no sense at
> all for transliteration and serve only the purposes of dumb equations
> for people who know nothing about Greek whatsoever? Or do I equivalence
> map it to Latin "I", which is the normal transliteration for eta in
> Modern Greek? Or do I equivalence map it to Latin "E", which is the
> normal transliteration for eta in Ancient Greek?
>
> So does: <ALPHA><BETA>.<OMICRON><MU><ETA><RHO><OMICRON><SIGMA>
>
> equate to: ab.omhpo<sigma> or ab.omiro<sigma> or ab.omero<sigma> or
>            ab.omhpos or ab.omiros or ab.omeros ?
>
> By the way, the 5th example is how the Greeks themselves would Latinize
> it. (see www.omiros.gr )
>
> The problem of "AB.example" is generally dealt with by context. First
> of all "example" would be in Greek if I was really dealing with Greek.
> Second, if I wanted people to enter "ab.whatever" I'd be advertising
> in *English* to set the expectations. If I wanted people to enter
> "<alpha><beta>.....", I'd be advertising in *Greek* to set the
> expectations, and people would be using Greek keyboards and expect to
> enter Greek.
>
> Furthermore, visual confusability quickly runs off the road as the basis
> for determining equivalence classes when you start to deal with scripts
> that have more complicated rules for the presentation of glyphs than
> is typical for the Latin script. Which of several possible forms is
> the basis for the confusability used to determine the equivalence?
> And this turns into an N-body problem, because you start having to
> account for visual confusability between N different scripts -- not just
> between N scripts and Latin characters. Where do you draw the line, in
> principle? Or do we just end up arguing for the next decade about all
> the edge cases?
>
> >
> > Bear in mind that this need to happen only during matching of names
> > within the DNS server.
> >
> > A registrant can register <ALPHA><b>.example all they want.  This is the
> > misconception that I wanted to point out.  Character Equivalence mapping
> > does not prohibit mixed scripts.
>
> But it does severe damage to the integrity of namespaces in other scripts.
>
> This is Latin- and English-centric thinking, in my opinion, that would
> damage the whole point of having IDNs by folding other scripts towards
> Latin characters.
>
> --Ken
>
> >
> > Edmon
> >
> >
>
>