
Re: My prod at IDN requirements



At 16:15 00/01/03 +0100, Harald Tveit Alvestrand wrote:
> Hooray for the news! Time to get to work!
> Here's a few thoughts about requirements.

A few picks on alternatives:

> More in the solution space:
> 
>    iso 10646 characters will be enough forever for DNS purposes YES/NO

Yes.

>    a single representation for i18c must be chosen YES/NO

Is this meant in terms of character encoding? I would very much
hope that only one character encoding is used, but for the
req doc, I think it is possible to go one step back:

- In all cases, it must be clear which characters are used,
  and not only which bytes.
  (the currently known solutions to this are:
   - A single character encoding must be chosen
   - Character encoding must be identified separately (e.g. tags,...)
  )
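
To illustrate why bytes alone are not enough, here is a minimal Python
sketch. The helper names decode_label and decode_tagged are hypothetical,
and the choice of UTF-8 as "the single encoding" is only an assumption
made for the example, not something this thread has decided.

```python
import unicodedata

raw = b"\xe5"  # one byte; which character it names depends on the encoding

as_latin1 = raw.decode("iso-8859-1")
as_koi8r = raw.decode("koi8-r")

# Same byte, two different characters.
print(unicodedata.name(as_latin1))
print(unicodedata.name(as_koi8r))


def decode_label(raw_label: bytes) -> str:
    """Known solution 1: one agreed encoding (UTF-8 assumed here)."""
    return raw_label.decode("utf-8")


def decode_tagged(charset: str, raw_label: bytes) -> str:
    """Known solution 2: the encoding is identified alongside the bytes."""
    return raw_label.decode(charset)
```

Either way, the receiver ends up with characters rather than
uninterpreted bytes: decode_tagged("iso-8859-1", b"\xe5") and
decode_label(b"\xc3\xa5") yield the same character.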
  

> For matching records, Choose One:
> 
>    it matters whether matching is consistent across all servers

Very clearly important. Matching must be consistent across
all servers.

>    i18c Cyrillic A must compare not equal to Latin A

This follows from consistent server behaviour and the fact that
we don't want to require the two to compare equal.
It may look like a serious practical problem, but it won't be one,
because DNS label components are words, not letters.
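
As a small sketch of what "compare not equal" means at the character
level (Python is used only for illustration; the variable names are
made up for the example):

```python
import unicodedata

latin_a = "\u0041"     # LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"  # CYRILLIC CAPITAL LETTER A

# The two look alike in most fonts but are distinct characters,
# so a plain code-point comparison keeps them apart.
labels_match = latin_a == cyrillic_a

print(unicodedata.name(latin_a))
print(unicodedata.name(cyrillic_a))
print(labels_match)
```

A server that compares code points gets this result without any
script-specific tables.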


>    i18c A with Ring Above must compare equal to a with ring above
>    i18c A with Ring Above must compare not equal to a with ring above

'must compare' is not clear enough. Is this for the user, or for
a DNS server? I would say that for the user, it indeed must compare
equal, but this should be worded carefully so that it doesn't imply
that it also does on the server. Also, mention that case behaviour
should be user-dependent (Turkish dotless i).
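
A rough sketch of why case behaviour is user-dependent, in Python. The
function lower_for_user and its language parameter are hypothetical;
they only stand in for "whatever the user's side does before a name
reaches the server".

```python
DOTLESS_I = "\u0131"      # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE


def lower_for_user(label: str, language: str) -> str:
    """Lowercase a label the way a user of the given language expects.

    Only the Turkish special case is handled; everything else falls
    back to Python's language-independent str.lower().
    """
    if language == "tr":
        # Turkish pairs I with dotless i, and dotted capital I with i.
        label = label.replace(DOTTED_CAP_I, "i").replace("I", DOTLESS_I)
    return label.lower()


english = lower_for_user("ISTANBUL", "en")
turkish = lower_for_user("ISTANBUL", "tr")
```

The same input gives two different lowercase forms, which is why a
server cannot apply one case mapping that is right for every user.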


>    i18c ASCII A must compare equal to a

It already does :-(.


>    i18c A + COMBINING RING ABOVE must compare equal to A with Ring Above
>    i18c A + COMBINING RING ABOVE must not compare equal to A with Ring Above

From the user perspective, there should be no difference anyway.
On the server side, I wouldn't want to have to do all the work.
The solution to this is probably early normalization, see e.g.
http://www.w3.org/TR/WD-charreq#3 and http://www.unicode.org/unicode/reports/tr15/.
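
A minimal sketch of early normalization, using Python's standard
unicodedata module. The function name normalize_early is hypothetical,
and applying Normalization Form C at the point where a name is entered
is an assumption about where "early" would be.

```python
import unicodedata

decomposed = "A\u030A"   # A + COMBINING RING ABOVE
precomposed = "\u00C5"   # LATIN CAPITAL LETTER A WITH RING ABOVE


def normalize_early(label: str) -> str:
    """Bring a label into Normalization Form C before it is sent on."""
    return unicodedata.normalize("NFC", label)


# Before normalization the two spellings differ as code-point sequences.
differ_before = decomposed != precomposed

# After normalization they are identical, so the server can compare
# them without knowing anything about combining characters.
equal_after = normalize_early(decomposed) == normalize_early(precomposed)
```

With this done on the client side, the server only needs an exact
match.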

[By the way, the editors of the req doc should feel free to use
whatever material they feel could be useful from http://www.w3.org/TR/WD-charreq,
and I would like to inform this group that I volunteered to Patrik Faltstrom
to write up an update of the highly outdated draft-duerst-i18n-norm-00.txt
to try and summarize the relevant parts of Unicode TR 15 (Normalization
Form C). I hope to get around to this soon.]


> Others are MUCH better than me in compiling example cases and requirements 
> for Korean, Japanese, Thai, Arabic, Hebrew.....

A lot of that is covered by the work on normalization. What that
work does not cover is variants of ideographic characters and
compatibility variants (such as an 'a' and an 'a' with a circle around it).
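
To show the gap for compatibility variants, a short Python sketch:
Normalization Form C leaves the circled 'a' alone, while the
compatibility form NFKC folds it to a plain 'a'. Whether such folding
is wanted for DNS is exactly the open question; the code only shows
the difference.

```python
import unicodedata

circled_a = "\u24D0"  # CIRCLED LATIN SMALL LETTER A

after_nfc = unicodedata.normalize("NFC", circled_a)
after_nfkc = unicodedata.normalize("NFKC", circled_a)

nfc_changes_it = after_nfc != circled_a   # canonical form keeps the variant
nfkc_folds_it = after_nfkc == "a"         # compatibility form folds it
```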

For ideographic variants, my first shot requirements would be:

- A solution has to define how/where ideographic variants are dealt with.
- A solution has to take into account that in general, there is no
  single way to collapse ideographic variants (i.e. many simplified
  Chinese characters have more than one traditional equivalent,...).
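
A small sketch of the second point, in Python. The table
TRADITIONAL_CANDIDATES is a hand-picked, illustrative sample of two
well-known cases and not a real mapping table; the function name is
hypothetical.

```python
# Illustrative sample only: simplified character -> traditional candidates.
TRADITIONAL_CANDIDATES = {
    "发": ["發", "髮"],        # "to send out" vs. "hair"
    "干": ["乾", "幹", "干"],  # "dry", "to do", "shield"
}


def traditional_candidates(simplified: str) -> list[str]:
    """Return every traditional form the simplified character may stand for."""
    return TRADITIONAL_CANDIDATES.get(simplified, [simplified])


candidates = traditional_candidates("发")
is_ambiguous = len(candidates) > 1
```

Because the mapping is one-to-many, a simplified-to-traditional
collapse cannot be done character by character without knowing the
word.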


Regards,   Martin.


#-#-#  Martin J. Du"rst, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org