
RE: [idn] My draft for internationalisation of DNS





   Since we have to have some kind of normalisation at some
   point(s) anyway, I still suggest using normalisation form KC,
   augmented appropriately.

   Why KC instead of C?  a) Because we plan to have case insensitivity
   (in some way) and compatibility distinctions are, if not less
   important than case, at least normally not more important.
   b) KC eliminates typographic distinctions like 'fullwidth',
   'narrow', and 'italic' that occur for some characters. c) It also
   eliminates other presentation form distinctions like typographic
   ligatures (but not orthographic ligatures) and initial/medial/final
   distinctions for presentation form Arabic. Yes, these could all be
   disallowed instead, but I see no major reason to disallow them.
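
   To make points b) and c) concrete, here is a small sketch using
   Python's standard unicodedata module. The sample characters are
   my own picks for illustration, not a proposed list, and plain
   NFKC is shown without any augmentation.

```python
import unicodedata

# Illustrative samples only (chosen for this sketch, not a proposed repertoire).
SAMPLES = {
    "fullwidth A": "\uFF21",               # FULLWIDTH LATIN CAPITAL LETTER A
    "narrow (halfwidth) ka": "\uFF76",     # HALFWIDTH KATAKANA LETTER KA
    "italic a": "\U0001D44E",              # MATHEMATICAL ITALIC SMALL A
    "typographic ligature fi": "\uFB01",   # LATIN SMALL LIGATURE FI
    "orthographic ligature ae": "\u00E6",  # LATIN SMALL LETTER AE
    "Arabic beh, initial form": "\uFE91",  # ARABIC LETTER BEH INITIAL FORM
}


def compare_forms(text):
    """Return the (NFC, NFKC) normalisations of text."""
    return (unicodedata.normalize("NFC", text),
            unicodedata.normalize("NFKC", text))


if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        nfc, nfkc = compare_forms(ch)
        # Show code points so the difference is visible in any terminal.
        show = lambda s: " ".join("U+%04X" % ord(c) for c in s)
        print("%-26s C: %-10s KC: %s" % (label, show(nfc), show(nfkc)))
```

   Form C leaves every sample unchanged. Form KC folds all of them
   to the ordinary letter(s) except the orthographic ligature ae,
   which is kept.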

   Present day domain names have very much an 'identifier' flavour.
   And 'identifiers' in programming languages are usually "word-like",
   i.e. one tries to cover what could be words in a natural language,
   as written down in various orthographies, and usually excludes
   symbols and punctuation (with a few listed exceptions).  In my
   list of appropriate/inappropriate characters for IDNs I tried
   to maintain that 'identifier' flavour.
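
   A rough approximation of that 'identifier' flavour, assuming one
   simply filters on Unicode general categories. This is a sketch
   only: the category test, the single hyphen exception, and the
   function name are assumptions for illustration and do not
   reproduce the list in my draft.

```python
import unicodedata

# Hypothetical exception list: punctuation allowed despite the general rule.
ALLOWED_EXCEPTIONS = {"-"}  # HYPHEN-MINUS, as in present-day host names


def is_identifier_like(label):
    """Return True if every character is a letter, combining mark,
    decimal digit, or one of the listed exceptions."""
    if not label:
        return False
    for ch in label:
        category = unicodedata.category(ch)
        word_like = category[0] in ("L", "M") or category == "Nd"
        if not (word_like or ch in ALLOWED_EXCEPTIONS):
            return False
    return True


if __name__ == "__main__":
    for candidate in ["r\u00E4ksm\u00F6rg\u00E5s", "a-b2", "smile\u263A", "a!b"]:
        print(repr(candidate), is_identifier_like(candidate))
```

   A real rule would need per-character review (the categories are
   too coarse on their own), but the filter already rejects the
   symbol and punctuation cases.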

   About the case insensitivity, I'm unsure about the UTR 21 approach.
   It may be better to just combine KC with downcasing (only), not
   with uppercasing and then downcasing (as UTR 21 nearly does).
   Otherwise undesirable things may happen with i/dot-less i,
   German sharp s, and iota subscripts (in case someone wishes
   to register classical Greek domain names), as well as a few
   other characters which may come out in non-KC (and non-C)
   form.  How to deal with (non-Arabic) final forms is also
   something we need to decide upon (keep or normalise away).
   And finally, ZWJ, ZWNJ, and a few more (NOT all) Cf characters:
   Allow/disallow? If allowed: significant or not?
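
   The difference between the two case approaches can be sketched
   as follows. Note the assumptions: this uses Python's built-in
   str.upper() and str.lower() as stand-ins for the case mappings,
   and the two function names are made up for this example. It is
   not the exact UTR 21 procedure.

```python
import unicodedata


def fold_lower_only(text):
    """KC followed by downcasing only."""
    return unicodedata.normalize("NFKC", text).lower()


def fold_upper_then_lower(text):
    """KC followed by uppercasing and then downcasing."""
    return unicodedata.normalize("NFKC", text).upper().lower()


if __name__ == "__main__":
    samples = {
        "dot-less i": "\u0131",         # LATIN SMALL LETTER DOTLESS I
        "sharp s": "\u00DF",            # LATIN SMALL LETTER SHARP S
        "alpha + iota sub": "\u1FB3",   # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI
    }
    show = lambda s: " ".join("U+%04X" % ord(c) for c in s)
    for label, ch in samples.items():
        print("%-18s lower: %-8s upper+lower: %s"
              % (label, show(fold_lower_only(ch)), show(fold_upper_then_lower(ch))))
```

   With downcasing only, all three samples survive unchanged. With
   uppercasing first, dot-less i collapses into plain i, sharp s
   becomes "ss", and the iota subscript turns into a full iota.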

   I do NOT think one should normalise away the distinction between
   LATIN CAPITAL LETTER A and CYRILLIC CAPITAL LETTER A, for instance,
   but instead have some INFORMAL registration rule (for human
   judgement).  But normalising away the distinction between
   HYPHEN-MINUS and HYPHEN (e.g.) would be reasonable.
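
   As a sketch of what "augmented" KC could mean here: plain NFKC
   leaves HYPHEN (U+2010) alone, so folding it into HYPHEN-MINUS
   needs an extra mapping on top. The table below holds only that
   one entry and is purely hypothetical, as is the function name.

```python
import unicodedata

# Hypothetical augmentation table applied after NFKC (one entry, for illustration).
EXTRA_MAP = str.maketrans({"\u2010": "-"})  # HYPHEN -> HYPHEN-MINUS


def augmented_nfkc(text):
    """Apply NFKC, then the extra mappings that NFKC itself does not perform."""
    return unicodedata.normalize("NFKC", text).translate(EXTRA_MAP)


if __name__ == "__main__":
    # Plain NFKC keeps HYPHEN distinct from HYPHEN-MINUS.
    print(unicodedata.normalize("NFKC", "a\u2010b") == "a-b")
    # The augmented form folds it.
    print(augmented_nfkc("a\u2010b") == "a-b")
    # Latin A and Cyrillic A remain distinct either way.
    print(augmented_nfkc("\u0041") == augmented_nfkc("\u0410"))
```

   Latin and Cyrillic A stay distinct after this step, which leaves
   look-alikes of that kind to the registration rule.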

                Kind regards
                /kent k


> -----Original Message-----
> From: Martin J. Duerst [mailto:duerst@w3.org]
...
> At 15:47 00/02/08 +0100, Dan Oscarsson wrote:
>
> > Ok. The document on the differences between form C and KC was
> > not that easy to read. I thought the idea was to remove the
> > look-alike glyphs, among other things, but that is apparently wrong.
> >
> > So what do you think? Is it better to use form C, and remove
> > difficulties by excluding them from the repertoire?
>
> For the compatibility area, covered by KC, it's a one-by-one
> work. Some things are easy to eliminate by just forbidding some
> codepoints, others (smilies,...) might not be worth worrying
> one way or the other (although of course in the end we have
> to decide), others may need some detail work, e.g. a reference
> to KC (but I don't know any specific examples for these yet).
>
>
> > Kent Karlsson wrote one draft about what ought to be allowed.
> > Maybe the combined knowledge of you Martin and Kent could help
> > me (all of us) to define what is suitable to be used in domain names?
>
> Well, yes. One thing I think we could discuss now is to what
> extent we want to allow all kinds of symbols (e.g. Zodiac,
> Smilies, and so on). This is an area where everybody can
> contribute, I guess. For those who don't have a Unicode book
> handy, see e.g.
> http://charts.unicode.org/Unicode.charts/normal/U2600.html.
>
>
> Regards,   Martin.
>
>
> #-#-#  Martin J. Du"rst, World Wide Web Consortium
> #-#-#  mailto:duerst@w3.org   http://www.w3.org
>