
[idn] nameprep forbidden characters



I just looked at the nameprep draft for the first time.  The most
striking feature (to me) is that, compared to existing host names, it
takes a very different approach to specifying the allowable characters.

For existing host names, the allowable categories are listed: letters,
digits, hyphen.  Everything else is forbidden.  The nameprep draft, on
the other hand, lists the forbidden characters (sometimes as categories,
sometimes as enumerations), and everything else is allowed, except
characters not yet assigned, which are forbidden.

Isn't the explicit-allow approach safer?  If it is later decided that
more characters should be allowed, they can be added, but once a
character has been allowed and names using it exist, it can't be taken
back.  (Example:  The first character of a host name label was
originally required to be a letter, but later it was allowed to be a
digit.)
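
To make the explicit-allow idea concrete, here is a rough Python sketch
of the existing letters/digits/hyphen rule.  The function name and the
allow_leading_digit flag are made up for illustration; the flag only
mimics the later relaxation mentioned above.  Length limits and hyphen
placement are deliberately ignored.

```python
import string

# Explicit allow-list: anything not named here is rejected.
LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)
LDH = LETTERS | DIGITS | {"-"}


def is_ldh_label(label, allow_leading_digit=True):
    """Return True if every character of the label is a letter, digit, or hyphen.

    allow_leading_digit=False imitates the original, stricter rule that a
    label must not begin with a digit (hypothetical flag, for illustration).
    """
    if not label:
        return False
    if any(ch not in LDH for ch in label):
        return False
    if not allow_leading_digit and label[0] in DIGITS:
        return False
    return True
```

Relaxing the rule later is a one-line change (flip the flag or add to
the set), and every name that was valid before stays valid.  Tightening
it would invalidate names already in use.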

Many real English names use spaces, exclamation points, parentheses,
periods, commas, etc., but we've been surviving just fine without them
in host names.  In fact, having a restricted repertoire makes host names
easier to remember, to guess, and to type.

Is there some small list of Unicode general categories (as given in
UnicodeData) that would be safe and allow a degree of flexibility in
all languages analogous to what we have now for English?

Example:  Japanese song titles often use wavy dashes, and sometimes use
straight dashes.  Katakana words are sometimes separated by KATAKANA
MIDDLE DOT (U+30FB), but often the dots are omitted.  Perhaps host names
should avoid all punctuation in all languages so people don't have to
worry about it.
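
As a sketch of what a category-based allow-list might look like (this
is not what the nameprep draft specifies), the Python below accepts
only letters (general categories beginning with "L"), decimal digits
("Nd"), and the ASCII hyphen.  The choice of categories is my
assumption; in particular, leaving out combining marks would probably
be too strict for scripts that depend on them.  Results also depend on
the Unicode version behind Python's unicodedata module.

```python
import unicodedata

# Assumed allow-list, for illustration only.
ALLOWED_CATEGORY_PREFIXES = ("L",)   # Lu, Ll, Lt, Lm, Lo
ALLOWED_CATEGORIES = {"Nd"}          # decimal digits
ALLOWED_EXTRA = {"-"}                # the one punctuation mark we already have


def char_allowed(ch):
    """Return True if a single character falls inside the allow-list."""
    if ch in ALLOWED_EXTRA:
        return True
    cat = unicodedata.category(ch)
    return cat in ALLOWED_CATEGORIES or cat.startswith(ALLOWED_CATEGORY_PREFIXES)


def label_allowed(label):
    """Return True if the label is non-empty and every character is allowed."""
    return bool(label) and all(char_allowed(ch) for ch in label)


if __name__ == "__main__":
    # KATAKANA MIDDLE DOT and WAVE DASH are punctuation, so both are rejected;
    # unassigned code points report category "Cn" and are rejected as well.
    for sample in ("\u65e5\u672c\u8a9e", "\u30a2\u30fb\u30a4", "a\u301cb"):
        print(ascii(sample), label_allowed(sample))
```

With this rule a label either has no punctuation other than the hyphen
or it is invalid, so nobody has to remember whether a particular name
was registered with or without the middle dot.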

AMC