
Re: [idn] Requirement [12] and charset tagging



> >[12] This document does not recommend any charset for IDN. If more than
> >one charset is used, or might be used in future, in the protocol, then
> >the protocol must specify all the charsets being used and for what
> >purpose. It must also conform to [RFC1766] by tagging the charset. No
> >implicit rules should be allowed for multiple charsets. A CCS(s) chosen
> >must at least cover the range of characters as currently defined (and as
> >being added) by ISO 10646/Unicode.

> The sentence "It must also conform to [RFC1766] by tagging the
> charset." is overkill for a protocol that only allows one charset.

I think this is basically a typo -- what was intended to be a subordinate
clause has become a separate sentence, confusing its meaning.

But the more serious problem is that RFC 1766 specifies language tagging,
not charset tagging. So this also confuses two different requirements.

Specifically, I see two separate things here:

(1) If more than one charset is allowed there has to be a way to label
    the charset being used. Charset naming and registration is spelled
    out in RFC 2278.

(2) One can argue that language labelling may also be necessary in some
    cases. I would argue that if this is done it should be in-band. And
    there definitely needs to be some discussion as to whether or not it
    really is a requirement. The corresponding language tag naming and
    registration specification is RFC 1766.

> For example, if the protocol only allows UTF-8, why should every name
> part have to be marked as UTF-8? I propose changing that sentence to
> "If multiple charsets are allowed, each item must be tagged with the
> name of the charset".

This is fine as far as it goes, but there's also the language/charset
tag confusion to contend with.
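
To make the charset half of this concrete, here is a minimal sketch of the
proposed rule in Python. It is only an illustration, not anything taken from
the requirements draft: the function names tag_label and untag_label, the
(tag, bytes) representation, and the use of Python codec names as charset
names are all assumptions. A language tag, if one turns out to be needed,
would be a separate label and is not shown.

```python
from typing import Optional, Sequence, Tuple


def tag_label(text: str, charset: str,
              allowed: Sequence[str]) -> Tuple[Optional[str], bytes]:
    """Encode one name part; attach a charset tag only when it is needed.

    `allowed` stands in for the list of charsets the protocol specifies
    (hypothetical; a real protocol would fix this list in its spec).
    """
    if charset not in allowed:
        raise ValueError(f"charset {charset!r} not permitted by the protocol")
    data = text.encode(charset)
    # Single-charset protocol: the charset is fixed by the spec, so no tag.
    if len(allowed) == 1:
        return (None, data)
    # Multi-charset protocol: every item carries an explicit charset name.
    return (charset, data)


def untag_label(tag: Optional[str], data: bytes,
                allowed: Sequence[str]) -> str:
    """Decode one name part, refusing to guess the charset."""
    if len(allowed) == 1:
        return data.decode(allowed[0])
    if tag is None:
        # No implicit rules when more than one charset is in play.
        raise ValueError("missing charset tag")
    if tag not in allowed:
        raise ValueError(f"charset {tag!r} not permitted by the protocol")
    return data.decode(tag)
```

With a UTF-8-only protocol nothing is tagged; once a second charset is
allowed, an untagged item is rejected rather than guessed at.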

				Ned