Re: [idn] URL encoding in html page

Bruce Thomson <bthomson@fm-net.ne.jp> wrote:

> Well the HTTP spec doesn't forbid the characters used in ACE, to be
> sure.  But you are tunnelling a new form of information through the
> legacy protocol, right?

Sort of.  The ACE label has always been a valid host label.  The ACE
label itself is not a new form of information.  What's new is that this
traditional ASCII host label is now considered to be equivalent to a
non-ASCII label.  But even if you don't know that, you can still treat
it as a traditional ASCII host label (which it is), and everything will
work, except that humans looking at it won't find it meaningful.
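To make the point that an ACE label is just an ordinary host label, here is a small Python sketch. It assumes the traditional letters-digits-hyphen (LDH) rule for host labels (1 to 63 characters, no leading or trailing hyphen), and it uses Python's built-in "idna" codec (IDNA 2003, RFC 3490) only to produce an example ACE label. The name `is_ldh_label` is hypothetical.

```python
import re

# Traditional host-label syntax: letters, digits, hyphen; 1-63 chars;
# no leading or trailing hyphen. Assumed here for illustration.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` is a traditional ASCII host label."""
    return LDH_LABEL.fullmatch(label) is not None


# ACE form of a non-ASCII label, produced by Python's built-in "idna" codec.
ace = "bücher".encode("idna").decode("ascii")

print(ace)                     # xn--bcher-kva
print(is_ldh_label(ace))       # True: an IDN-unaware program sees a normal label
print(is_ldh_label("bücher"))  # False: the raw Unicode label is not LDH
```

A program that knows nothing about IDNs accepts the ACE label by exactly the same rule it has always used.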

> As an implementor of a web server, how do I know how to interpret the
> "host:" field in the HTTP header?  If I see something that looks like
> ACE, I suppose I would need to decode it to the original UTF-8 to
> compare with my virtual domain definitions.

No, you wouldn't.  If you don't know anything about IDNs, then in your
mind all host names are ASCII, so you would do an ASCII comparison
between the Host: field (which is ASCII) and your virtual domain
definitions (which are also ASCII), and that would work just fine.
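A minimal sketch of the IDN-unaware case, assuming a server that keeps its virtual hosts in a simple table. The table, the paths, and the function names are hypothetical. Both sides are lowercased because host names compare case-insensitively, an optional `:port` is stripped, and IPv6 literals and trailing dots are deliberately not handled. Note that the ACE entry is just an ASCII string the administrator typed in; nothing is ever decoded.

```python
from typing import Optional

# Hypothetical virtual-host table of an IDN-unaware server: all names ASCII,
# stored in lowercase.
VIRTUAL_HOSTS = {
    "www.example.com": "/srv/example",
    "xn--bcher-kva.example": "/srv/buecher",
}


def host_name(host_header: str) -> str:
    """Strip an optional :port from a Host: value and lowercase it.

    Only plain host names are handled; IPv6 literals are out of scope.
    """
    name = host_header.strip()
    if ":" in name:
        name = name.rsplit(":", 1)[0]
    return name.lower()


def lookup(host_header: str) -> Optional[str]:
    """Plain case-insensitive ASCII comparison against the table."""
    return VIRTUAL_HOSTS.get(host_name(host_header))


print(lookup("XN--BCHER-KVA.example:8080"))  # /srv/buecher
print(lookup("other.example"))               # None
```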

On the other hand, if you know about IDNA, then your virtual domain
definitions might be non-ASCII, but you know that the proper way to
compare two domain names is to convert both to ASCII and then do an
ASCII comparison.  (That's IDNA rule 3.)
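For the IDNA-aware case, a sketch of "convert both to ASCII, then compare" using Python's built-in "idna" codec, which implements the IDNA 2003 ToASCII operation label by label. The function names are hypothetical. The codec leaves all-ASCII labels in their original case, so the result is lowercased afterwards to get the case-insensitive ASCII comparison.

```python
def to_ascii(name: str) -> str:
    """ASCII form of a domain name via Python's built-in "idna" codec
    (IDNA 2003 ToASCII per label), lowercased for comparison."""
    return name.encode("idna").decode("ascii").lower()


def same_domain(a: str, b: str) -> bool:
    """Two names match iff their ASCII forms match case-insensitively."""
    return to_ascii(a) == to_ascii(b)


print(to_ascii("Bücher.example"))                             # xn--bcher-kva.example
print(same_domain("Bücher.example", "XN--BCHER-KVA.example"))  # True
print(same_domain("bücher.example", "bucher.example"))         # False
```

The comparison never touches a decoded form: a non-ASCII virtual-host definition is converted to ACE, and the Host: value, already ASCII, is compared against that.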

In neither case do you need to decode the ACE.  The only time you'd have
a reason to do that is when you're trying to hide the ACE from humans.
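The one place decoding is useful, showing a name to a person, can be sketched with the same built-in codec, which performs IDNA 2003 ToUnicode when decoding. The function name is hypothetical, and the result is meant for display only; comparisons should still use the ASCII form.

```python
def for_display(ascii_name: str) -> str:
    """Turn an ASCII (possibly ACE) domain name into Unicode for humans,
    using Python's built-in "idna" codec (IDNA 2003 ToUnicode)."""
    return ascii_name.encode("ascii").decode("idna")


print(for_display("xn--bcher-kva.example"))  # bücher.example
print(for_display("www.example.com"))        # www.example.com
```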

AMC