
Re: [idn] URL encoding in html page



"J. William Semich" <bill@mail.nic.nu> wrote:

> So the domain name alingsås.com, which was registered by Register.com
> for one of its customers, and which currently resolves as the ACE
> domain name bq--abqwy2lom5z6k4y.mltbd.com to the Register.com
> "parked" page, will also resolve to the same parked page if the URL
> http://alingsås.com.nu/ is used, whether that URL is encoded as UTF-8
> UNICODE or ISO-8859-1.

That depends on your resolver.  On my Debian GNU/Linux system, "dig"
can successfully look up alingsås.com.nu, and so can "host" (which warns
that the name is illegal but prints the IP address anyway).  But the
system resolver, which is what ping uses, won't have it:

$ ping alingsås.com.nu
ping: unknown host alingsås.com.nu

I tried adding "options no-check-names" to /etc/resolv.conf, but that
had no effect.  I then tried changing it to "options no-check-names
debug", but I got no debugging messages.  I don't know what's going on
with the resolver.  I'm using glibc 2.2.5.

By the way, when I tried the URL with various browsers, Netscape 4.x and
Mozilla passed the host name straight through and ran into the resolver
problem, but w3m followed the unfortunate recommendation in the HTML
spec and converted the URL to http://alings%E5s.com.nu/, and of course
that lookup failed.
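
For illustration, here is a rough Python sketch of the %HH conversion,
showing how the escaped host name depends on the charset the bytes are
taken from.  The helper name escape_host is made up for this example,
and I am not claiming this is how w3m does it internally.

```python
from urllib.parse import quote


def escape_host(host: str, charset: str) -> str:
    """Percent-escape every non-ASCII byte of host, as encoded in charset.

    Hypothetical helper for illustration only.
    """
    # Keep "." literal so the label separators survive the escaping.
    return quote(host, safe=".", encoding=charset)


print(escape_host("alingsås.com.nu", "iso-8859-1"))  # alings%E5s.com.nu
print(escape_host("alingsås.com.nu", "utf-8"))       # alings%C3%A5s.com.nu
```

The ISO-8859-1 form matches the URL that w3m produced.  Either escaped
form is a different string from the registered name, so a lookup on it
as-is cannot be expected to succeed.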

> These also all will work for hypertext links on a Web page

That depends on how the browser handles hrefs containing non-ASCII.
Such an href is invalid HTML, so there's no telling what will happen.  The HTML
spec recommends that the browser convert the non-ASCII bytes to %HH
escapes, but I think that makes the host name lookup even more likely to
fail (unless the browser unescapes the host name just before doing the
lookup).
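
A minimal sketch of the "unescape just before the lookup" idea, in
Python.  The function name host_for_lookup is hypothetical, and the
default charset of ISO-8859-1 is purely an assumption: a real browser
would have to know which charset the escapes came from.

```python
from urllib.parse import unquote, urlsplit


def host_for_lookup(url: str, charset: str = "iso-8859-1") -> str:
    """Return the host part of url with any %HH escapes undone.

    Hypothetical helper; charset is an assumption the caller must supply.
    """
    # urlsplit lower-cases the host, so %E5 arrives here as %e5;
    # unquote accepts hex digits in either case.
    host = urlsplit(url).hostname or ""
    return unquote(host, encoding=charset)


print(host_for_lookup("http://alings%E5s.com.nu/"))  # alingsås.com.nu
```

Whether the resolver then accepts the unescaped name is a separate
question, as the ping result earlier in this message shows.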

AMC