Re: [idn] URL encoding in html page



> >> So now IDN is a larger scope than we expected, not just browser
> >> software needs to upgrade, even html editor like Dreamweaver, etc
> >> needs to upgrade...
> >
> >It should not be surprising that any software that deals with domain
> >names as such (as opposed to text in general) might need to be upgraded
> >in order to allow characters in domain names that used to be forbidden.
>
> That is the IETF hacker talking again. The common man does not think
> non-ASCII letters are forbidden.
So I suppose my Chinese name, or any Japanese, Korean, or other non-ASCII
name, should be forbidden in the Internet world too : >

> In general you can all expect that as soon as non-ASCII domain names are
> officially allowed, there will be lots of places where people will enter
> names using non-ASCII. They will expect them to be allowed in HTML
> documents, in the href field and in the img src field. It does not matter
> that some have decided to restrict the definition of "URL" to ASCII;
> people will, and already do, expect non-ASCII to be allowed in both the
> host and path parts of a URL, in the native character set, without ACE or
> %-encoding!! Especially as it already works in many places today.
That's right. Using ACE or %-escapes is like forcing displayable 16-bit
Unicode or 8-bit UTF-8 to fall back to 7-bit ASCII, while the UIs of most
applications and operating systems are already moving towards Unicode
support. So I think URLs should be displayable as written and should move
towards non-ASCII, not back to 7-bit ASCII (actually 5-bit ACE)... that is
kind of stepping backwards to make new things happen : )
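
To make the two fallbacks concrete, here is a minimal sketch of what they do
to a native-script URL. It assumes Python's built-in "idna" codec, which
implements the Punycode-based ACE that was later standardized as IDNA and may
differ from the ACE variants discussed in this thread, and it assumes UTF-8
for the %-escapes. The host name, the path, and the helper to_ascii_url are
made up for illustration.

```python
from urllib.parse import quote, unquote


def to_ascii_url(host: str, path: str) -> str:
    """Build an ASCII-only http URL from a Unicode host and path.

    Hypothetical helper for illustration only.
    """
    # Host part: ACE form (assumption: Python's IDNA/Punycode codec).
    ace_host = host.encode("idna").decode("ascii")
    # Path part: %-escape the UTF-8 bytes of every non-ASCII character.
    escaped_path = quote(path, safe="/")
    return f"http://{ace_host}{escaped_path}"


# What a person would type in an href, in the document's native characters.
native_host, native_path = "bücher.example", "/straße"

ascii_url = to_ascii_url(native_host, native_path)
print(ascii_url)  # http://xn--bcher-kva.example/stra%C3%9Fe

# Both encodings are reversible, so software can still display the native form.
shown_host = "xn--bcher-kva.example".encode("ascii").decode("idna")
shown_path = unquote("/stra%C3%9Fe")
print(shown_host + shown_path)  # bücher.example/straße
```

The ASCII form is what goes on the wire; whether it is also what people must
read and type is exactly the question being argued here.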

> It is important that existing RFCs, like the one for the URL, immediately
> be updated to allow non-ASCII letters. And do not use the IETF hacker
> language and call it IRI; for the common man it will be a URL and URI.
> People will not ACE-encode host names in URLs, people will not
> %-encode paths in URLs. They will do as they do today: use the native
> character set of the HTML document. Ignoring reality does not work.
I think so too... What do they mean by saying the RFC only allows URLs to
have ASCII letters? RFCs are written by "HUMANS" and we can update them any
time if we "WANT" to... Think about the creation of IPv6 and all the RFCs
that had to be updated, when back then there was only IPv4... Should we say
that the IPv6 format cannot be used, and that we should somehow come up with
an "Algorithm" to convert IPv6 into something that looks like IPv4.... : >