Re: An argument against multiple character sets
Hello Bill,
I just tried with Opera 3.61, but it didn't work. Did I miss some setting,
or what?
Regards, Martin.
At 16:44 00/01/23 -0500, J. William Semich wrote:
> Hello;
>
> I'm assuming you are using "character set" interchangeably with "encoding"
> below...
>
> At 12:01 PM 1/23/00 -0800, Paul Hoffman / IMC wrote:
> >There has been some discussion on this list about whether or not we should
> >allow domain names to be created in different character sets. I believe
> >that there is a simple argument that shows that we can't.
> >
> >Let's say I want to register a domain name that is two letters: LATIN SMALL
> >LETTER F followed by LATIN SMALL LETTER U WITH OGONEK. If I use ISO 8859-4,
> >that would be encoded as 0x66F9. So far so good. You see a billboard with my
> >domain name on it, and you enter it into a browser. That browser uses a
> >different character set, let's say Unicode. The browser sends to the
> >resolver 0x00660173.
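[Editorial illustration, not part of the original message.] The mismatch described above can be reproduced in a few lines. This is only a sketch: Python and its codec names ("iso8859_4", "utf_16_be", "utf_8") are choices made for this illustration, and "Unicode" is assumed here to mean the 16-bit big-endian form. Note that LATIN SMALL LETTER F is byte 0x66 in all of these encodings.

```python
# The two-letter name: LATIN SMALL LETTER F + LATIN SMALL LETTER U WITH OGONEK
name = "f\u0173"

# The same two characters, serialized three different ways.
latin4 = name.encode("iso8859_4")   # ISO 8859-4 (Latin-4)
utf16 = name.encode("utf_16_be")    # 16-bit Unicode, big-endian, no BOM
utf8 = name.encode("utf_8")         # UTF-8

for label, raw in [("ISO 8859-4", latin4), ("UTF-16BE", utf16), ("UTF-8", utf8)]:
    print(f"{label:11s} 0x{raw.hex().upper()}")
```

A resolver that compares queries byte for byte would treat these three as three different names.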
> >
> >There are two problems here:
> >- The browser *can't* know every possible character set
> >- Even if it did, it wouldn't know which one to use
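[Editorial illustration, not part of the original message.] The second problem can be sketched the other way around: given only the bytes on the wire, the receiver gets a different name for each character set it guesses. The three candidate charsets below are an arbitrary selection made for this illustration; Python's codec names are used only for convenience.

```python
# Two bytes as they might arrive at a resolver, with no charset label attached.
wire = bytes([0x66, 0xF9])

# Hypothetical list of charsets the receiver might try.
candidates = ["iso8859_4", "iso8859_2", "latin_1"]

# Decode the same bytes under each guess.
decoded = {charset: wire.decode(charset) for charset in candidates}

for charset, text in decoded.items():
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{charset:10s} {code_points}")
```

Each guess decodes without error, so nothing in the bytes themselves tells the receiver which interpretation was intended.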
>
> Exactly! <smile>
>
> That's why Microsoft has adopted UTF-8 (UNICODE) as its "standard" default
> configuration, both in IE5 and in Windows 2000 DNS. And once Netscape
> adopts the same default "standard", both browsers will only (or, primarily)
> send UTF-8 queries to the resolver. Our customers tell us other browsers
> (such as Opera) can also resolve our UTF-8 test URLs.
>
> How to best modify BIND in order for it to be able to deal with all this is
> probably much more important to this discussion than deciding which
> encoding should be set as the standard, IMO. UTF-8 looks pretty "standard"
> already, from the client/user point of view, at least.
>
> I'm not saying I think this "unofficial" working group should just bless
> UTF-8. I'm saying the more important work is in developing standards for
> upgrading BIND.
>
>
> -- Bill Semich
> .NU Domain
>
> >
> >Adding a charset tag to the internationalized string in the domain name
> >doesn't help. There is no way for someone seeing a printed representation
> >of the internationalized string to know which character set was used; in
> >this case it could be 8859-4 or Unicode or possibly other character sets
> >that contain that character.
> >
> >Even requiring all resolvers to do the conversion doesn't help unless we
> >list all the possible character sets and never change the list. This
> >introduces many problems:
> >- New character sets can't be added later without simultaneously updating
> >all the resolvers on the Internet to use the added character sets. Such
> >simultaneous updates are impossible.
> >- The main reason we are considering more than one character set now is
> >current politics and desires for favored character sets. We can safely
> >assume that politics and desires will continue to change and evolve.
> >- We are forcing resolvers to do much more processing than they are now.
> >
> >In short, I don't see how a solution that allows more than one character
> >set, or even more than one encoding, will work. If others have
> >counter-examples, I'm open to hearing them.
> >
> >--Paul Hoffman, Director
> >--Internet Mail Consortium
> >
> >
> >
> Bill Semich
> President and Founder
> .NU Domain Ltd
> http://whats.nu
> bill@mail.nic.nu
>
>
>
#-#-# Martin J. Du"rst, World Wide Web Consortium
#-#-# mailto:duerst@w3.org http://www.w3.org