
Re: [idn] URL encoding in html page



> UTF-8 was there in the first place to make the transition from 8-bit
> systems to 16/32-bit Unicode smoother, so that existing 8-bit systems
> would be able to use it. In the same way, base64 downgrades 8-bit data
> into ASCII to make the transition smoother...

Agreed. Spend more time thinking about this and you will see why the
argument over ACE vs. UTF-8 is really irrelevant, except as religious
prejudice.
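
For illustration only, here is a rough Python sketch of the two kinds of
"downgrade" mentioned above (my own example using the standard base64
module; a sketch, nothing normative):

    import base64

    label = u"b\u00fccher"            # "bücher", a non-ASCII label

    # Unicode -> UTF-8: the 8-bit "downgrade" that lets existing
    # byte-oriented systems carry Unicode text.
    utf8_bytes = label.encode("utf-8")
    print(repr(utf8_bytes))           # b'b\xc3\xbccher'

    # 8-bit bytes -> ASCII: base64 plays the analogous role one level
    # down, squeezing arbitrary bytes into the ASCII repertoire.
    ascii_safe = base64.b64encode(utf8_bytes)
    print(ascii_safe.decode("ascii")) # YsO8Y2hlcg==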

> So should we say that a good design should have a fallback (downgrade)
> for the TRANSITIONAL period only? Then a good IDN design should be able
> to use ACE as the fallback for old systems, BUT SHOULD also be able to
> step forward to 8/16/32 bits or even more bits when needed.

Maybe.

> So we should not say that ACE is a long-term solution for IDN; it should
> ONLY be a TRANSITION solution that allows the LONG-TERM solution of
> using UTF-8 or 16/32-bit Unicode to work.

The best way is to write a draft explaining how you wish to transition
from IDNA to a long-term solution (the closest I have seen is UDNS), then
gather rough consensus so we can move it forward.

IMHO, the problem of moving away from ACE to a long-term solution is a
smaller one than the problem of deciding what the long-term solution is.

> More bits are good, but when we plan for more than we need, it should be
> considered a waste of resources.

Read: IPv6.

> So why do we need 128 bits now (I don't think the combined total of
> characters in all the languages in the world would require that much,
> not unless we want to include scripts from other planets :> ), whereas
> we need 8/16/32 bits for Unicode, so why not design a system able to
> accept ACE as a fallback and also 8/16/32 bits? If

How many new Han ideographs are created every day? How many ancient
scripts are not yet encoded (e.g. Linear B)?

Or how about a new 128-bit character table whereby space is allocated to
2^64 different locales, with each locale getting 64 bits of space to put
its own encoding? (I know, sounds much like IPv6.)
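
Purely for illustration, a Python sketch of that hypothetical split (the
field layout and names are my own invention, not a proposal):

    LOCALE_BITS = 64
    LOCAL_MASK = (1 << LOCALE_BITS) - 1

    def pack(locale_id, local_cp):
        # high 64 bits = locale identifier, low 64 bits = the locale's
        # own code point within its private 2^64 space
        return (locale_id << LOCALE_BITS) | (local_cp & LOCAL_MASK)

    def unpack(cp128):
        return cp128 >> LOCALE_BITS, cp128 & LOCAL_MASK

    cp = pack(0x2A, 0x4E2D)   # locale 42, some locale-private character
    print(hex(cp))            # 0x2a0000000000004e2d
    print(unpack(cp))         # (42, 20013)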

Are you in the position to predict the future?

My best *guess* is that ISO 10646 will slowly gain adoption over the next
20 years. I *guess* UTF-8 will get more popular in the next 10 years. I
*guess* compressed bitstream (i.e. 1-bit) ISO 10646 will see some more
adoption in certain places where space is a constraint. I also *guess*
there will be some initiatives to do locale-based tables (vs. script-based)
but they will face huge barriers. But I can only guess...

> you can justify why designing a system that can handle ASCII as a
> fallback and can automatically support 8/16/32-bit Unicode is not a good
> design, then I think my thinking is wrong.

The question is really: why is 8/16/32-bit Unicode better than 5-bit (ACE)?

8 is not necessarily better than 5, just as 32 is not necessarily better
than 8. It is all an engineering trade-off.

So what is the advantage of moving towards more bits? And how much more is
enough?
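
To make that trade-off concrete, a rough Python comparison (my own sketch;
it uses Python's standard "idna" codec, i.e. today's Punycode-based ACE,
only as a stand-in for whatever ACE variant is finally chosen):

    label = u"b\u00fccher"   # "bücher"

    for codec in ("idna", "utf-8", "utf-16-be", "utf-32-be"):
        encoded = label.encode(codec)
        print("%-9s %2d bytes  %r" % (codec, len(encoded), encoded))

    # idna      13 bytes  b'xn--bcher-kva'
    # utf-8      7 bytes  b'b\xc3\xbccher'
    # utf-16-be 12 bytes  (two bytes per character)
    # utf-32-be 24 bytes  (four bytes per character)

For this label the ACE form stays within ASCII but is longer than UTF-8,
and the wider fixed-width forms are larger still.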

PS: I am not agreeing or disagreeing with a transition. I reserve my
judgement for later.

-James Seng