
Re: [idn] Where to do form-folding?



However, you can write *very* fast code that does a very quick check to see if a
string is correctly form-folded (normalized, restricted repertoire). It amounts to
looking at each character, seeing if it is in the right repertoire, and seeing if
its canonical class is ok -- something like:

int lastC = 0;
for (int i = 0; i < len; ++i) {
 int c = getClass(s[i]); // returns canonical class, or -1 if not permitted
 if (c < 0 || (lastC > c && c != 0)) return ERROR;
 lastC = c; // remember this class for the next comparison
}
return SUCCESS;

The getClass() function can use a trie lookup, so it is only a few instructions (and
can be inlined). So the overhead for checking well-formedness is pretty small.
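
To make the loop above concrete, here is a minimal self-contained sketch in C. The
names check_folded, get_class, FOLD_OK and FOLD_ERROR are invented for illustration,
and the small table inside get_class is only a toy stand-in for the trie lookup: it
permits lowercase ASCII letters, digits, hyphen and four combining marks, which is
an assumption and not a real IDN repertoire. It covers just the two tests described
here (repertoire membership and combining-class order).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FOLD_OK = 0, FOLD_ERROR = -1 };

/* Hypothetical stand-in for the trie lookup. Returns the canonical
 * combining class of cp, or -1 if cp is outside the permitted repertoire.
 * The repertoire here is a toy assumption chosen for the example. */
static int get_class(uint32_t cp)
{
    if (cp >= 'a' && cp <= 'z') return 0;
    if (cp >= '0' && cp <= '9') return 0;
    if (cp == '-') return 0;
    switch (cp) {
    case 0x0301: return 230; /* COMBINING ACUTE ACCENT */
    case 0x0308: return 230; /* COMBINING DIAERESIS */
    case 0x0323: return 220; /* COMBINING DOT BELOW */
    case 0x0327: return 202; /* COMBINING CEDILLA */
    default:     return -1;  /* not permitted */
    }
}

/* Quick check over an array of code points: every character must be
 * permitted, and non-zero combining classes must not decrease within a
 * run of combining marks. A class of 0 (a starter) resets the run. */
int check_folded(const uint32_t *s, size_t len)
{
    int last = 0;
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; ++i) {
        int c = get_class(s[i]);
        if (c < 0 || (c != 0 && last > c)) return FOLD_ERROR;
        last = c; /* remember this class for the next comparison */
    }
    return FOLD_OK;
}
```

With this toy table, 'a' followed by U+0323 and then U+0301 passes (classes 0, 220,
230), while 'a' followed by U+0301 and then U+0323 is rejected because the classes
are out of order, and 'A' is rejected because it is outside the repertoire.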

Mark

"A. Vine" wrote:

> All,
> To add to Dan's comments:
> Much normalization is expected to take place at the point of generation from the
> charset perspective.  In other words, the client is expected to normalize the
> text as it is created.  This is the case for much of the text processing world.
> Logistically it has been found that if it has not been done at the point of
> entry, then the data will be inspected for normalization many, many times in its
> lifetime, reducing efficiency.
>
> So, since standards are stating, or at least highly recommending, normalization
> at the point of creation, it's not a huge step to add canonicalization for
> IDNs.  Of course, the specification would have to be precise.
>
> It's true it is a bigger burden on the client.  But as Dan pointed out, it has
> some distinct advantages.  And it has a precedent.
>
> Andrea
> --
> Andrea Vine, avine@eng.sun.com, iPlanet i18n architect
> "In these Regulations any reference to a regulation is a
> reference to a regulation of these Regulations"
> -- Education (UK Student Loans) Regulations 1997
>
> Dan Oscarsson wrote:
> >
> > Hi
> >
> > Even if it is summer, some of you are hopefully still active. At
> > least some of you are going to the IETF meeting.
> >
> > Here is one thing we could discuss and that could be discussed
> > at the meeting.
> >
> > The question is: Where should form-folding be done?
> >
> ...
> > There are many applications that compare hosts or domain names.
> > For example: sendmail and many browsers.
> > These applications should also compare IDNs as equal in the same way
> > that DNS does. This means that they also need to implement form-folding.
> > If the resolver libraries included the standard implementation
> > of form-folding (and name comparison), and made the API public, all
> > applications could use it instead of implementing their own.
> >
> > So by choosing 2) we get the additional benefit of supplying
> > a standard place for applications to find the routines for
> > form-folding and IDN comparison, thus reducing the risk of many
> > implementations, some of which will do it wrongly.
> >
> > That's it. What do you think? Am I missing some important aspects to
> > make the best choice? Maybe it is something that could be
> > discussed at the meeting.
> >
> > Regards,
> >
> >   Dan