
Re: [idn] Requirements I-D



> From: "Mark Davis/Cupertino/IBM" <mark.davis@us.ibm.com>
> Importance: Normal
> Subject: Re: [idn] Requirements I-D
> To: idn@ops.ietf.org
> 
> I was trying to figure out RACE compression, and wanted to make sure my
> understanding is correct. I also have some suggestions (I don't know how
> cast in stone RACE is...):
> 
> A. RACE
> 
> As far as I can tell, this is what happens in RACE compression as currently
> written:
> 
> Determine the set of all high octets (first of pairs).
> If that set has more than 2 members, or if 00 is not in the set, the
> output is D8 + input.
> Call the largest element of the set U1.
> If U1 = D8..DC, return error.*
> If <00, 99> is in the input pairs, return an error.*
> //  U1 may be zero
> Otherwise the output is U1 then the following encoding of pairs:
> - <U1, FF> => FF 99
> - <U1, XX> => XX
> - <00, XX> => FF XX
> 
> * I'd suggest adding explicit statements to the
> compression/decompression process to return errors.
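
The steps in (A) can be sketched in Python roughly as follows. This is only an illustration of the steps as listed above, not the RACE draft's reference code: the function name `race_compress`, the choice of big-endian 16-bit input, and the use of `ValueError` for the "return error" cases are all assumptions made for the example.

```python
def race_compress(utf16be: bytes) -> bytes:
    """Sketch of variant (A); input is a sequence of <high, low> octet pairs."""
    if len(utf16be) % 2:
        raise ValueError("input must consist of whole octet pairs")
    pairs = [(utf16be[i], utf16be[i + 1]) for i in range(0, len(utf16be), 2)]
    highs = {hi for hi, _ in pairs}  # set of all high octets

    # More than two rows, or row 00 absent: no compression, prefix D8.
    if len(highs) > 2 or 0x00 not in highs:
        return b"\xd8" + utf16be

    u1 = max(highs)  # may be 00 when the whole string is Latin-1
    if 0xD8 <= u1 <= 0xDC:
        raise ValueError("U1 in D8..DC")
    if (0x00, 0x99) in pairs:
        raise ValueError("<00, 99> in input")

    out = bytearray([u1])
    for hi, lo in pairs:
        if hi == u1:
            # <U1, FF> => FF 99, <U1, XX> => XX
            out += b"\xff\x99" if lo == 0xFF else bytes([lo])
        else:
            # <00, XX> => FF XX
            out += bytes([0xFF, lo])
    return bytes(out)
```

For example, `race_compress("a\u00ff".encode("utf-16-be"))` gives `00 61 FF 99`, which is the Latin-1 oddity with FF.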
> 
> This mechanism does do one odd thing with an all-Latin-1 string:
> ..FF.. => ..FF 99.. I would suggest a slight change to make Latin-1
> simple, as follows (marked with ***):
> 
> B. RACE+LATIN1
> Determine the set of all high octets (first of pairs).
> If that set has more than 2 members, or if 00 is not in the set, the
> output is D8 + input.
> *** If the set has 1 member (i.e. is {00}), the output is 00 + input low
> octets.*** // latin-1 exactly
> Call the largest element of the set U1.
> If U1 = D8..DC, return error.
> If <00, 99> is in the input pairs, return an error.
> Otherwise the output is U1 then the following encoding of pairs:
> - <U1, FF> => FF 99
> - <U1, XX> => XX
> - <00, XX> => FF XX
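
A possible Python sketch of (B), under the same assumptions as before (big-endian 16-bit input, `ValueError` for the error cases, and a made-up function name `race_latin1_compress`). The only difference from (A) is the early return for the single-row case, which as listed comes before the error checks:

```python
def race_latin1_compress(utf16be: bytes) -> bytes:
    """Sketch of variant (B): (A) plus a pass-through for pure Latin-1."""
    if len(utf16be) % 2:
        raise ValueError("input must consist of whole octet pairs")
    pairs = [(utf16be[i], utf16be[i + 1]) for i in range(0, len(utf16be), 2)]
    highs = {hi for hi, _ in pairs}

    if len(highs) > 2 or 0x00 not in highs:
        return b"\xd8" + utf16be

    # *** New step: only row 00 present, so emit the low octets unchanged.
    if len(highs) == 1:
        return b"\x00" + bytes(lo for _, lo in pairs)

    u1 = max(highs)
    if 0xD8 <= u1 <= 0xDC:
        raise ValueError("U1 in D8..DC")
    if (0x00, 0x99) in pairs:
        raise ValueError("<00, 99> in input")

    out = bytearray([u1])
    for hi, lo in pairs:
        if hi == u1:
            out += b"\xff\x99" if lo == 0xFF else bytes([lo])
        else:
            out += bytes([0xFF, lo])
    return bytes(out)
```

With this reading, `"a\u00ff"` compresses to `00 61 FF`, with no escape for FF.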
> 
> In either (A) or (B), if you have a pure Latin-1 string of length 15, it is
> 16 bytes in RACE. If one of those characters is non-Latin-1, RACE goes to
> 30 bytes. I don't know how much freedom we have to change RACE at this
> point or whether we want to consider it, but an alternative saves some
> storage, and is just as easy to code.
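
The 16 and 30 figures can be checked with a small size calculation. This is a sketch only: `race_size` is a hypothetical helper that follows the pair encoding of (A), assumes every character is in the BMP, and ignores the error cases.

```python
def race_size(text: str) -> int:
    """Output size in bytes for the (A) pair encoding (error cases ignored)."""
    pairs = [(ord(c) >> 8, ord(c) & 0xFF) for c in text]  # assumes BMP only
    highs = {hi for hi, _ in pairs}
    if len(highs) > 2 or 0x00 not in highs:
        return 1 + 2 * len(pairs)  # D8 + uncompressed input
    u1 = max(highs)
    # 1 header byte; <U1, XX> costs 1 byte, everything else costs 2.
    return 1 + sum(1 if (hi == u1 and lo != 0xFF) else 2 for hi, lo in pairs)


print(race_size("a" * 15))             # 15 Latin-1 characters
print(race_size("a" * 14 + "\u03b1"))  # one of them replaced by Greek alpha
```

The second string costs 1 header byte, 1 byte for the alpha, and 2 bytes for each of the 14 Latin-1 characters.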
> 
> C. RACE+RUNS
> Determine the set of all high octets (first of pairs).
> If that set has more than 2 members, or if 00 is not in the set, the
> output is D8 + input.
> If the set has 1 member (i.e. is {00}), the output is 00 + input low
> octets. // **** latin-1 exactly
> Call the largest element of the set U1.
> If U1 = D8..DC, return error.
> Otherwise output is U1, then repeat the following 4 steps until done.
> 1 Find the number of consecutive pairs having U1 as the first octet.
> 2 Output that number, followed by all those low octets.
> 3 Find the number of consecutive pairs having 00 as the first octet.
> 4 Output that number, followed by all those low octets.
> 5 Go back to #1
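
One way to read (C) in Python is sketched below. The function name `race_runs_compress` and the `ValueError` cases are assumptions, and so are two details the steps leave open: the loop here stops as soon as the input is exhausted (no trailing zero count), and a string that starts with a row-00 character emits a zero-length U1 run first.

```python
def race_runs_compress(utf16be: bytes) -> bytes:
    """Sketch of variant (C): alternating runs of U1 and 00 low octets."""
    if len(utf16be) % 2:
        raise ValueError("input must consist of whole octet pairs")
    pairs = [(utf16be[i], utf16be[i + 1]) for i in range(0, len(utf16be), 2)]
    highs = {hi for hi, _ in pairs}

    if len(highs) > 2 or 0x00 not in highs:
        return b"\xd8" + utf16be
    if len(highs) == 1:
        return b"\x00" + bytes(lo for _, lo in pairs)  # Latin-1 exactly

    u1 = max(highs)
    if 0xD8 <= u1 <= 0xDC:
        raise ValueError("U1 in D8..DC")

    out = bytearray([u1])
    i, want = 0, u1  # runs alternate: U1 first, then 00, then U1, ...
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == want:
            j += 1
        if j - i > 0xFF:
            raise ValueError("run length does not fit in one byte")
        out.append(j - i)                         # run length (may be 0 at start)
        out += bytes(lo for _, lo in pairs[i:j])  # the low octets of the run
        i = j
        want = 0x00 if want == u1 else u1
    return bytes(out)
```

For instance, the pairs for "αβab" come out as `03 02 B1 B2 02 61 62`, and "aα" as `03 00 01 61 01 B1`.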
> 
> Cost: 1 byte per character + 1 byte per transition.
> The numbers in 2,4 will always fit in a byte, since we have fewer than 256
> chars possible.
> Worst case is 2 * number of chars (plus 1 for header).
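
The cost estimate can be tried out with a small counter. This is a hedged sketch: `race_runs_cost` is a hypothetical helper that assumes BMP-only text using exactly two rows (00 and U1), and it counts the run-length bytes the way the steps in (C) alternate, starting with a U1 run.

```python
def race_runs_cost(text: str) -> int:
    """Bytes used by the (C) run encoding: header + characters + run counts."""
    rows = [ord(c) >> 8 for c in text]  # high octet of each character
    if len(set(rows)) != 2 or 0x00 not in rows:
        raise ValueError("sketch covers only the two-row case (00 and U1)")
    u1 = max(rows)

    counts, i, want = 0, 0, u1
    while i < len(rows):
        counts += 1  # one count byte per run, including an empty first run
        while i < len(rows) and rows[i] == want:
            i += 1
        want = 0x00 if want == u1 else u1
    return 1 + len(rows) + counts


print(race_runs_cost("\u03b1\u03b2ab"))        # two runs
print(race_runs_cost("\u03b1a\u03b1a"))        # alternating, starts with U1
print(race_runs_cost("a\u03b1a\u03b1"))        # alternating, starts with row 00
```

Under this reading, a fully alternating string that starts with a U1 character meets the 2 * n + 1 bound exactly, while one that starts with a row-00 character pays one extra byte for the empty leading U1 run.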
> 
> Mark
> 
>