[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Hashing function for PSAMP



duffield@research.att.com wrote:

Maurizio,



-----Original Message-----
From: owner-psamp@ops.ietf.org [mailto:owner-psamp@ops.ietf.org] On


Behalf


Of Maurizio Molina
Sent: Monday, January 24, 2005 1:22 PM
To: Saverio Niccolini
Cc: psamp@psg.com
Subject: Re: Hashing function for PSAMP

Saverio Niccolini wrote:



Dear all,
I would like to ask a simple question to the list.
As you have seen in the "Sampling and Filtering Techniques for IP


Packet


Selection" draft we have tried to give suggestions on which is the


best


hash function in order to do packet sampling.

Hash Functions Suitable for Packet Selection
1. IPSX
(2. BOB)

Hash Functions Suitable for Packet Digesting
1. CRC-32
(2. BOB)

Thinking about it and doing some additional tests it turned out that:
1) IPSX can only accepts 16 bytes as input --> it is not useful for


IPv6


packets.
Do we want to stay with IPSX that is 7 times faster than BOB but can


not


be used with IPv6 packets? What is the list feeling about this?




One thing to clarify is if IPSX cannot really be extended.
by looking at its (very simple) description,

          1.    v1 = f1 ^ f2;
          2.    v2 = f3 ^ f4;
          3.    h1 = v1 << 8;
          4.    h1 ^= v1 >> 4;
          5.    h1 ^= v1 >> 12;
          6.    h1 ^= v1 >> 16;
          7.    h1 ^= v2 << 6;
          8.    h1 ^= v2 << 10;
          9.    h1 ^= v2 << 14;
          10.   h1 ^= v2 >> 7;

where f1, f2, f3 and f4 are the four 32 bit input words, my guess


would


be that  for  extending it to six 32 bit input word it would be
necessary to select other two 32 bit word from the packet (say f5 and
f6) and add four other shifting and XOR steps.
That is:

          11.   v3 = f5 ^ f6;
          12.   h1 ^= v3 << N;
          13.   h1 ^= v3 >> M;
		............

The point is to understand what is the rationale of the choice of the
number of shift positions and their direction in steps 3 to 10.
Is there is a "rule" this can be extended to other steps? Or was more


or


less a "random choice"? or is there any reason for which an extension
should not work?
I think we need Nick's opinion on that...



The rationale was to exploit variability expected and/or found empirically in different fields in the IPv4 packet header. A number of other variants on this theme were evaluated. The final choice represented a good combination of computational simplicity, and quasi-randomness, observed in testing.

It is possible to extend the form to longer inputs, along the lines that
you suggest above. However, evaluation of different variants is
difficult in practice. Due to the comparatively small deployment of
IPv6, it is questionable whether IPv6 traces available today would be
representative (in terms of the variability of their fields) of what
could be expected in future. This is a problem for the evaluation of any
hash.


My idea, to start with, was to evaluate the "IPSX extension" on the IPv4 traces.
Saverio did some tests (on traces of the of the MAWI project). His results showed that all hashes (BOB, CRC-32, IPSX, MMH) showed a poor uniformity (according to the Chi-square tests) with 16 input bytes. Moving from 16 input bytes to 20 (and then further to 24, 28, 32, etc), some of the hases (BOB, CRC-32) showed a big improvement in their uniformity behavior, while MMH didn't. For IPSX, it was only possible to test it on 16 bytes, where it showed a poor uniformity behaviour as all the other functions. If we can have an IPSX variant, we'll be at least able to understand if moving from 16 to 20 input bytes (and then possibly to 24, 28, etc...) it will improve its behaviour (like BOB and CRC-32) or not (like MMH). This will already be an interestring result for PSAMP to give a reccomendation.
Maurizio


Nick



Maurizio



2) BOB is faster than CRC-32 (on software implementation) and


achieves


as good collision probability as CRC-32.
Do we still want to recommend CRC-32 because we believe that hardware
implementation of CRC-32 are already out or we just would like to
promote BOB to recommended and CRC-32 as optional?

I am asking this because we are going to submit a new version of the
draft and we would definitely like to fix this issue.

Thanks in advance for your comments,
Saverio

============================================================
Dr. Saverio Niccolini
Research Staff Member
Network Laboratories, NEC Europe Ltd.
Kurfuerstenanlage 36, D-69115 Heidelberg
Tel.     +49 (0)6221 90511-18
Fax:     +49 (0)6221 90511-55
e-mail:  saverio.niccolini@netlab.nec.de
============================================================


-- to unsubscribe send a message to psamp-request@ops.ietf.org with the word 'unsubscribe' in a single line as the message text body. archive: <http://ops.ietf.org/lists/psamp/>




--
to unsubscribe send a message to psamp-request@ops.ietf.org with
the word 'unsubscribe' in a single line as the message text body.
archive: <http://ops.ietf.org/lists/psamp/>







--
to unsubscribe send a message to psamp-request@ops.ietf.org with
the word 'unsubscribe' in a single line as the message text body.
archive: <http://ops.ietf.org/lists/psamp/>