
Re: psamp vocabulary

To: Rae McLellan <rae@research.bell-labs.com>
Subject: Re: psamp vocabulary
From: Nick Duffield <duffield@research.att.com>
Date: Thu, 12 Sep 2002 16:57:51 -0400
Cc: psamp@ops.ietf.org
References: <200209122005.g8CK5djB125266537@nslocum.cs.bell-labs.com>

Rae,

Rae McLellan wrote:
> 
> > having only deterministic selectors is an intriguing idea, but I have
> > two misgivings:
> >
> > 1) we need to allow the simplest implementations in order for PSAMP
> > to be ubiquitous. Decrementing a counter is very simple: if we exclude
> > it then some devices might find it difficult to do PSAMP sampling.
> 
> Are you suggesting psamp define levels of conformance?

As a general thing, maybe. But in the example of 1 in N sampling I was
making a different point: if we agree that "sampling 1/Nth of the packets
on (some) average" is a basic function we wish to have, then we shouldn't
have a standard that prevents vendors from constructing simple
implementations of this, e.g. decrementing a counter. (As I understand it,
in your suggestion these were excluded as part of your "third group".)
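
To make "decrement a counter" concrete, here is a minimal sketch of
count-based 1 in N selection. The class and method names are hypothetical,
chosen only for illustration; they do not come from any PSAMP document or
vendor implementation.

```python
class CountdownSampler:
    """Select one packet out of every n by decrementing a counter.

    Hypothetical illustration only; not taken from any PSAMP draft.
    """

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.remaining = n

    def select(self):
        """Return True for the packet on which the counter reaches zero."""
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = self.n  # reload the counter for the next cycle
            return True
        return False


# Run 12 packets through a 1 in 4 sampler.
sampler = CountdownSampler(4)
decisions = [sampler.select() for _ in range(12)]
```

The per-packet work is one decrement and one comparison, and the decision
does not depend on packet content at all, which is why it is attractive
for very simple devices.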

> 
> > 2) if all selection operations are deterministic on packet content, it
> > would be easier to construct packets to evade selection (although having
> > a strong hash function with an obscure selection criterion makes this
> > more difficult). Or even without malice, with a weak hash function you
> > might have an unlucky traffic mix where you entirely miss a large
> > bunch of traffic. Having the option of random selection guards against
> > this.
> 
> ok, here's an example where hashing doesn't provide the functionality
> of true random sampling.  Thanks.

Just to be clear: hashing doesn't provide true random functionality, but a
good hash function with good input and selection range could be used to get
something close to it, and would mitigate the problems I identified with
the example.

> 
> > I have some comments on the hash functions that you mentioned in a
> > previous message:
> >>    - IP ID & <mask> == <value>
> >>    - IP Checksum & <mask> == <value>
> >>    - Checksum(IP header w/o TTL) & <mask> == <value>
> >>    (note that these last 3 can be used to generate an almost uniform
> >>     sample of the IP packets, yet they're still based on IP header)
> >
> > Have you done any experiments on the statistical quality of these as
> > hash functions for packet selection? As hash functions go these are very
> > weak. Having good statistical properties of selection would rely on
> > having a tame distribution of the field contents of the packets.
> > (This can't be relied upon: we looked at traces, and there are gotchas
> > there for the ID field in particular due, it seems, to bad
> > implementations). I'm concerned that they would be easy to evade.
> 
> Yes, I've looked at the distribution of IP ID field values.  And except
> for the ID values equal to 0 or 1, they are very evenly distributed.
> The anomalous behavior of ID=0 and ID=1 appears to come from ICMP
> router chat.  Apparently some routers are clearing the ID field
> for each ICMP message instead of maintaining and using an IP packet
> counter for each interface.
> 
> And yes, a malicious packet generator could manage to avoid selection
> by any hashing algorithm based on IP header contents alone.
> Perhaps obscurity of hashing function isn't enough to avoid this.

I distinguish three parts of hash-based sampling:
1. the input to the hash function
2. the hash function
3. the selection range (select the packet if the hash falls in this range)

. Generally, the bigger the input, the better the appearance of randomness.
Including fields beyond the IP header would give a better appearance. But
this in itself doesn't protect against malicious users tailoring packets.

. It's not the hash function itself that needs to be obscure. Rather, if the
selection range is a parameter that can be (re)configured and kept private,
then it is difficult for an attacker to avoid selection: even if they can
construct a packet to have a given hash value, they don't know the range.
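
The three parts above can be sketched as follows. This is only an
illustration under stated assumptions: SHA-256 truncated to 32 bits stands
in for "a strong hash function" (a real device would likely use something
cheaper), and the names packet_hash, RangeSelector, start and width are
hypothetical. The selection range is the only private, reconfigurable
parameter.

```python
import hashlib

HASH_BITS = 32
HASH_SPACE = 1 << HASH_BITS


def packet_hash(packet_bytes):
    """Part 2: the hash function (SHA-256 is an assumed stand-in)."""
    digest = hashlib.sha256(packet_bytes).digest()
    return int.from_bytes(digest[:4], "big")


class RangeSelector:
    """Part 3: a private selection range [start, start + width) mod 2**32."""

    def __init__(self, start, width):
        self.configure(start, width)

    def configure(self, start, width):
        """(Re)configure the range; the expected rate is width / 2**32."""
        if not 0 < width <= HASH_SPACE:
            raise ValueError("width must be in 1..2**32")
        self.start = start % HASH_SPACE
        self.width = width

    def select(self, packet_bytes):
        # Offset of the hash from the range start; the modulo lets the
        # range wrap around the top of the hash space.
        offset = (packet_hash(packet_bytes) - self.start) % HASH_SPACE
        return offset < self.width


# Part 1: the input. Here it is a made-up 4-byte counter plus payload bytes.
packets = [i.to_bytes(4, "big") + b"payload" for i in range(10000)]

# Aim for roughly 1 in 16 selection with an arbitrary, private start value.
selector = RangeSelector(start=0x9E3779B9, width=HASH_SPACE // 16)
selected = sum(selector.select(p) for p in packets)
```

Changing start leaves the sampling rate unchanged but alters which hash
values are selected, so a packet crafted to miss one range is not
guaranteed to miss the next.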

Nick

> 
> > And even with a tame distribution of packet fields, having a uniform
> > selection distribution is not the only desirable property. We also
> > want small correlations between selection decisions of successive
> > packets, including selection of packets from the same IP level flow
> > (i.e. packets with same IP src/dst address). The input of these hashes
> > doesn't change much from packet to packet of the flow, and the hash
> > function is weak, so there will be a lot of correlation.
> >
> > A strong hash function should have the property, roughly speaking, that
> > flipping a bit of the input gives a big change in the hash function.
> > This gives the statistical properties of selection some robustness
> > against correlations in the packet contents. The IP checksum does not
> > have this property. IP ID increments, so there's not much variation in
> > it.
> 
> I suggested the CRC as a simple hash function that was already computed,
> (I'm lazy).  I'd appreciate examples of hard hash functions on the IP
> header contents.
>                         Rae McLellan
> 
> --
> to unsubscribe send a message to psamp-request@ops.ietf.org with
> the word 'unsubscribe' in a single line as the message text body.
> archive: <http://ops.ietf.org/lists/psamp/>

