
Re: Terminology harmonization: a proposal



Hi Maurizio and Nick and Rae,

In my opinion, packet selectors can be distinguished according to
- the selection function: deterministic or random.
- input parameters (per packet): packet position, arrival time, packet content
(as Rae pointed out, input parameters could also be additional information such as interfaces, or the output of other functions on the router (router reaction), etc.)

The question now is whether and how we want to categorize these packet selectors into sampling and filtering primitives (maybe we need a third category?). In your emails I saw various approaches to categorization, but I think we need clear reasoning and a clear definition of how filtering and sampling are distinguished. So I tried to list the cases in question:

a) deterministic function on packet content (hashing would also fall into this category) ==> filtering
b) deterministic function on packet position (every k-th packet) ==> ? (currently denoted as systematic sampling)
c) deterministic function on arrival time (all packets in a given time interval) ==> ? (currently denoted as systematic sampling)
d) random function on packet content ==> probably not needed?
e) random function on packet position ==> sampling
f) random function on arrival time ==> probably not needed?
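
To make the cases concrete, here is a minimal Python sketch of a), b), c) and e). Everything in it is an assumption made for illustration: the Packet record, the function names, the use of CRC32 as the content hash, and the reading of case e) as an independent random choice per packet.

```python
import random
import zlib
from dataclasses import dataclass


@dataclass
class Packet:
    position: int    # index of the packet in the observed stream
    arrival: float   # arrival time in seconds
    content: bytes   # the bytes the selector is allowed to look at


def select_by_content(pkt, modulus=4):
    # Case a): deterministic function on packet content (hash-based).
    # CRC32 is only a stand-in, not a recommended selection hash.
    return zlib.crc32(pkt.content) % modulus == 0


def select_every_kth(pkt, k=3):
    # Case b): deterministic function on packet position.
    return pkt.position % k == 0


def select_in_interval(pkt, start=1.0, end=2.0):
    # Case c): deterministic function on arrival time.
    return start <= pkt.arrival < end


def make_random_selector(p, seed=None):
    # Case e): random selection that never looks at the content.
    # Each packet is selected independently with probability p.
    rng = random.Random(seed)

    def select(pkt):
        return rng.random() < p

    return select


stream = [Packet(i, i * 0.5, bytes([i])) for i in range(6)]
every_third = [pkt.position for pkt in stream if select_every_kth(pkt)]
in_window = [pkt.position for pkt in stream if select_in_interval(pkt)]
```

All four selectors share the same signature (packet in, boolean out), which is what talking only about "packet selectors" defined by function and parameters would amount to.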

I see two alternatives for categorization:

1. Categorization according to selection function: filtering is deterministic, sampling is random
If we categorize like this, systematic sampling (e.g. selection of every k-th packet) would be a filtering primitive.

2. Categorization according to input parameters (as Maurizio proposed): filtering works on packet content, sampling does not
One reason to distinguish in this way is that (most of) the packet content is the same at different measurement points, whereas packet position and arrival time vary between observation points. That means we could specify a filter that can be applied at various measurement points and ensures that the same packets are selected at all of them. Nevertheless, this only works if we limit packet content to the invariant fields in the packet (so the TTL example from Rao would not work).
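
The consistency argument for alternative 2 can be sketched in Python as follows. The field names, the dictionary representation of a packet, and CRC32 as the hash are assumptions for illustration only; the point is merely that a content-based selector restricted to invariant fields picks the same packets at two observation points, even though TTL and packet order differ.

```python
import zlib

# Assumption: these fields do not change between observation points.
INVARIANT_FIELDS = ("src", "dst", "ident", "payload")


def select(packet, modulus=2):
    # Deterministic selection on invariant content only; TTL is ignored.
    key = "|".join(str(packet[f]) for f in INVARIANT_FIELDS).encode()
    return zlib.crc32(key) % modulus == 0


# Packets as seen at a first observation point.
point_a = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "ident": i, "payload": "x", "ttl": 64}
    for i in range(8)
]

# The same packets one hop later: TTL decremented, order changed.
point_b = [dict(p, ttl=p["ttl"] - 1) for p in reversed(point_a)]

selected_a = {p["ident"] for p in point_a if select(p)}
selected_b = {p["ident"] for p in point_b if select(p)}
```

Here selected_a and selected_b contain the same packet identifiers. Adding "ttl" to INVARIANT_FIELDS would break that property, which is the limitation noted above.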

An alternative would be to define further categories or simply not to categorize at all. We could just talk about packet selectors (defined by function and parameters) and treat sampling and filtering as specific forms of packet selection functions.
Any preferences or opinions?

Regards
Tanja

Maurizio Molina wrote:

Nick Duffield wrote:

Maurizio,

thanks for opening up the discussion on terminology.
Comments below:

Maurizio Molina wrote:
Hi,
After browsing through Nick's draft, the PSAMP mailing list, and
Tanja's draft proposal, I see there's still the need to harmonize some
terminology (and concepts) regarding packet selection, sampling and
filtering.

1) From Nick's draft:
3.2 Packet selection
..........Packet selection is performed through combination a number
of measurement primitives described below.......
- Hashing
...........
- Filtering
...........
- Sampling
...........

2) From one of Nick's mails:
......With the words currently at my disposal, my usage is:
1. sampling = 1 in N (periodic or statistical) or hash-based
2. filtering = filtering
3. (primitive) selectors = either 1 or 2, and further methods TBD
4. (composite) selectors = composites of methods from 3

3) Tanja's draft: it focuses on sampling, but it mentions some methods
(e.g. stratified sampling, or sampling dependent on the packet content)
that in my view are already a combination of filtering + sampling. This
point is clarified later.

My proposal for harmonizing terminology/concepts would be the following:

a) "primitives" for packet selection are only sampling and filtering.
Composite packet selection methodologies can then be built by a
combination of the two.
b) sampling is always "blind" to packet content. A packet is sampled out
of a stream depending only on the packet position (which can be
spatial or temporal) and/or on the result of a sampling algorithm (which
can be deterministic or probabilistic).
I think it will be too limiting to say that type (a) sampling must
be blind to packet content. Implementations may want to use content
from the packet stream as a cheap way to insert or seed randomness into
sampling decisions. (In a different arena, this approach has been
proposed for importance sampling of flow statistics)

Nick,
Clearly distinguishing between filtering (based ONLY on packet content) and
sampling (NOT based on packet content) is aimed at finding a way to formally
describe any packet selection mechanism, i.e. an information model that could
be used to configure packet selectors in a standard way.
Most of the packet selection examples described in Tanja's draft could be in
fact easily described as a combination of the two. E.g., sampling dependent
on the packet length -> first describe a filter that creates different
substreams on the basis of packet length, then describe different samplers, one
for each substream, with different sampling frequencies. This clear distinction
applies to the formal definition only. The way it is implemented can then be a
real "hybrid", as you mention.
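
As a rough illustration of this filter -> sampler composition, here is a minimal Python sketch. The two length classes, the 100-byte threshold, and the per-class 1-in-N periods are assumptions chosen for the example, not values from any draft.

```python
from collections import defaultdict


def length_class(length, threshold=100):
    # Filter step: splits the stream into substreams by packet length.
    # It looks only at a packet property, never at the packet position.
    return "small" if length < threshold else "large"


def filter_then_sample(lengths, periods):
    # Sampler step: systematic 1-in-N sampling inside each substream.
    # It looks only at the position within the substream.
    counters = defaultdict(int)
    selected = []
    for index, length in enumerate(lengths):
        cls = length_class(length)
        if counters[cls] % periods[cls] == 0:
            selected.append(index)
        counters[cls] += 1
    return selected


lengths = [40, 1500, 40, 40, 1500, 40, 1500, 40]
# Keep every second small packet and every large packet.
chosen = filter_then_sample(lengths, {"small": 2, "large": 1})
```

The formal description keeps the two steps separate, while an implementation is free to fuse them into a single pass as done here.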

However, at least one packet selection method is actually quite difficult to
express as a combination of the two: namely, what is described in Tanja's doc
(4.2.3 - packet content based trigger) that is (I guess) also what you refer
to when you say "....Implementations may want to use content from the packet
stream as a cheap way to insert or seed randomness into sampling
decisions.....".
Triggering a sampling procedure on the basis of the packet content, and then
sampling according to some other method, can be viewed as something really
"hybrid". In the formalization attempt I'm currently working on, however, I'm
still trying to represent it as a composition of the two basic selectors
(sampling & filtering).

c) filtering is on the contrary "blind" to the packet position in the
stream, but it is based on the packet properties. A packet property may
be simply its content, or the content of a set of subfields, or the
result of a function taking as an input (part of) the packet content.

I agree that at a high level one can regard hash-based selection as a type
of filtering, since it relies on a complex but deterministic function of
the packet contents. Perhaps it is worth noting that any good hash-based
selection function would be infeasible to express as a composition of
match/mask filters; I've been asked this.

But I'm concerned that the (a)/(b) division may close the door to future
selectors that people may invent, or implementations (e.g. of sampling)
that are really a hybrid of the two approaches.

I agree. We should avoid saying that EVERY current and future selection
methodology can be expressed by these two building blocks.

And why stop there? At an abstract level one could regard all the
selectors as (i) calculating some quantity depending on the packet
content and/or other variables; then (ii) selecting the packet if the
quantity falls in a given range. A danger with trying to couch the
framework this way is of losing focus on the basic functionality
that we want (e.g. sample 1 in N packets, somehow) and dwelling on
implementations (e.g. different ways of getting 1 in N sampling in this
framework, e.g. decrement a counter, or calculate a hash, or use a
well-known random number generator, or seed a counter with the packet
stream).
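
The abstract view in (i)/(ii) can be sketched in Python as below. All names are hypothetical, and CRC32 and Python's random module merely stand in for whatever hash or random number generator an implementation would use. The three quantity functions are different ways of getting roughly 1 in N selection out of the same range test.

```python
import random
import zlib


def make_selector(quantity, low, high):
    # (i) compute a quantity for the packet,
    # (ii) select the packet if the quantity falls in [low, high).
    def select(packet):
        return low <= quantity(packet) < high

    return select


def make_counter_quantity(n):
    # Position-based: a counter that cycles through 0 .. n-1.
    state = {"count": -1}

    def quantity(packet):
        state["count"] = (state["count"] + 1) % n
        return state["count"]

    return quantity


def make_hash_quantity(n):
    # Content-based: a hash of the packet bytes reduced modulo n.
    return lambda packet: zlib.crc32(packet) % n


def make_random_quantity(n, seed=None):
    # Random: a pseudo-random integer in 0 .. n-1, ignoring the packet.
    rng = random.Random(seed)
    return lambda packet: rng.randrange(n)


packets = [bytes([i]) for i in range(10)]
one_in_five = make_selector(make_counter_quantity(5), 0, 1)
picked = [i for i, p in enumerate(packets) if one_in_five(p)]
```

Only the counter variant selects exactly one packet in every N; the hash and random variants do so only on average, which is one way the implementation choice shows through the common interface.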

I'm not sure I fully understood this point. If the only sampling type to be
supported by a standard is 1 in N sampling (how it is done being
implementation specific) then there's not much to do in PSAMP. But I guess the
scope of PSAMP is exactly creating a set of commonly agreed standard
procedures so that users of sampled data are more confident of what they're
receiving. Also, as you explained in your papers, applications like trajectory
sampling need a common hash function on all the crossed nodes....
Of course, not all the bits of an implementation need to be standardized, but
I guess that defining the border between what needs to be standardized and
what not is exactly PSAMP's job, isn't it?

Maurizio


Some notes:
n1) filtering is always deterministic.
n2) hashing is a sub-case of filtering.
n3) how "complex" a composite selector can be still needs to be
discussed, but for sure the methodologies Tanja mentioned (stratified
sampling, or sampling dependent on the packet content) can be
implemented by a cascaded filter->sampler.
n4) Another example of a filtering function could be taking the source
and/or destination address, looking up the source/destination AS, and filtering
on the basis of the result. While such a complex filtering function
doesn't make sense at the line rate, it may make sense if a sampler is
placed in front of a filter to reduce the rate of packets to be
processed. In this respect, the text appearing in Nick's draft at the
bottom of 3.2 reported below (unavailability of router state to
measurement primitives) should be reconsidered.

"In order to be able to function at line rates, each measurement
primitive take as its input only a packet itself, or quantities
that have been calculated from the packet previously by other
measurement primitives. Router state is not assumed to be available
to the measurement primitives."

Yes, I have been wondering whether this should be reworked after Peram
brought up this point a little while ago.

The reason for excluding router state from the primitive operations
was architectural: we didn't want to assume that the routing state would
be available to a filter that could be required to operate at line rate.
(Any comments on this from implementors?)

But we do assume that routing state, if present in the measuring network
element, will be available to form the packet reports, so it should be
feasible to do filtering based on routing state when reports
are formed i.e. after all the other selection primitives have operated.

Nick


Maurizio


--
to unsubscribe send a message to psamp-request@ops.ietf.org with
the word 'unsubscribe' in a single line as the message text body.
archive: <http://ops.ietf.org/lists/psamp/>

-- 
Dipl.-Ing. Tanja Zseby			    	      	
FhI FOKUS/Global Networking			Email: zseby@fokus.fhg.de	
Kaiserin-Augusta-Allee 31				Phone: +49-30-3463-7153
D-10589 Berlin, Germany				Fax:   +49-30-3463-8153
-------------------------------------------------------------------------------------- 
"Living on earth is expensive but it includes a free trip around the sun." (Anonymous)
--------------------------------------------------------------------------------------