
Re: Terminology harmonization: a proposal



Maurizio,

comments below:

Maurizio Molina wrote:
> 
> Nick Duffield wrote:
> 
> > Maurizio,
> >
> > thanks for opening up the discussion on terminology.
> > Comments below:
> >
> > Maurizio Molina wrote:
> > >
> > > Hi,
> > >     After browsing through Nick's draft, the PSAMP mailing list, and
> > > Tanja's draft proposal, I see there's still the need to harmonize some
> > > terminology (and concepts) regarding packet selection,  sampling and
> > > filtering.
> > >
> > > 1) From Nick's draft:
> > > 3.2 Packet selection
> > > ..........Packet selection is performed through combination a number
> > > of   measurement primitives described below.......
> > > - Hashing
> > > ...........
> > > - Filtering
> > > ...........
> > > - Sampling
> > > ...........
> > >
> > > 2) From one of Nick's mails:
> > > ......With the words currently at my disposal, my usage is:
> > > 1. sampling = 1 in N (periodic or statistical) or hash-based
> > > 2. filtering = filtering
> > > 3. (primitive) selectors = either 1 or 2, and further methods TBD
> > > 4. (composite) selectors = composites of methods from 3
> > >
> > > 3) Tanja's draft: it focuses on sampling, but it mentions some methods
> > > (e.g. stratified sampling, or sampling dependent on the packet content)
> > > that in my view are already a combination of filtering + sampling. This
> > > point is clarified later.
> > >
> > > My proposal for harmonizing terminology/concepts would be the following:
> > >
> > > a) "primitives" for packet selection are only sampling and filtering.
> > > Composite packet selection methodologies can then be built by a
> > > combination of the two.
> > > b) sampling is always "blind" to packet content. A packet is sampled out
> > > of a stream depending only on the packet position (which can be
> > > spatial or temporal) and/or on the result of a sampling algorithm (which
> > > can be deterministic or probabilistic).
> >
> > I think it will be too limiting to say that type (a) sampling must
> > be blind to packet content. Implementations may want to use content
> > from the packet stream as a cheap way to insert or seed randomness into
> > sampling decisions. (In a different arena, this approach has been
> > proposed for importance sampling of flow statistics)
> 
> Nick,
> Clearly distinguishing between filtering (based ONLY on packet content) and
> sampling (NOT based on packet content) is aimed at finding a way to formally
> describe any packet selection mechanism, i.e. an information model that could
> be used to configure packet selectors in a standard way.

I think we are not using the word "primitive" in the same way.

My understanding of your message is that you have a notion of a formal 
set of packet selection primitives that could be used to formally
construct sampling operations, and here you describe an example:

> Most of the packet selection examples described in Tanja's draft could in
> fact easily be described as a combination of the two. E.g., sampling dependent
> on the packet length -> first describe a filter that creates different
> substreams on the basis of packet length, then describe different samplers, one
> for each substream, with different sampling frequencies. This clear distinction
> applies to the formal definition only. The way it is implemented can then be a
> real "hybrid", as you mention.

So your primitives don't necessarily correspond individually to functions
that would be available in the PSAMP device, yes? In my reading of the
framework, the notion of primitive is different. A primitive would
correspond exactly to a non-composite selection/sampling function that
could be available in the PSAMP device.

I'm wary of getting bogged down in the formalities. Although such a
primitive could be formally expressible in the manner you suggest,
sometimes as a composite of your formal primitives, I'm not sure
what that buys you in practice. I'd rather just define the functional
selection/sampling primitives at the outset than put work into finding
some set of formal primitives that could be used to express them. Would
the set of formal primitives be minimal? Uniquely so? Would the functional
primitives be uniquely expressible as combinations of the formal
primitives? Scope for endless discussion here...
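
For concreteness, the length-dependent example quoted above (a filter that
splits the stream by packet length, then one sampler per substream) might be
sketched as follows. This is only an illustration under stated assumptions:
the Packet type, the length thresholds (128 and 1024 bytes), the class names
and the per-class periods are all hypothetical and come from neither draft.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Packet:
    """Hypothetical packet model; only the length is used here."""
    length: int
    payload: bytes = b""


def length_class(pkt: Packet) -> str:
    """Filter step: depends only on packet content (its length).

    The thresholds are arbitrary assumptions for illustration.
    """
    if pkt.length < 128:
        return "small"
    if pkt.length < 1024:
        return "medium"
    return "large"


@dataclass
class CountSampler:
    """Sampler step: 1 in n by position in its substream, blind to content."""
    n: int
    seen: int = 0

    def select(self) -> bool:
        self.seen += 1
        return self.seen % self.n == 0


def stratified_select(packets: Iterable[Packet],
                      periods: Dict[str, int]) -> List[Packet]:
    """Cascade filter -> sampler: one independent sampler per length class."""
    samplers = {name: CountSampler(n) for name, n in periods.items()}
    return [p for p in packets if samplers[length_class(p)].select()]
```

Calling stratified_select(packets, {"small": 5, "medium": 2, "large": 1})
would keep every large packet but only every fifth small one. Whether a
device exposes this as one function or as two cascaded ones is exactly the
question about what "primitive" means.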

 
> 
> However, at least one packet selection method is actually quite difficult to
> express as a combination of the two: namely, what is described in Tanja's doc
> (4.2.3 - packet content based trigger) that is (I guess) also what you refer
> to when you say "....Implementations may want to use content from the packet
> stream as a cheap way to insert or seed randomness into sampling
> decisions.....".
> Triggering a sampling procedure on the basis of the packet content, and then
> sampling according to some other method, can be viewed as something really
> "hybrid". In the formalization attempt I'm currently working on, however, I'm
> still trying to represent it by a composition of the two basic selectors
> (sampling & filtering).
> 
> > > c) filtering is on the contrary "blind" to the packet position in the
> > > stream, but it is based on the packet properties. A packet property may
> > > be simply its content, or the content of a set of subfields, or the
> > > result of a function taking as an input (part of) the packet content.
> > >
> >
> > I agree that at a high level one can regard hash-based selection as a type
> > of filtering, since it relies on a complex but deterministic function of
> > the packet contents. Perhaps it is worth noting that any good hash-based
> > selection function would be infeasible to express as a composition of
> > match/mask filters; I've been asked this.
> >
> > But I'm concerned that the (a)/(b) division may close the door to future
> > selectors that people may invent, or implementations (e.g. of sampling)
> > that are really a hybrid of the two approaches.
> 
> I agree. We should avoid saying that EVERY current and future selection
> methodology can be expressed by these two building blocks.
> 
> >
> > And why stop there? At an abstract level one could regard all the
> > selectors as (i) calculating some quantity depending on the packet
> > content and/or other variables; then (ii) selecting the packet if the
> > quantity falls in a given range. A danger with trying to couch the
> > framework this way is of losing focus on the basic functionality
> > that we want (e.g. sample 1 in N packets, somehow) and dwelling on
> > implementations (e.g. different ways of getting 1 in N sampling in this
> > framework, e.g. decrement a counter, or calculate a hash, or use a
> > well-known random number generator, or seed a counter with the packet
> > stream).
> 
> I'm not sure I fully understood this point. If the only sampling type to be
> supported by a standard is 1 in N sampling (how it is done being
> implementation specific) then there's not much to do in PSAMP. 

Using 1 in N sampling as an example (and not excluding other types
of sampling), my point is that folks may want to implement this in different
ways, and that the framework shouldn't inhibit that.
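
As a rough sketch of what "different ways" could mean, here are three
selectors that all aim at 1 in N but reach it differently: a countdown
counter, a hash of the packet bytes, and a pseudo-random draw. The class
names and the common select() interface are assumptions made for this
example, and zlib.crc32 is only a convenient stand-in, not a recommended
selection hash.

```python
import random
import zlib


class CounterSelector:
    """Periodic 1 in n: picks every n-th packet, blind to content."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.count = 0

    def select(self, packet: bytes) -> bool:
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class HashSelector:
    """Hash-based: deterministic in the packet bytes, about 1 in n on average.

    CRC32 is used purely for illustration (an assumption of this sketch).
    """

    def __init__(self, n: int) -> None:
        self.n = n

    def select(self, packet: bytes) -> bool:
        return zlib.crc32(packet) % self.n == 0


class RandomSelector:
    """Statistical 1 in n: each packet is picked with probability 1/n."""

    def __init__(self, n: int, seed=None) -> None:
        self.n = n
        self.rng = random.Random(seed)

    def select(self, packet: bytes) -> bool:
        return self.rng.randrange(self.n) == 0
```

All three answer the same question for the user ("give me roughly 1 in N"),
but only the hash-based one makes the same decision for the same packet at
two different observation points, and only the counter gives exactly 1 in N
over any window of N packets.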

> But I guess the
> scope of PSAMP is exactly creating a set of commonly agreed standard
> procedures so that users of sampled data are more confident of what they're
> receiving. Also, as you explained in your papers, applications like trajectory
> sampling need a common hash function on all the crossed nodes....
> Of course, not all the bits of an implementation need to be standardized, but
> I guess that defining the border between what needs to be standardized and
> what not is exactly PSAMP's job, isn't it?
> 
> Maurizio
> 

Nick

> >
> >
> > > Some notes:
> > > n1) filtering is always deterministic.
> > > n2) hashing is a sub-case of filtering.
> > > n3) how "complex" a composite selector can be still needs to be
> > > discussed, but for sure the methodologies Tanja mentioned (stratified
> > > sampling, or sampling dependent on the packet content) can be
> > > implemented by a cascaded filter->sampler.
> > > n4) Another example of a filtering function could be taking the source
> > > and/or destination address, looking up the source/destination AS and
> > > filtering on the basis of the result. While such a complex filtering function
> > > doesn't make sense at the line rate, it may make sense if a sampler is
> > > placed in front of a filter to reduce the rate of packets to be
> > > processed. In this respect, the text appearing in Nick's draft at the
> > > bottom of 3.2 reported below (unavailability of router state to
> > > measurement primitives) should be reconsidered.
> > >
> > >    "In order to be able to function at line rates, each measurement
> > >    primitive take as its input only a packet itself, or quantities
> > >    that have been calculated from the packet previously by other
> > >    measurement primitives. Router state is not assumed to be available
> > >    to the measurement primitives."
> > >
> >
> > Yes, I have been wondering whether this should be reworked after Peram
> > brought up this point a little while ago.
> >
> > The reason for excluding router state from the primitive operations
> > was architectural: we didn't want to assume that the routing state would
> > be available to a filter that could be required to operate at line rate.
> > (Any comments on this from implementors?)
> >
> > But we do assume that routing state, if present in the measuring network
> > element, will be available to form the packet reports, so it should be
> > feasible to do filtering based on routing state when reports
> > are formed i.e. after all the other selection primitives have operated.
> >
> > Nick
> >
> >
> > > Maurizio

--
to unsubscribe send a message to psamp-request@ops.ietf.org with
the word 'unsubscribe' in a single line as the message text body.
archive: <http://ops.ietf.org/lists/psamp/>