Re: Terminology harmonization: a proposal

To: Tanja Zseby <zseby@fokus.gmd.de>,Andy Bierman <abierman@cisco.com>
Subject: Re: Terminology harmonization: a proposal
From: Nick Duffield <duffield@research.att.com>
Date: Thu, 12 Sep 2002 18:23:41 -0400
Cc: Maurizio Molina <molina@ccrle.nec.de>, psamp <psamp@ops.ietf.org>
References: <3D787FEC.DD6E5E62@ccrle.nec.de> <3D78DD08.531F3330@research.att.com> <3D7CABFC.A9BE7984@ccrle.nec.de> <3D7DBBBF.3040000@fokus.fhg.de>

Tanja,

Comments below:

Tanja Zseby wrote:
> 
> Hi Maurizio and Nick and Rae,
> 
> In my opinion packet selectors can be distinguished according to
> - the selection function: deterministic or random.
> - input parameters (per packet): packet position, arrival time, packet
>   content
> (As Rae pointed out, input parameters could also be additional
> information like interfaces, or some output of other functions on the
> router (router reaction), etc.)
> 
> The question now is whether and how we want to categorize these packet
> selectors into sampling and filtering primitives (maybe we need a
> third category?). In your emails I saw various approaches to
> categorization. But I think we need clear reasoning and a clear
> definition of how filtering and sampling are distinguished. So I tried to
> identify the cases in question:
> 
> a) deterministic function on packet content (hashing would also fall
>    into this category) ==> filtering
> b) deterministic function on packet position (every k-th packet) ==> ?
>    (currently denoted as systematic sampling)
> c) deterministic function on arrival time (all packets in a given time
>    interval) ==> ? (currently denoted as systematic sampling)
> d) random function on packet content ==> probably not needed?
> e) random function on packet position ==> sampling
> f) random function on arrival time ==> probably not needed?
> 
> I see two alternatives for categorization:
> 
> 1. Categorization according to selection function: filtering is
> deterministic, sampling is random.
> If we categorize like this, systematic sampling (e.g. selection of
> every k-th packet) would be a filtering primitive.
> 
> 2. Categorization according to input parameters (as Maurizio
> proposed): filtering works on packet content, sampling does not.
> One reason to distinguish in this way could be that (most parts of
> the) packet content would be the same at different measurement points,
> whereas packet position and arrival time vary between
> observation points. That means we could specify a filter
> which can be applied at various measurement points and ensures that
> the same packets are selected at all points. Nevertheless, this only
> works if we limit packet content to the invariant fields in the packet
> (so the TTL example from Rao would not work).
> 
> An alternative would be to define further categories, or simply not to
> categorize. We could just talk about packet selectors (defined by
> function and parameters) and treat sampling and filtering as specific
> forms of packet selection functions.
> Any preferences or opinions?

When it comes to writing a standard, my preference is for the last
alternative: concentrating on the functionality and parameters of the
configurable selectors.

At a lower level, we want to set out the universe of selection
operations which we will draw from. Given that we're not sure
whether some things are sampling or filtering, perhaps it
doesn't make sense to have separate documents on each. Even
if we could agree as to which is which, the categories mentioned
in 1 and 2 would probably cut across the eventual functions,
and this could lead to confusion. Examples:

. hashing is deterministic, but a strong hash function with 
suitable input can be used to implement apparently random sampling.

. in scheme 1, 1 in N selection is a filter (on some counter)
but functionally it might be thought of as (an implementation of)
1 in N sampling.

. we can have a function of "sampling 1/Nth of the packets on
average" which can have different implementations, some of which
are more deterministic than others, e.g. (i) sample based on hash
of packet content; (ii) sample based on deterministic function of
packet position; (iii) use random number generator seeded with
packet content; (iv) use independent random number generator. We
should lay out all these possibilities in our universe.

(BTW, do people agree that these could just be different
implementations of the same function, or could these actually
be different functions in the PSAMP device?)
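
To make the four implementations (i)-(iv) listed above concrete, here is a
minimal Python sketch. It is only an illustration, not a proposal for the
standard: the function names, the choice of SHA-256, and the value N = 10
are all assumptions made for the example.

```python
import hashlib
import random

N = 10  # hypothetical target: select 1/N of the packets on average


def hash_select(packet: bytes, n: int = N) -> bool:
    """(i) Sample based on a hash of packet content (SHA-256 is an assumption)."""
    digest = hashlib.sha256(packet).digest()
    return int.from_bytes(digest[:8], "big") % n == 0


def position_select(position: int, n: int = N) -> bool:
    """(ii) Deterministic function of packet position: every n-th packet."""
    return position % n == 0


def content_seeded_select(packet: bytes, n: int = N) -> bool:
    """(iii) Random number generator seeded with the packet content."""
    rng = random.Random(packet)
    return rng.randrange(n) == 0


def make_independent_select(n: int = N, seed=None):
    """(iv) Independent random number generator, unrelated to the packet."""
    rng = random.Random(seed)

    def select(_packet: bytes) -> bool:
        return rng.randrange(n) == 0

    return select
```

Note that (i) and (iii) give the same decision for the same packet content
wherever they run, (ii) depends only on the position counter, and (iv)
depends on neither. All four aim at the same average rate of 1/N.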

Rather than trying to solve the sampling/filtering conundrum, 
maybe the thing to do is describe the universe in a more neutral 
way:
. inputs 
  - some contents from packets
  - layer 2?
  - associated routing state?
  - subsidiary quantities, perhaps calculated from the packet
    contents (possibly from multiple packets), e.g.
    o hashes
    o packet-seeded random-like numbers
    o counters and timers, and rules for their update
  - anything else?
. selection
    just filters on the inputs?

Any of our functional primitives must be describable by combining these,
although if there is more than one combination that meets a functional
requirement, then the standard would be neutral as to which one should
be used: it's an implementation choice.
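
As a rough sketch of this neutral description, a selector can be modelled as
a quantity computed from the inputs plus a range test on that quantity. The
class and field names below are hypothetical, and CRC32 is only a convenient
stand-in for a real hash function; a deployed selector would want something
stronger.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Inputs:
    """Per-packet inputs (hypothetical names, for illustration only)."""
    packet: bytes  # some contents from the packet
    counter: int   # subsidiary quantity: a position counter kept by the device


@dataclass
class Selector:
    """Selects a packet when low <= quantity(inputs) < high."""
    quantity: Callable[[Inputs], int]
    low: int
    high: int

    def selects(self, inputs: Inputs) -> bool:
        return self.low <= self.quantity(inputs) < self.high


def composite(selectors: Sequence[Selector], inputs: Inputs) -> bool:
    """A composite selector: the packet must pass every primitive."""
    return all(s.selects(inputs) for s in selectors)


N = 4  # assumed sampling period for the example

# Two different combinations that both meet "select 1 in N on average".
count_based = Selector(lambda i: i.counter % N, 0, 1)
hash_based = Selector(lambda i: zlib.crc32(i.packet) % N, 0, 1)

# A conventional filter on a packet property (length in bytes, 64..1500).
length_filter = Selector(lambda i: len(i.packet), 64, 1501)
```

Here count_based and hash_based are two combinations meeting the same
functional requirement, which is the case where the standard could stay
neutral; composite([length_filter, count_based], inputs) chains a filter
with a 1 in N selection.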

Nick

> 
> Regards
> Tanja
> 
> Maurizio Molina wrote:
> 
> > Nick Duffield wrote:
> >
> >> Maurizio,
> >> thanks for opening up the discussion on terminology.
> >> Comments below:
> >> Maurizio Molina wrote:
> >>
> >> > Hi,
> >> >     After browsing through Nick's draft, the PSAMP mailing list, and
> >> > Tanja's draft proposal, I see there's still the need to harmonize some
> >> > terminology (and concepts) regarding packet selection, sampling and
> >> > filtering.
> >> > 1) From Nick's draft:
> >> > 3.2 Packet selection
> >> > ..........Packet selection is performed through combination a number
> >> > of measurement primitives described below.......
> >> > - Hashing
> >> > ...........
> >> > - Filtering
> >> > ...........
> >> > - Sampling
> >> > ...........
> >> > 2) From one of Nick's mails:
> >> > ......With the words currently at my disposal, my usage is:
> >> > 1. sampling = 1 in N (periodic or statistical) or hash-based
> >> > 2. filtering = filtering
> >> > 3. (primitive) selectors = either 1 or 2, and further methods TBD
> >> > 4. (composite) selectors = composites of methods from 3
> >> > 3) Tanja's draft: it focuses on sampling, but it mentions some methods
> >> > (e.g. stratified sampling, or sampling dependent on the packet content)
> >> > that in my view are already a combination of filtering + sampling. This
> >> > point is clarified later.
> >> > My proposal for harmonizing terminology/concepts would be the following:
> >> > a) "primitives" for packet selection are only sampling and filtering.
> >> > Composite packet selection methodologies can then be built by a
> >> > combination of the two.
> >> > b) sampling is always "blind" to packet content. A packet is sampled out
> >> > of a stream depending only on the packet position (which can be
> >> > spatial or temporal) and/or on the result of a sampling algorithm (which
> >> > can be deterministic or probabilistic).
> >> >
> >> I think it will be too limiting to say that type (a) sampling must
> >> be blind to packet content. Implementations may want to use content
> >> from the packet stream as a cheap way to insert or seed randomness into
> >> sampling decisions. (In a different arena, this approach has been
> >> proposed for importance sampling of flow statistics.)
> >>
> > Nick,
> > Clearly distinguishing between filtering (based ONLY on packet content) and
> > sampling (NOT based on packet content) is aimed at finding a way to formally
> > describe any packet selection mechanism, i.e. an information model that could
> > be used to configure packet selectors in a standard way.
> > Most of the packet selection examples described in Tanja's draft could in
> > fact easily be described as a combination of the two. E.g., sampling dependent
> > on the packet length -> first describe a filter that creates different
> > substreams on the basis of packet length, then describe different samplers, one
> > for each substream, with different sampling frequencies. This clear distinction
> > applies to the formal definition only. The way it is implemented can then be a
> > real "hybrid", as you mention.
> > However, at least one packet selection method is actually quite difficult to
> > express as a combination of the two: namely, what is described in Tanja's doc
> > (4.2.3 - packet content based trigger), which is (I guess) also what you refer
> > to when you say "....Implementations may want to use content from the packet
> > stream as a cheap way to insert or seed randomness into sampling
> > decisions.....".
> > Triggering a sampling procedure on the basis of the packet content, and then
> > sampling according to some other method, can be viewed as something really
> > "hybrid". In the formalization attempt I'm currently working on, however, I'm
> > still trying to represent it by a composition of the two basic selectors
> > (sampling & filtering).
> >
> >> > c) filtering is on the contrary "blind" to the packet position in the
> >> > stream, but it is based on the packet properties. A packet property may
> >> > be simply its content, or the content of a set of subfields, or the
> >> > result of a function taking as an input (part of) the packet content.
> >> >
> >> I agree that at a high level one can regard hash-based selection as a type
> >> of filtering, since it relies on a complex but deterministic function of
> >> the packet contents. Perhaps it is worth noting that any good hash-based
> >> selection function would be infeasible to express as a composition of
> >> match/mask filters; I've been asked this.
> >> But I'm concerned that the (a) /(b) division may close the door to future
> >> selectors that people may invent, or implementations (e.g. of sampling)
> >> that are really a hybrid of the two approaches.
> >>
> > I agree. We should avoid saying that EVERY current and future selection
> > methodology can be expressed by these two building blocks.
> >
> >> And why stop there? At an abstract level one could regard all the
> >> selectors as (i) calculating some quantity depending on the packet
> >> content and/or other variables; then (ii) selecting the packet if the
> >> quantity falls in a given range. A danger with trying to couch the
> >> framework this way is of losing focus on the basic functionality
> >> that we want (e.g. sample 1 in N packets, somehow) and dwelling on
> >> implementations (e.g. different ways of getting 1 in N sampling in this
> >> framework, e.g. decrement a counter, or calculate a hash, or use a
> >> well-known random number generator, or seed a counter with the packet
> >> stream).
> >>
> > I'm not sure I fully understood this point. If the only sampling type to be
> > supported by a standard is 1 in N sampling (how it is done being
> > implementation specific) then there's not much to do in PSAMP. But I guess the
> > scope of PSAMP is exactly creating a set of commonly agreed standard
> > procedures so that users of sampled data are more confident of what they're
> > receiving. Also, as you explained in your papers, applications like trajectory
> > sampling need a common hash function on all the crossed nodes....
> > Of course, not all the bits of an implementation need to be standardized, but
> > I guess that defining the border between what needs to be standardized and
> > what not is exactly PSAMP's job, isn't it?
> > Maurizio
> >
> >> > Some notes:
> >> > n1) filtering is always deterministic.
> >> > n2) hashing is a sub-case of filtering.
> >> > n3) how "complex" a composite selector can be still needs to be
> >> > discussed, but for sure the methodologies Tanja mentioned (stratified
> >> > sampling, or sampling dependent on the packet content) can be
> >> > implemented by a cascaded filter->sampler.
> >> > n4) Another example of a filtering function could be taking the source
> >> > and/or destination address, looking up the source/destination AS and
> >> > filtering on the basis of the result. While such a complex filtering
> >> > function doesn't make sense at the line rate, it may make sense if a
> >> > sampler is placed in front of a filter to reduce the rate of packets to
> >> > be processed. In this respect, the text appearing in Nick's draft at the
> >> > bottom of 3.2 reported below (unavailability of router state to
> >> > measurement primitives) should be reconsidered.
> >> >    "In order to be able to function at line rates, each measurement
> >> >    primitive take as its input only a packet itself, or quantities
> >> >    that have been calculated from the packet previously by other
> >> >    measurement primitives. Router state is not assumed to be available
> >> >    to the measurement primitives."
> >> >
> >> Yes, I have been wondering whether this should be reworked after Peram
> >> brought up this point a little while ago.
> >> The reason for excluding router state from the primitive operations
> >> was architectural: we didn't want to assume that the routing state would
> >> be available to a filter that could be required to operate at line rate.
> >> (Any comments on this from implementors?)
> >> But we do assume that routing state, if present in the measuring network
> >> element, will be available to form the packet reports, so it should be
> >> feasible to do filtering based on routing state when reports
> >> are formed, i.e. after all the other selection primitives have operated.
> >> Nick
> >>
> >> > Maurizio
> >> >
> > --
> > to unsubscribe send a message to psamp-request@ops.ietf.org with
> > the word 'unsubscribe' in a single line as the message text body.
> > archive: <http://ops.ietf.org/lists/psamp/>
> >
> 
> --
> Dipl.-Ing. Tanja Zseby
> FhI FOKUS/Global Networking                     Email: zseby@fokus.fhg.de
> Kaiserin-Augusta-Allee 31                               Phone: +49-30-3463-7153
> D-10589 Berlin, Germany                         Fax:   +49-30-3463-8153
> --------------------------------------------------------------------------------------
> "Living on earth is expensive but it includes a free trip around the sun." (Anonymous)
> --------------------------------------------------------------------------------------
