[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

new version of sampling draft



Hi all,

I just submitted a new version of the sampling draft. In the new version Maurizio, Fredric and me started the definition of information models for sampling and filtering.
You can find the document attached.

Regards
Tanja

--
Dipl.-Ing. Tanja Zseby
FhI FOKUS/Global Networking Email: zseby@fokus.fhg.de
Kaiserin-Augusta-Allee 31 Phone: +49-30-3463-7153
D-10589 Berlin, Germany Fax: +49-30-3463-8153
-------------------------------------------------------------------------------------- "Living on earth is expensive but it includes a free trip around the sun." (Anonymous)
--------------------------------------------------------------------------------------



Internet Draft
Document: <draft-ietf-psamp-sample-tech-00.txt>                T. Zseby
Expires: April 2003                                    Fraunhofer FOKUS
                                                              M. Molina
                                                        NEC Europe Ltd.
                                                             F. Raspall
                                                        NEC Europe Ltd.

                                                           October 2002


   Sampling and Filtering Techniques for IP Packet Selection


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes sampling and filtering techniques for IP
   packet selection. It introduces information models for packet
   sampling, for packet filtering and for combinations of methods. The
   information models describe what information has to be specified in
   order to describe the method. This information is used for
   configuring the selection technique in measurement processes and for
   reporting the technique in use to the measurement data collection
   process.
   The document first suggests some terminology, then it describes in
   detail packet sampling and packet filtering techniques and their
   parameters. It also describes how these two techniques can be
   combined to build more elaborate packet selectors. Finally, it
   introduces information models for the most common sampling and
   filtering techniques.




Zseby, Molina, Raspall        Expires April 2003              [Page 1]

Internet Draft  Techniques for IP Packet Selection    August 2002



Table of Contents

   1.    Introduction.................................................2
   2.    Terminology..................................................3
   3.    Scope and Deployment of Packet Selection Techniques..........4
   3.1   Sampling.....................................................5
   3.2   Filtering....................................................6
   4.    Sampling Methods.............................................6
   4.1   Sampling Algorithm...........................................6
   4.1.1 Systematic Sampling..........................................6
   4.1.2 Random Sampling..............................................6
   5.    Sampling Parameters..........................................7
   5.1   Parameters for systematic sampling...........................7
   5.2   Parameters for random sampling...............................7
   6.    Information Model Sampling Techniques........................8
   7.    Filtering....................................................9
   7.1   Filtering operating directly on some of the packet’s bits....9
   7.2   Filtering considering router reaction or router state.......10
   8.    Information Model for Filtering Techniques..................10
   9.    Composite Techniques........................................12
   9.1   Cascaded filtering->sampling or sampling->filtering.........13
   9.2   Stratified Sampling.........................................13
   10.   Security Considerations.....................................14
   11.   References..................................................14
   12.   Author's Addresses..........................................15
   13.   Full Copyright Statement....................................15

1. Introduction

   Increasing data rates and growing measurement demands increase the
   requirements for data collection resources. For measurement
   scenarios in backbone networks it is often required to measure whole
   traffic aggregates instead of single flows. Furthermore some
   measurement methods require the capturing of packet headers or even
   parts of the payload. All this can lead to an overwhelming amount of
   measurement data, resulting in high demands regarding resources for
   metering, storage, transport and post processing.

   In some cases specialized hardware helps to fulfill these demands
   but on the other hand increases the costs for providing the
   measurement. Since measurements are mainly a supporting
   functionality for the service provisioning, measurement costs
   usually should be limited to a small fraction of the costs of the
   network service provisioning itself. Therefore a reduction of the
   measurement result data is crucial to prevent the depletion of the
   available (i.e. the affordable) resources.  Such a reduction can be
   achieved by a reasonable deployment of packet selection techniques,
   that sample a subset of the packets while still allowing reliable
   measurements, or filter out all the packet not interesting for the
   measurement itself. Packet selection  helps to prevent an exhaustion
   of resources and to limit the measurement costs. Examples for


Zseby, Molina, Raspall        Expires April 2003              [Page 2]

Internet Draft  Techniques for IP Packet Selection    August 2002


   applications that benefit from packet selection are given in
   [DuGG02].

2. Terminology

   IP Packet Selection Process
        An IP packet selection process takes IP packets or parts of IP
        packets (e.g. header) as input and extracts a subset of these
        packets by applying a selection function.

   Filtering
        Filtering selects a subset of packets by applying deterministic
        functions on parts of the packet content like header fields or
        parts of the payload. Filtering techniques can also be used to
        emulate a (pseudo)random selection of packets with a given
        probability p. A filtering process needs to process the packet
        (look at packet header and/or payload) in order to make the
        selection decision.

   Sampling
        Sampling selects a subset of packets by applying deterministic
        or random functions on the (temporal or spatial) packet
        position or by performing (pseudo)random calculations per
        packet. This can be for example selecting every nth packet
        (deterministic function on packet position)or selecting a
        packet that arrived at the metering process in accordance to
        the output of a random function (like flipping a coin per
        packet). Sampling does not work on packet content. That means,
        in contrast to filtering, a sampling process does not need to
        process the packet in order to make the selection decision.

  Hash function
        The computation of an M bit string starting from an N bit
        string. In this context, the N starting bits are some of the
        bits of a packet header and/or payload.

  Hash selection range
        A subset of the M bit computed with a hash function for which
        an Indicator Function has a value of 1.

   Stream
        A packet stream is the sequence of packets used as input for a
        packet selection process. If multiple packet selectors are
        applied subsequently, the output stream of one selector forms
        the input stream for the succeeding selector. If the first
        selector was a sampling process, the packets in the stream
        usually do not have common properties by which they can be
        distinguished from the not selected packets. Therefore we
        define here the term stream instead of flow, which is defined
        as set of packets with common properties [QuZC02]. If the term
        flow is used throughout the text, the flow definition in
        [QuZC02] applies.


Zseby, Molina, Raspall        Expires April 2003              [Page 3]

Internet Draft  Techniques for IP Packet Selection    August 2002


   Metering process
        see definition in [QuZC02]

   Sample size
        The sample size denotes the number of packets in the sample.

   Selection function
        Function that determines whether an IP packet is selected or
        not.

   Sampling probability
        The probability with which one element is selected as part of
        the sample.

   Sampling ratio
        The ratio between the sample size and the number of packets
        composing the input stream of a packet sampling process.

3. Scope and Deployment of Packet Selection Techniques

   The selection technique used to select a subset of packets out of
   all those crossing an observation point depends on the purpose
   (application) for which measurement is performed. If the main
   purpose of an application is to infer some characteristic of the
   whole set of crossing packets without processing them all (thus
   reducing the computation load) then we call the used selection
   technique “sampling”. In principle, with sampling the content of the
   packet is not relevant for the packet selection: what matters is
   only that the selected sample has a distribution of the
   characteristic to infer similar to the one of the parent population,
   so that it can be estimated reliably. The sampling decision may be
   based on the temporal or spatial position of the packet in the
   packet stream, or may depend on a (pseudo)random number extraction
   or calculation.

   On the contrary, if the application needs to consider all the
   packets having some common property, then we call the selection
   technique “filtering”. The property can be directly derived by some
   computation on the packet content, or depend on the treatment given
   by the router to the packet. We conclude that sampling does not
   consider packet content, and can depend on packet position or on
   (pseudo)random decisions, while filtering depends on packet content,
   but never depends on packet position or on (pseudo)random decisions.

   Note that a common technique to select packets is to compute a hash
   function on some bits of the packet header and/or content and to
   select it if the result falls in a certain selection range. Since
   hashing is a deterministic operation, it is a powerful mean to
   ensure that the same packets are selected at multiple measurement
   points. Depending on the chosen input bits, on the hash function and
   on the selection range, this technique could also be used to emulate
   the random selection of packets with a given probability p. However,


Zseby, Molina, Raspall        Expires April 2003              [Page 4]

Internet Draft  Techniques for IP Packet Selection    August 2002


   due to its intrinsically deterministic nature, we prefer to classify
   hashing as a filtering technique.

   The introduced classification is mainly useful for the definition of
   an information model describing “primitive” selection techniques.
   More complex selection techniques may then be described through the
   composition of cascaded sampling and filtering operations.
   For example, a packet selection that weights the selection
   probability on the basis of the packet length can be described as a
   set of filter/sampling cascades. However, this descriptive approach
   is not intended to be rigid: if a common and consolidated selection
   practice  turns out to be too complex to be described as a
   composition of the mentioned building blocks, an ad hoc description
   can be specified instead.

   We consider packet selectors as part of an IPFIX metering process
   which also can use the IPFIX exporting process. This is expressed as
   association to one or more IPFIX processes.

3.1 Sampling

   The deployment of sampling techniques aims at the provisioning of
   information about a specific characteristic of the parent population
   at a lower cost than a full census would demand. In order to plan a
   suitable sampling strategy it is therefore crucial to determine the
   needed type of information and the desired degree of accuracy in
   advance.

   First of all it is important to know the type of metric that should
   be estimated. The metric of interest can range from simple packet
   counts [JePP92] up to the estimation of whole distributions of flow
   characteristics (e.g. packet sizes)[ClPB93].

   Secondly, the required accuracy of the information and with this,
   the confidence that is aimed at, should be known in advance. For
   instance for usage-based accounting the required confidence for the
   estimation of packet counters can depend on the monetary value that
   corresponds to the transfer of one packet. That means that a higher
   confidence could be required for expensive packet flows (e.g.
   premium IP service) than for cheaper flows (e.g. best effort). The
   accuracy requirements for validating a previously agreed quality can
   also vary extremely with the customer demands. These requirements
   are usually determined by the service level agreement (SLA).

   Sampling is considered as part of the metering process. It can be
   applied at different functions of the metering process. In the
   following we consider a measured IP
   packet with its observation point and timestamp as basis elements of
   the parent population.

   The sampling method and the parameters in use must be clearly
   communicated to all applications that use the measurement data. Only


Zseby, Molina, Raspall        Expires April 2003              [Page 5]

Internet Draft  Techniques for IP Packet Selection    August 2002


   with this a correct interpretation of the measurement results  can
   be ensured.

3.2 Filtering

   Packet filtering can be done for a wide variety of purposes e.g. for
   security, SLA enforcing, accounting. Depending on the type of
   filtering, it can be applied in different parts of the metering
   process. The role of filtering, as the word itself suggest, is to
   separate all the packets having a certain property from those not
   having it. A distinguishing characteristic from sampling is that the
   property never depends on the packet position in time or in the
   space, or on a random process. However, hash based filtering may be
   used to emulate a random selection.

4. Sampling Methods

   Sampling Methods can be characterized by the sampling algorithm, the
   trigger type used for starting a sampling interval and the length of
   the sampling interval. These parameters are described here in
   detail.

4.1 Sampling Algorithm

   The sampling algorithm describes the basic process for selection of
   samples. In accordance to [AmCa89] and [ClPB93] we define the
   following basic sampling processes:

4.1.1   Systematic Sampling

   Systematic sampling describes the process of selecting the starting
   points and the duration of the selection intervals according to a
   deterministic function. This can be for instance the periodic
   selection of every n-th element of a trace but also the selection of
   all packets that arrive at pre-defined points in time. Even if the
   selection process does not follow a periodic function (e.g. if the
   time between the sampling intervals varies over time) we consider
   this as systematic sampling as long as the selection is
   deterministic. The use of systematic sampling always involves the
   risk of biasing the results. If the systematics in the sampling
   process resembles systematics in the observed stochastic process
   (occurrence of the characteristic of interest in the network), there
   is a high probability that the estimation will be biased.
   Systematics (e.g. periodic repetition of an event) in the observed
   process might not be known of in advance.

4.1.2   Random Sampling

   Random sampling selects the starting points of the sampling
   intervals in accordance to a random process. The selection of
   elements are independent experiments. With this, unbiased
   estimations can be achieved. In contrast to systematic sampling,


Zseby, Molina, Raspall        Expires April 2003              [Page 6]

Internet Draft  Techniques for IP Packet Selection    August 2002


   random sampling requires the generation of random numbers. One can
   differentiate two methods of random sampling:

   n-out-of-N sampling

   In n-out-of-N sampling n elements are selected out of the parent
   population that consists of N elements. One example would be to
   generate random numbers and select all packets which have a packet
   position equal to one of the random numbers. For this kind of
   sampling the sample size is fixed.

   Probabilistic sampling (see also [DuGG02])

   In probabilistic sampling the decision whether an element is
   selected or not is made in accordance to a pre-defined selection
   probability. An example would be to flip a coin for each packet and
   select all packets for which the coin showed the head. For this kind
   of sampling the sample size can vary for different trials. The
   selection probability is not necessarily the same for each packet.

5. Sampling Parameters

   The decision whether to select a packet or not is based on a
   function which is performed when the packet arrives at the sampling
   process. The sampling function can consist of a (pseudo)random
   calculation or of a function that take the packet position(temporal
   or spatial) into account. The parameters of these functions that are
   not derived from the packet are called sampling parameters.

5.1 Parameters for systematic sampling

   For systematic sampling the deterministic function which is used for
   the packet selection needs to be given. For periodic sampling the
   start of the first selection interval, the length of the selection
   interval (given in number of packets or as time duration) and the
   spacing between selection intervals needs to be specified.

                   <-- interval length = 7 --> <-- spacing = 5 _->
   Packet position: 1   2   3   4   5   6   7   8   9  10   11  12 13..

   The packets in the sample will be: 1,2,3,4,5,6,7, 13,...

   Selecting every x-th packet would be a special case with selection
   interval=1 and spacing=x-1.

5.2 Parameters for random sampling

   For random n-out-of-N sampling only the sample size n needs to be
   specified. This can be done either as an absolute number or as
   fraction of the parent population n/N.

   For probabilistic sampling the selection probability p needs to be
   specified. If the selection probability depends on other parameters

Zseby, Molina, Raspall        Expires April 2003              [Page 7]

Internet Draft  Techniques for IP Packet Selection    August 2002


   (e.g. packet content), the function that expresses this dependency
   has to be specified.
6. Information Model Sampling Techniques

   In this section we define the information models for most common
   sampling techniques. Here the selection function is pre-defined and
   given by the selector ID.

   Sampler Description:
        SELECTOR_ID
        SELECTOR_TYPE
        SELECTOR_PARAMETERS
        OPERATING_TIME
        ASSOCIATIONS

   Where:

   SELECTOR_ID:
   Unique ID for the packet sampler. The ID can be calculated under
   consideration of the ASSOCIATIONS and a local ID.


   SELECTOR_TYPE
   Description: For sampling processes the SELECTOR TYPE defines what
   sampling algorithm is used.
   Values: n out of N | Systematic Time Based (equally spaced)|
   Systematic Position Based (equally spaced)| Probabilistic

   [Remark: further sampling schemes will be added here]

   SELECTOR_PARAMETERS
   Description: For sampling processes the SELECTOR PARAMETERS define
   the input parameters for the process. Interval length in systematic
   sampling means, that all packets that arrive in this interval are
   selected. The spacing parameter defines the spacing in time or
   number of packets between the end of one sampling interval and the
   start of the next succeeding interval.

   Case n out of N
      - List of n sampling positions in an array of N positions

   Case STB:
      - Interval length (in usec), Spacing (in usec)

   Case SPB
      - Interval length(in packets), Spacing (in packets)

   Case Probabilistic(with equal probability per packet)
      - Sampling probability p

   OPERATING_TIME
   Description: The OPERATING_TIME parameter describes the start/stop
   time of sampling process. List elements must not overlap. The start

Zseby, Molina, Raspall        Expires April 2003              [Page 8]

Internet Draft  Techniques for IP Packet Selection    August 2002


   time of the first element can be omitted, meaning “from now”. The
   end time of the last element can be omitted, meaning “until sampler
   is removed”.
   Values: List of (Start time, End time)

   ASSOCIATIONS
   Description: The ASSOCIATIONS field describes the observation point
   and the IPFIX processes to which the packet selector is associated.
   The STREAM ID denotes the origin of the data stream that is input to
   the selection function. It can be the observation point directly or
   the ID of another selector. With this it is possible to define
   combined schemes. If the STREAM ID contains IDs from other
   selectors, one can derive the original observation point from the
   selector definitions of these specified selectors.

   Values: <STREAM ID, Metering process ID, Exporting process ID>
   With STREAM ID: Observation point ID | List of SELECTOR_IDs


7. Filtering

   As pointed out in section 3, the main difference between sampling
   and filtering is that filtering never depends on the temporal or
   spatial position of packets. We introduce two classes of filters. In
   the first one, the property can be directly derived by applying a
   function on some bits of the packet, while in the second one the
   property depends on router state or on the router’s reaction to a
   particular packet.
   The filters of the first class should be able to operate at full
   line rate, while some of the ones of the second may need to be
   preceded by a sampling function (e.g. because they involve access to
   router state).

   [Discussion needed on router-state based filtering]

7.1 Filtering operating directly on some of the packet’s bits

   These filters functionally operate as follow:

      - They select some bits of the packet (not or not only
        necessarily those of the header).
      - They apply a function on the selected bits. The function can be
        as simple as the identity function (i.e. this step is logically
        skipped), or as complex as a hash function.
      - They feed the result of the function into an indicator
        function, that returns a “select/do not select” result.

   Examples of filters of this class are filters that select packets on
   the basis of the matching of some of the header fields with a
   (possibly masked) pre defined value, filters that select the packets
   that have some header field value falling within a predefined range,
   or filters that select some header fields and/or a portion of the
   payload, apply a hash function and then select the packet if the

Zseby, Molina, Raspall        Expires April 2003              [Page 9]

Internet Draft  Techniques for IP Packet Selection    August 2002


   results is in the hash selection range. Note that in the latter
   case, the selected bits may not be the only one forming the input of
   the hash function. For example, a “secret” bit sequence could be
   appended to the selected bits in order to make it harder for an
   attacker to forge packets being either always or never selected.

  An implementation isn’t constrained to apply exactly all these steps
  or in this sequence, provided that the result is equivalent to a
  logical function doing it.

7.2 Filtering considering router reaction or router state

   This class of filters select a packet on the basis of the following
   conditions), possibly combined with the AND, OR or NOT operators.


      - Ingress interface it is coming from equal to a specified value
      - Egress interface it is routed to equal to a specified value
      - Packet violated acl on the router
      - Failed rpf
      - Failed rsvp
      - No route found for the packet
      - Origin AS equal to a specified value or within a specified
        range
      - Destination AS equal to a specified value or within a specified
        range

8. Information Model for Filtering Techniques

   In this section we define the information models for most common
   filtering techniques. The information model structure closely
   parallels the one presented for the sampling techniques.

   Filter Description:
        SELECTOR_ID
        SELECTOR_TYPE
        SELECTOR_PARAMETERS
        OPERATING_TIME
        ASSOCIATIONS

   Where:

   SELECTOR_ID:
   Unique ID for the packet filter. The ID can be calculated under
   consideration of the ASSOCIATIONS and a local ID.

   SELECTOR_TYPE
   Description: For filtering processes the SELECTOR TYPE defines what
   filtering type is used.
   Values: Matching | Hashing | Router_state

   SELECTOR_PARAMETERS


Zseby, Molina, Raspall        Expires April 2003             [Page 10]

Internet Draft  Techniques for IP Packet Selection    August 2002


   Description: For filtering processes the SELECTOR PARAMETERS define
   formally the common property of the packet being filtered. For the
   filters of type Matching and Hashing the definitions have a lot of
   points in common.
   Values:
   Case Matching
      - <Header type = ip v4 >
      - <bit specification, header part>
      - <Selection interval specification, header part>
      - <Header type = ipv6>
      - <bit specification, header part>
      - <Selection interval specification, header part>
      - <payload byte number N>
      - <bit specification, payload part>
      - <Selection interval specification, payload part>

   Notes to Case Matching:

      - The filter can be defined for the header part only, for the
        payload part only or for both. In the latter case the matching
        must be an AND of the two.
      - The bit specification, for the header part, can be specified
        for ipv4 or ipv6 only, or both
      - In case of ipv4, the bit specification is a sequence of 20
        Hexadecimal numbers [00,FF] specifying a 20 bytes bitmask to be
        applied to the header
      - In case of ipv6, it is a sequence of 40 Hexadecimal numbers
        [00,FF] specifying a 40 bytes bitmask to be applied to the
        header
      - The bit specification, for the payload part, is a sequence of
        Hexadecimal numbers [00,FF] specifying the bitmask to be
        applied to the first N bytes of the payload, as specified by
        the previous field. In case the Hexadecimal number sequence is
        longer then N, only the first N numbers are considered.
      - In case the payload is shorter than N, the packet will not
        match the filter Other options, like padding with zeros, may be
        considered in the future.
      - The selection interval specification is a list of non
        overlapping intervals [intv_begin, intv_end] where intv_begin,
        intv_end are bit strings of length 20*8 (ipv4 case), 40*8 (ipv6
        case), N*8 (payload case).
      - A filter cannot be defined on the options field of the ipv4
        header, neither on stacked headers of ipv6.
      - This specification doesn’t preclude the future definition of a
        high level syntax for defining in a concise way bit selection
        and matching rules in a more human readable form (e.g. “TCP
        port in [2000,3000]”). The requirement is that such a syntax
        can be univoquely compiled into the one defined above

   Case Hashing:
      - <Header type = ipv4>
      - <Input bit specification, header part>
      - <Header type =  ipv6>

Zseby, Molina, Raspall        Expires April 2003             [Page 11]

Internet Draft  Techniques for IP Packet Selection    August 2002


      - <Input bit specification, header part>
      - <payload byte number N>
      - <Input bit specification, payload part>
      - <additional hash input bits (seed)>
      - Hashing function specification (includes length of hash
        function output M)
      - Selection interval specification, as a list of non overlapping
        intervals [start value, end value] where value is in [0,2^M-1]

   Notes to Case Hashing:

      - On Input bit specifications fields, the same notes on bit
        specifications of the Matching case reported above apply

   Case Router State:
      - Ingress interface it is coming from equal to a specified value
      - Egress interface it is routed to equal to a specified value
      - Packet violated acl on the router
      - Failed rpf
      - Failed rsvp
      - No route found for the packet
      - Origin AS equal to a specified value or within a specified
        range
      - Destination AS equal to a specified value or within a specified
        range

   Note to Case Router State:
      - All Router state entries can be linked by AND, OR, NOT
        operators

   OPERATING_TIME
   Description: The OPERATING_TIME parameter describes the start/stop
   time of filtering process. List elements must not overlap. The start
   time of the first element can be omitted, meaning “from now”. The
   end time of the last element can be omitted, meaning “until sampler
   is removed”.
   Values: List of (Start time, End time)

   ASSOCIATIONS
   Description: The ASSOCIATIONS field describes the observation point
   and the IPFIX processes to which the packet selector is associated.
   The STREAM ID denotes the origin of the data stream that is input to
   the selection function. It can be the observation point directly or
   the ID of another selector. With this it is possible to define
   combined schemes. If the STREAM ID contains IDs from other
   selectors, one can derive the original observation point from the
   selector definitions of these specified selectors.

   Values: STREAM ID, Metering process ID, Exporting process ID>
   With STREAM ID: Observation point ID | List of SELECTOR_IDs

9. Composite Techniques


Zseby, Molina, Raspall        Expires April 2003             [Page 12]

Internet Draft  Techniques for IP Packet Selection    August 2002


   Composite schemes are realized by using the STREAM ID in the
   information models. The STREAM ID denotes from which selectors the
   input stream originates. If multiple stream IDs are given, this
   means that the selector operates on the packet stream simply
   resulting from the time superposition of the output of all the
   listed filters and samplers. Note that a sampler/filter could be
   intermittently active, as defined in the OPERATING TIME field.
   Some examples of composite schemes are reported below.

9.1 Cascaded filtering->sampling or sampling->filtering

   If a filter precedes a sampling process the role of filtering is to
   create a set of “parent populations” from a single stream that can
   then be fed independently to different sampling functions, with
   different parameters tuned for the population itself (e.g. if
   streams of different intensity result from filtering, it may be good
   to have different sampling rates). If filtering follows a sampling
   process, the same sampling rate and type is applied to the whole
   stream, independently of the relative size of the streams resulting
   from the filtering function. Moreover, also packets not destined to
   be selected will “load” the sampling function. So, in principle,
   filtering before sampling allows a more accurate tuning of the
   sampling procedure, but if filters are too complex to work at full
   line rate (e.g. because they have to access router state
   information), sampling before filtering may be a need.

9.2 Stratified Sampling

   Stratified sampling is one example for using a composite technique.
   The basic idea behind stratified sampling is to increase the
   estimation accuracy by using a-priori information. The a-priori
   information is used to perform an intelligent grouping of the
   elements of the parent population. With this a higher estimation
   accuracy can be achieved with the same sample size.

   Stratified sampling divides the sampling process into multiple
   steps. First, the elements of the parent population are grouped into
   subsets in accordance to a given characteristic. This grouping can
   be done in multiple steps. Then samples are taken from each subset.

   The stronger the correlation between the characteristic used to
   divide the parent population and the characteristic of interest (for
   which an estimate is sought after), the easier is the consecutive
   sampling process and the higher is the stratification gain. For
   instance if the dividing characteristic were equal to the
   investigated characteristic, each element of the sub-group would be
   a perfect representative of that characteristic. In this case it
   would be sufficient to take one arbitrary element out of each
   subgroup to get the actual distribution of the characteristic in the
   parent population. Therefore stratified sampling can reduce the
   costs for the sampling process (i.e. the number of samples needed to
   achieve a given level of confidence).


Zseby, Molina, Raspall        Expires April 2003             [Page 13]

Internet Draft  Techniques for IP Packet Selection    August 2002


   For stratified sampling one has to specify classification rules for
   grouping the elements into subgroups and the sampling scheme that is
   used within the subgroups. The classification rules can be expressed
   by multiple filters. For the sampling scheme within the subgroups
   the parameters have to be specified as described above.

10. Security Considerations

   Security threats can occur if the configuration of sampling
   parameters or the communication of sampling parameters to the
   application is corrupted. This document only describes sampling
   schemes that can be used for packet selection. It neither describes
   a mechanism how those parameters are configured nor how these
   parameters are communicated to the application. Therefore the
   security threats that originate from this kind of communication
   cannot be assessed with the information given in this document.

   In some cases malicious users or attackers may be interested to hide
   packets from the service provider. For instance if packet selectors
   are used for accounting or intrusion detection applications, users
   may want to prevent that packets are selected. If a deterministic
   sampling scheme is used or a selection scheme that takes packet
   content into account, the user can shape or send packets in a way
   that they are less likely to be selected. This has to be taken into
   account when choosing an appropriate packet selection technique.

11. References

   [AmCa89]    Paul D. Amer, Lillian N. Cassel: Management of Sampled
               Real-Time Network Measurements, 14th Conference on Local
               Computer Networks, October 1989, Minneapolis, pages 62-
               68, IEEE, 1989

   [ClPB93]    K.C. Claffy, George C Polyzos, Hans-Werner Braun:
               Application of Sampling Methodologies to Network Traffic
               Characterization, Proceedings of ACM SIGCOMM'93, San
               Francisco, CA, USA, September 13 - 17, 1993

   [CoGi98]    I. Cozzani, S. Giordano: Traffic Sampling Methods for
               end-to-end QoS Evaluation in Large Heterogeneous
               Networks. Computer Networks and ISDN Systems, 30 (16-
               18), September 1998.

   [DuGG02]    Nick Duffield, Albert Greenberg, Matthias Grossglauser,
               Jennifer Rexford: A Framework for Passive Packet
               Measurement, Internet Draft draft-duffield-framework-
               papame-01, work in progress, February 2002

   [DuGr00]    Nick Duffield, Matthias Grossglauser: Trajectory
               Sampling for Direct Traffic Observation, Proceedings of
               ACM SIGCOMM 2000, Stockholm, Sweden, August 28 -
               September 1, 2000.


Zseby, Molina, Raspall        Expires April 2003             [Page 14]

Internet Draft  Techniques for IP Packet Selection    August 2002


   [JePP92]    Jonathan Jedwab, Peter Phaal, Bob Pinna: Traffic
               Estimation for the Largest Sources on a Network, Using
               Packet Sampling with Limited Storage, HP technical
               report, Managemenr, Mathematics and Security Department,
               HP Laboratories, Bristol, March 1992,
               http://www.hpl.hp.com/techreports/92/HPL-92-35.html

   [QuZC02]    J. Quittek, T. Zseby, B. Claise, S. Zander, G. Carle,
               K.C. Norseth: Requirements for IP Flow Information
               Export, Internet Draft <draft-ietf-ipfix-reqs-05.txt>,
               work in progress, August 2002

   [Zseb02]    Tanja Zseby: Deployment of Sampling Methods for SLA
               Validation with Non-Intrusive Measurements, Proceedings
               of Passive and Active Measurement Workshop (PAM 2002),
               Fort Collins, CO, USA, March 25-26, 2002

12. Author's Addresses

   Tanja Zseby
   Fraunhofer Institute for Open Communication Systems
   Kaiserin-Augusta-Allee 31
   10589 Berlin
   Germany
   Phone: +49-30-34 63 7153
   Fax:   +49-30-34 53 8153
   Email: zseby@fokus.fhg.de

   Maurizio Molina
   NEC Europe Ltd., Network Laboratories
   Adenauerplatz 6
   69115 Heidelberg
   Germany
   Phone: +49 6221 90511-18
   EMail: molina@ccrle.nec.de

   Fredric Raspall
   NEC Europe Ltd., Network Laboratories
   Adenauerplatz 6
   69115 Heidelberg
   Germany
   Phone: +49 6221 90511-31
   EMail: raspall@ccrle.nec.de

13. Full Copyright Statement

   Copyright (C) The Internet Society (2002). All Rights Reserved. This
   document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this

Zseby, Molina, Raspall        Expires April 2003             [Page 15]

Internet Draft  Techniques for IP Packet Selection    August 2002


   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.





































Zseby, Molina, Raspall        Expires April 2003             [Page 16]