
Re: psamp vocabulary



Dear All,
the e-mail from Rae (attached below) proposed classifying selectors into
3 classes and triggered a lot of discussion on Class 3 selectors.
Here I'd like instead to resume the discussion on the other two
types of selectors (without digging into whether to call them samplers or
filters...).
Note that I agree with Nick that reasoning about "classification" per se is
sterile unless it helps define an information model that unambiguously
and concisely describes a packet selection method. The aim of this mail is
to get some feedback for building such a model.

Rae proposes the following classification:
Class 1 -> selectors that operate directly on some bits of the packet header
Class 2 -> selectors that pertain to a router's reaction to a particular
packet

Basically, I agree with it, with a couple of modifications:

1) Class 1 should be "enlarged" to also cover hash-based sampling.
Like Rae's current Class 1 selectors, hash-based sampling takes as input
some bits of the packet (though not necessarily only those of the header),
then applies a function (simple or complicated), and finally makes a
decision (though based on a selection range rather than on a boolean
value). Hash-based sampling has more in common with Class 1 than with
either of the other two selector classes (2 & 3).
2) Class 2 should also include source and destination AS.

What distinguishes Class 1 from Class 2 is that Class 1 selection doesn't
need router information, while Class 2 does.
(What distinguishes Classes 1 + 2 from Class 3 is then that Classes 1 + 2
don't depend on the temporal or spatial position of a packet in a stream,
while Class 3 does, but I'd rather not get into that discussion here...)

Class 1)
=====
Nick pointed out that to completely describe a hash-based selection, you
need to specify
a) The input (bit) range
b) The (hash) function
c) The selection range
IMO the same information is also enough to describe any of the Class 1
selectors that Rae described below:
- All field definitions and masks are actually an input bit range definition
- All "xxx == <value>" are actually an identity function + a selection
range (<value>, <!value>)
- All "xxx in a certain range" are actually an identity function + a more
elaborate selection range
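
To make the (input bits, function, selection range) idea concrete, here is
a minimal Python sketch. It is only an illustration under assumptions of my
own: the class name Class1Selector, the helper crc32_over, the IPv4 bit
offsets, and the use of CRC-32 as a stand-in hash are all hypothetical and
not part of any PSAMP proposal.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Class1Selector:
    # a) input bit range: bit offsets counted from the start of the IP header
    bit_positions: List[int]
    # c) selection range: list of inclusive (begin, end) intervals
    intervals: List[Tuple[int, int]]
    # b) function: None means the identity function
    function: Optional[Callable[[int], int]] = None

    def extract(self, packet: bytes) -> int:
        """Concatenate the selected bits (most significant first) into an int."""
        value = 0
        for pos in self.bit_positions:
            bit = (packet[pos // 8] >> (7 - pos % 8)) & 1
            value = (value << 1) | bit
        return value

    def selects(self, packet: bytes) -> bool:
        value = self.extract(packet)
        if self.function is not None:
            value = self.function(value)
        return any(lo <= value <= hi for lo, hi in self.intervals)


def crc32_over(n_bytes: int) -> Callable[[int], int]:
    """Stand-in hash: CRC-32 of the extracted bits, viewed as n_bytes bytes."""
    return lambda value: zlib.crc32(value.to_bytes(n_bytes, "big"))


# "IP protocol == TCP": byte 9 of an IPv4 header, identity function, range {6}.
proto_is_tcp = Class1Selector(bit_positions=list(range(72, 80)),
                              intervals=[(6, 6)])

# Hash-based selection over source + destination address (bytes 12..19),
# keeping roughly one tenth of the 32-bit hash space.
hash_one_in_ten = Class1Selector(bit_positions=list(range(96, 160)),
                                 intervals=[(0, 2**32 // 10 - 1)],
                                 function=crc32_over(8))
```

Both selectors are instances of the same structure; only the function and
the selection range differ, which is the point argued above.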

As for how this information can be formally described:

a) The input (bit) range:
Rae proposes to specify information a) as a list of
<IP header field & mask>
However, to also cover his example of "TCP port in range..." we would also
need to include TCP headers. And then why not UDP? Moreover, if we want to
support hashing, we should also be able to specify part of the payload...
I personally think that a definition like "take this list of bit positions
after the start of the IP header" is the most general and the easiest one.
Why not leave the "translation" from a human-readable syntax to this list to
an application interpreting the human's input? (i.e. why should PSAMP deal
with defining such a syntax?)
b) The function:
It can be NULL (meaning the identity function) or the specification of a
hashing function. Whether and how to refer to a standard set of functions
needs further discussion.
c) The selection range:
It can be expressed as a list of intervals within a range
<interv_begin, interv_end>
<interv_begin, interv_end>
....
Once again, I think that the compilation of such a list from the
interpretation of a human-readable syntax like "TCP port in [3000,4000]"
(and the high-level syntax itself) doesn't need to be standardized.
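
As an illustration of such a "translation" step, the sketch below compiles a
field name and a range into a list of bit positions plus a list of intervals,
and then evaluates the result against a packet. Everything here is an
assumption made for the example: the FIELDS table, the function names, and
above all the fixed offsets, which only hold for IPv4 headers without options
(IHL = 5), so that the TCP header starts at byte 20.

```python
from typing import Dict, List, Tuple

# Hypothetical field table: name -> (byte offset from IP header start, width in bits).
# Offsets assume an IPv4 header without options (20 bytes).
FIELDS: Dict[str, Tuple[int, int]] = {
    "ip_protocol": (9, 8),
    "tcp_src_port": (20, 16),
    "tcp_dst_port": (22, 16),
}


def compile_range(field: str, lo: int, hi: int) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Turn e.g. ("tcp_src_port", 3000, 4000) into (bit positions, intervals)."""
    byte_offset, width = FIELDS[field]
    first_bit = byte_offset * 8
    return list(range(first_bit, first_bit + width)), [(lo, hi)]


def evaluate(bit_positions: List[int],
             intervals: List[Tuple[int, int]],
             packet: bytes) -> bool:
    """Identity function: read the bits as an integer and test the intervals."""
    value = 0
    for pos in bit_positions:
        value = (value << 1) | ((packet[pos // 8] >> (7 - pos % 8)) & 1)
    return any(lo <= value <= hi for lo, hi in intervals)


# "TCP source port in [3000,4000]" compiled to the low-level form.
bits, ranges = compile_range("tcp_src_port", 3000, 4000)
```

Note that a fixed bit list cannot follow the TCP header when IP options are
present, and that a complete selector would also have to check that the IP
protocol is TCP; both are left out of this sketch.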

Class 2) (router state dependent selection)
=====

This is Rae's proposed list:
   - egress/ingress interface this packet is routed to/from == <value>
   - acl violations
   - failed rpf
   - failed RSVP
   - no route
I'd also add:
   - Origin AS
   - Destination AS
Any comments on whether this list is exhaustive, and on what is needed to
unambiguously describe each item?
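
One way to picture Class 2 is as a predicate over a per-packet record of what
the router decided, rather than over the packet bits. The sketch below is
purely hypothetical: the record RouterVerdict and all of its field names are
invented for the example, and it does not answer the question of how each
item should be described unambiguously.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RouterVerdict:
    """Hypothetical per-packet router state that a Class 2 selector would read."""
    ingress_if: int
    egress_if: Optional[int]          # None when the router found no route
    acl_violation: bool = False
    rpf_failed: bool = False
    rsvp_failed: bool = False
    origin_as: Optional[int] = None
    dest_as: Optional[int] = None

    @property
    def no_route(self) -> bool:
        return self.egress_if is None


Class2Selector = Callable[[RouterVerdict], bool]


def egress_interface_is(value: int) -> Class2Selector:
    return lambda verdict: verdict.egress_if == value


def destination_as_is(asn: int) -> Class2Selector:
    return lambda verdict: verdict.dest_as == asn


def no_route(verdict: RouterVerdict) -> bool:
    return verdict.no_route
```

None of these predicates can be evaluated from the packet alone, which is the
distinction from Class 1 drawn above.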

Maurizio

Rae McLellan wrote:

> > If the terminology isn't clear here, do we need to come up with
> > something better? With the words currently at my disposal,
> > my usage is:
> >
> >   1. sampling = 1 in N (periodic or statistical) or hash-based
> >   2. filtering = filtering
> >   3. (primitive) selectors = either 1 or 2, and further methods TBD
> >   4. (composite) selectors = composites of methods from 3
> >
> > So the work of item 1:
> >
> >    "1. Selectors for packet sampling. Define the set of primitive
> >    packet selection operations for network elements, the parameters
> >    by which they may be configured, and the ways in which they can
> >    be combined."
> >
> > is precisely to lay out what these selectors are.
>
> I believe the WG charter's use of "Selectors" and Andy's subsequent
> use of the term (in defining his suggested subdivision of tasks 1&2)
> is strictly generic.  And from the following psamp posts, it looks
> like we can classify the types of selectors into 3 basic groups.
>
> 1) selectors that operate directly on the packet header. i.e. some
>    function applied to the bits of an IP header that returns a
>    boolean select/no-select value.  e.g.:
>    - IP source address == <value>
>    - IP destination address & <mask> == <value>
>    - IP protocol == TCP/UDP/ICMP/etc
>    - TCP source port in_range {3000,4000}
>    - TTL > N, TTL == N, TTL < N
>    - IP_TOS & <mask> == <value>
>    - TCP protocol = SYN
>    - IP ID & <mask> == <value>
>    - IP Checksum & <mask> == <value>
>    - Checksum(IP header w/o TTL) & <mask> == <value>
>    (note that these last 3 can be used to generate an almost uniform
>     sample of the IP packets, yet they're still based on IP header)
>
> 2) selectors that pertain to a router's reaction to a particular packet.
>    - egress/ingress interface this packet is routed to/from == <value>
>    - acl violations
>    - failed rpf
>    - failed RSVP
>    - no route
>
> 3) and finally selectors that bear no relation to either the packet
>    or the router's functionality, such as:
>    - the next K sequential packets after a wait of N packets.
>    - random sampling
>
> All of these selectors operate in the logical space.  They do not
> refer to physical bytes.  i.e. there is no facility for "the Nth byte
> of the IP header == <value>".  Selectors only refer to logical fields.
> Eventually the hardware/software will have to examine and compare bits,
> but the selector specification is defined in the logical space and some
> sort of compiler will translate the filter description into an executing
> rule set in hardware or software.  This process is implementation
> dependent and out of bounds of the specification. (IMHO... :)
> The most general form of a type 1 selector is:
>    ( <packet header field> & <mask> ) == <value>
>
> I'm sure there's many more that can be included in each list.
> But, all three types should be described in the PSAMP document.
>
> > For the discussion on pre-filtering, the phrase "and the ways in
> > which they can be combined" is key. In the framework, filtering is
> > one of several packet selection mechanisms, which may be combined
> > to form composite packet selectors. For example, a composite
> > selector whose first component is a filter and whose second is
> > 1 in N sampling.
>
> If you think of the selectors as building blocks for the eventual
> filter, then it's just a matter of combining selectors in conjunctions
> and unions to get a composite rule.
>
> For example maybe I wanted to select packets which were destined for
> the whitehouse in a DDOS attack.  One of the selectors would be,
>   S1 := IP destination address == 63.240.15.146
>   S2 := IP destination address == 63.240.15.154
> So, to cover both addresses requires the union of S1 and S2.
> [I know those two selectors could be a single compare under mask, but
>  humor me for the sake of introducing a union in the example.]
> But then you only need to look at the SYN packets, so the conjunction
> of two more selectors is required.
>   S3 := IP protocol == TCP
>   S4 := TCP_SYNFLAG == 1
> Then the rule for finding DDOS attacks on the whitehouse becomes:
>   (S1 || S2) && S3 && S4.
> But then that turns out to be too much data to analyze and some sort
> of sampling is required to reduce the sample traffic to an acceptable
> rate.  A 5th selector based on the IP header checksum provides a
> reasonably uniform sampling.
>   S5 := IP checksum & <mask> == <value>
> And the final rule becomes:
>   R1 := (S1 || S2) && S3 && S4 && S5
>
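
The composite rule R1 quoted above can be sketched in Python as follows. This
is only an illustration, not Rae's or anyone's implementation: the helper
names are invented, packets are assumed to be raw IPv4 bytes starting at the
IP header, and the <mask> and <value> of S5 are left as parameters because
the original does not give them.

```python
import ipaddress
from typing import Callable

Selector = Callable[[bytes], bool]


def ip_dst_equals(addr: str) -> Selector:
    want = ipaddress.IPv4Address(addr).packed
    return lambda pkt: pkt[16:20] == want        # destination address field


def ip_protocol_equals(proto: int) -> Selector:
    return lambda pkt: pkt[9] == proto           # protocol field


def tcp_syn_set(pkt: bytes) -> bool:
    ihl = (pkt[0] & 0x0F) * 4                    # IP header length in bytes
    # Byte 13 of the TCP header holds the flags; 0x02 is SYN.
    return len(pkt) > ihl + 13 and bool(pkt[ihl + 13] & 0x02)


def ip_checksum_masked(mask: int, value: int) -> Selector:
    return lambda pkt: (int.from_bytes(pkt[10:12], "big") & mask) == value


def union(*selectors: Selector) -> Selector:
    return lambda pkt: any(s(pkt) for s in selectors)


def conjunction(*selectors: Selector) -> Selector:
    return lambda pkt: all(s(pkt) for s in selectors)


def make_r1(mask: int, value: int) -> Selector:
    """R1 := (S1 || S2) && S3 && S4 && S5, with S5's mask/value supplied by the caller."""
    s1 = ip_dst_equals("63.240.15.146")
    s2 = ip_dst_equals("63.240.15.154")
    s3 = ip_protocol_equals(6)                   # 6 = TCP
    s4 = tcp_syn_set
    s5 = ip_checksum_masked(mask, value)
    return conjunction(union(s1, s2), s3, s4, s5)
```

Because all() stops at the first selector that fails, S4 is never evaluated
on a non-TCP packet here; that is an "early out" of this sketch and does not
change which packets are selected.
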
> No doubt there would be some limit in any particular implementation
> on the number of selectors and rules that can operate simultaneously.
> But that's an implementation difference I'm sure the marketing
> types will enjoy hyping.
>
> Except for one case, I don't believe applying the selectors in any
> particular order produces different results.  Though early out for
> performance might be something a compiler could achieve, the end
> result of sampled traffic remains the same.  I don't really care
> what syntax is used to specify the selectors or how they are combined.
> But the functionality of union and conjunctions is important.
>
> The only place where the order of the selector rules matters is
> when performing the "sample K packets every N" type selector.
> This is because there is a difference in the range over which
> the N is sampled.  If this type of sampler is applied first, N
> ranges over the entire input stream.  But if it is applied last,
> then N only applies to those packets that have managed to pass
> through any previous selector functions.  Each will produce a
> different sample stream.
>
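
The order dependence described above is easy to reproduce on a toy stream.
In this sketch, which is an illustration under my own assumptions, packets
are stood in for by the integers 0..11, "K every N" is taken to mean the
first K packets of every window of N, and an "odd number" test plays the role
of a stateless filter.

```python
from typing import Callable, Iterable, Iterator


def k_every_n(stream: Iterable[int], k: int, n: int) -> Iterator[int]:
    """Select the first k items out of every consecutive window of n items."""
    for index, packet in enumerate(stream):
        if index % n < k:
            yield packet


def apply_filter(stream: Iterable[int], pred: Callable[[int], bool]) -> Iterator[int]:
    """Stateless selector: keep the items for which pred is true."""
    return (packet for packet in stream if pred(packet))


def is_odd(packet: int) -> bool:
    return packet % 2 == 1


packets = list(range(12))

# Sampler first: N ranges over the entire input stream.
sampler_first = list(apply_filter(k_every_n(packets, 2, 4), is_odd))

# Sampler last: N ranges only over the packets that passed the filter.
filter_first = list(k_every_n(apply_filter(packets, is_odd), 2, 4))

print(sampler_first)  # [1, 5, 9]
print(filter_first)   # [1, 3, 9, 11]
```

The two orderings select different packets and even different numbers of
packets, while swapping two stateless filters would not change the result.
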
> On the matter of report contents, I agree with Derek that the
> simpler the better with just forwarding the first N bytes.
> A report should consist of a header followed by a number of sample
> entries.  The sample report header would contain:
>         1) identity of reporting agent
>         2) report sequence number (to detect lost reports)
>         3) agent status flags (total # of samples, alarms, etc)
>
> A fixed sized report sample entry would consist of:
>         1) rule specifier (preferably a rule id, not the full rule)
>         2) timestamp
>         3) first N bytes of IP packet
>
> Whatever PSAMP comes up with, I believe it should be simple enough
> to expect hardware implementations at the higher line rates.
> Both the selection and the report generation processes should
> have minimal overhead to allow implementations at high line rates.
>
> I beg your pardon for being so pedantic.  But, I'm trying
> to get past the, "Six blind men describing an elephant", stage.
>                         my 2 cents,
>                         Rae McLellan
>
> --
> to unsubscribe send a message to psamp-request@ops.ietf.org with
> the word 'unsubscribe' in a single line as the message text body.
> archive: <http://ops.ietf.org/lists/psamp/>

