draft-ietf-tewg-measure-05.txt comments
At a high level the document is broken up into the following outline
1-3 Introduction
4 Definitions
5 Rationale / Uses
6 Time scales
7 Readout / sampling / summarization
8 Bases (e.g. node, link, path, node-pair)
9 Entities (e.g. traffic volume, delay, ...)
10 Types (Permute Bases x Entities - define what's valid)
11-14 Some blah blah (see specific comments too :)
15 Recommendations and Conclusion
At a high level I think we can ditch section 4 and move these into section
8 and 9 as appropriate. Sections 10-14 are interesting, but I don't think
they are directly necessary in specifying recommendations on *measurement*
(maybe what to do with the measurements, but no bearing on the measurement
itself). Section 10 or 15 would be good places for the CRISP
recommendations.
Here are general comments:
- there is no discussion of the form of a measurement, e.g. should traffic
volume be an accumulated count or a rate, should time (e.g. delay) be in
ms, ns, or seconds. These twists could lead to inconsistency - where is
that to be defined?
- "hold-time" of an LSP (aka uptime). What about an RSVP session with
signalled make-before-break? If it were initially signalled at t=0,
and then at t=5 it changed its path, and let's say at t=10 it
increased its bandwidth, if at t=15 we ask how long it's been up
what is the answer?
- also t=15, is that minutes - or hours? or do we not care as long
as it is specified :)
- Statistical measures such as "variance" and other second order
measurements. I don't think this is a raw measurement as much as a
calculated value. Should nodes calculate this? Over how many
samples? Why not just calculate offline based on the raw
measurement?
- maybe expand on what you mean by per service class
- Flow measurement is a deep hole; we should either defer to other
standardization efforts on this, or we need to be a lot more
concrete on what we need to know here.
- Node Pair -v- Path. The Node-pair could replicate a lot of
information that is available on a path basis. Do we want that?
The advantage might be some persistence in the measurement. I think
every router should have a counter for bytes switched to BGP
next hops, and maybe something similar for a node's known MPLS
destinations (e.g. egresses for *my* LSPs)
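On the variance point in the list above, a minimal sketch of the offline calculation, assuming the node exports only raw samples. The sample values, the `summarize` helper and its `window` parameter are made up for illustration; nothing here comes from the draft.

```python
from statistics import mean, pvariance

# Hypothetical raw delay samples (ms) as exported by a node.
raw_delay_ms = [10.0, 12.0, 11.0, 15.0, 12.0]

def summarize(samples, window=None):
    """Mean and population variance over the last `window` samples (all if None)."""
    chosen = samples if window is None else samples[-window:]
    return mean(chosen), pvariance(chosen)

print(summarize(raw_delay_ms))            # (12.0, 2.8)
print(summarize(raw_delay_ms, window=3))  # last three samples only
```

Done offline, the "over how many samples?" question becomes a choice the operator can revisit at any time, rather than something fixed in the node.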
Here are specific comments:
Section 3, first sentence - the goal is not to have a "framework"; the
goal is to foster consistent measurements across implementations for
traffic engineering purposes.
"To achieve multi-vendor interoperability..." Not sure how
measurements on different systems can fail to be interoperable; maybe
this should say multi-vendor consistency.
"Other principles such as concise representation" - we should be
focussing on more than principles. Maybe we should identify
guidelines for measurements. E.g. not that we should have "accurate"
measurements, or even "traffic volume", so much as that volume should
be represented in accumulated bytes.
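To illustrate why accumulated bytes are enough, here is a rough sketch of deriving a rate offline from two readings of a byte counter. The 64-bit counter width, the function name and the sample values are assumptions for the example, not something the draft specifies.

```python
COUNTER_BITS = 64  # assumed counter width; 32-bit counters wrap much sooner

def rate_bps(prev_bytes, curr_bytes, interval_s, bits=COUNTER_BITS):
    """Average bit rate between two readings of an accumulated byte counter.

    Handles at most one counter wrap between the two readings.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = (curr_bytes - prev_bytes) % (1 << bits)
    return delta * 8 / interval_s

# Two polls 300 s apart with 3,750,000,000 bytes in between: 100 Mb/s average.
print(rate_bps(1_000_000, 3_751_000_000, 300))  # 100000000.0
```

The reverse is not possible: a node that reports only a rate has already chosen the averaging interval for you.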
"average hold-time" - you can only measure hold-time (more precisely
up-time of an LSP or Path). You can calculate average hold-time
offline. Are you suggesting that it is important that the node also
calculate this value?
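What up-time a node reports depends on which events reset the clock. A small sketch using the t=0/5/10/15 make-before-break example from the general comments; the event names and the three candidate definitions are hypothetical, chosen only to show that each one gives a different answer.

```python
# Hypothetical event log for one LSP; time units are deliberately unspecified,
# as in the t=0 / t=5 / t=10 example.
events = [
    (0, "signalled"),
    (5, "path_change"),        # make-before-break reroute
    (10, "bandwidth_change"),  # make-before-break resize
]

def uptime(events, now, resets):
    """Time since the most recent event whose kind is in `resets`."""
    starts = [t for t, kind in events if kind in resets and t <= now]
    return now - max(starts)

now = 15
since_signalled = uptime(events, now, {"signalled"})
since_path_change = uptime(events, now, {"signalled", "path_change"})
since_any_change = uptime(
    events, now, {"signalled", "path_change", "bandwidth_change"}
)

print(since_signalled, since_path_change, since_any_change)  # 15 10 5
```

Whichever definition is picked, the node only needs to export the raw up-time; averaging across LSPs can then be done offline.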
Throughput in section 4.3 suggests a possible sustainable rate, e.g. the
throughput of an OC192 is 10 Gb/s (or something close to that); in
section 9.1 it looks like it is the amount of traffic which "passes".
I think 4.3 is more correct. Is throughput actually needed in 9.1?
Section 5.1 second paragraph (the one that is not bulletized). I see
no value in this and suggest striking it.
Section 5.3.... hmmm... Where is this applied?
Section 7.1 (data reduction), are we saying we want the node to store
data for periodic retrieval (e.g. record retrieval) - or do we prefer
near real time polling techniques like SNMP (or both?). Unless we are
saying exactly where we want record retrieval capability, I suggest
removing section 7.1.
Similar argument on section 7.3 (summarization) - unless we say what
we want summarized on the node, and how, I suggest removing section
7.3
Ditto on section 7.4 (sampling). Unless we say what should be sampled
and how, why discuss?
Section 8.2 (interface / link base), on bundled links. Bundled links
are links, so why call them out as special unless we have some
concrete way to handle them? For instance do we want to somehow
stipulate that there should be measurement visibility on the bundle
and the component links, and that they should somehow be tied together?
If so, what is the recommendation?
Section 8.4 (path-based base). In the first sentence, is it the
"route-pinning" that gives MPLS the means to develop path-based
measurement? I think it's more the ability to tie an edge-to-edge
FEC into a specific LSP and then to have that FEC not visible (or
tunnelled) through transit nodes. Thus ingress, transit and egress
nodes have the ability to distinguish and count on a per macro-flow
basis, and they all know what role they play relative to a particular LSP.
Section 9.1 (entities). It may be best to move definitions into here.
Currently you have "entities, measurement unit class" and a bunch of
notes. Some concise definitions might firm these up.
Delay - what if a node can not truly measure delay? Should we say
there needs to be a way to state this? Do we recommend active
measurement devices for this?
Packet Loss - It is said that it should be monitored, but nowhere
have we stated that we want to monitor the offered load, the accepted
load and the delivered load. Or where we measure this (e.g. if we
measure delivered load at the head-end, that implies some way to
propagate the information). If we have accepted load at head-end and
delivered load at tail-end we can infer the packet loss - is this the
approach? Are we missing a policed load on an LSP at the head end
(does anyone care?)
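If that is the approach, the inference is simple arithmetic on three counters. A rough sketch, assuming the head-end and tail-end deltas cover the same interval (packets in flight and clock skew are ignored); the function and field names are invented for illustration.

```python
def load_summary(offered, accepted, delivered):
    """Derive policed and lost packet counts from three load counters.

    offered/accepted are head-end deltas, delivered is the tail-end delta,
    all assumed to cover the same interval.
    """
    policed = max(offered - accepted, 0)
    lost = max(accepted - delivered, 0)
    loss_ratio = lost / accepted if accepted else 0.0
    return {"policed": policed, "lost": lost, "loss_ratio": loss_ratio}

print(load_summary(offered=210_000, accepted=200_000, delivered=199_000))
# {'policed': 10000, 'lost': 1000, 'loss_ratio': 0.005}
```

Note that this only works if the delivered count can be collected from the tail-end and matched to the head-end interval, which is exactly the propagation question raised above.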
Section 9.2 - "To characterize paths ... the following entities may
possibly be defined" (either say they need definition, or don't mention
them).
- path setup / release delay - an interesting measurement, currently
not readily available.
- path setup denial / error / etc probability - an offline calculation, so
nix?
- path restoration time - what about FRR? would this be measured at
transit node? communicated to head-end?
At a node base, you may want to track setup attempts, failures,
preempted sessions, optimization checks, maybe even average LSP
uptime...
Section 10.1 Types.
No "X" on Interface / Throughput, should there be?
For Delay of a node-pair, what if you have multiple paths with
different delays? Maybe "X" on node-pair delay is not good, as it can
be readily calculated from path delays for that node-pair.
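A quick sketch of that calculation, assuming per-path delay and per-path byte counts are already available. The path names, the numbers and the choice of min / max / traffic-weighted mean are illustrative assumptions; the draft does not say which summary a node-pair delay should be.

```python
# Hypothetical per-path measurements for one node pair:
# (path name, measured delay in ms, bytes carried in the interval)
paths = [
    ("lsp-a", 10.0, 3_000_000),
    ("lsp-b", 30.0, 1_000_000),
]

def node_pair_delay(paths):
    """Summarize node-pair delay from the delays of its individual paths."""
    total_bytes = sum(b for _, _, b in paths)
    delays = [d for _, d, _ in paths]
    weighted = (
        sum(d * b for _, d, b in paths) / total_bytes if total_bytes else None
    )
    return {
        "min_ms": min(delays),
        "max_ms": max(delays),
        "traffic_weighted_ms": weighted,
    }

print(node_pair_delay(paths))
# {'min_ms': 10.0, 'max_ms': 30.0, 'traffic_weighted_ms': 15.0}
```

With more than one path there is no single "the" node-pair delay, so any per-node-pair value is a derived summary rather than a raw measurement.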
Section 10.2 - does anyone really care how much control traffic is
consuming on their network? For BGP traffic, was the intent the
sourced traffic or anything that transits the node/link (and to distinguish
in-my-net -v- across-my-net)? I suggest removing section 10.2 unless
folks care and can show where it is applied in a recommendation
(e.g. a type).
Sections 10.2 through 14 were interesting to read, but have no
direct bearing on what measurements are needed. I suggest that
they be removed.
Section 15 -
"a standardized mechanism to detect ... label binding changes for LDP
..." Why?
"Need for uniform measurement definitions across vendors and
operators" - that's the crux! That should be in the first sentence of
section 3 :)
"Need for higher order statistics..." Push it to the offline hosts.
"Need for packet-sampled..." "Need for offline bulk file transfer..."
The need needs to be better justified or removed.