my answers to the questions
Q A. Definitions
Q
Q 1. In determining the specific requirements, the design team should
Q precisely define the concepts "survivability", "restoration",
Q "protection", "protection switching", "recovery", "re-routing"
Q etc. and their relations. This would enable the requirements doc to
Q describe precisely which of these will be addressed.
Q
Q In the following, the term "restoration" is used to indicate the broad
Q set of policies and mechanisms used to ensure survivability.
I had thought that these terms were already well defined in the
various drafts, especially
draft-ietf-mpls-recovery-frmwrk-02.txt. While the recovery-frmwrk
draft does have a definitions section, it doesn't cover all the
above, and probably an expanded definitions list is needed. In
looking at some different drafts, I did see some arbitrary
distinctions in terminology that might be confusing (e.g. that of
protection switching = reserved bw and restoration = shared bw in
draft-lang-ccamp-recovery-00.txt).
So I propose we take what is in the recovery-frmwrk draft as a
base and expand/refine it into a definitions list.
Q
Q B. Network types and protection modes
Q
Q 1. What is the scope of the requirements with regard to the types
Q of networks covered? Specifically, are the following in scope:
Q
Q - Restoration of connections in mesh optical networks
Q (opaque or transparent)
No. Maybe I'm not exactly clear on what a mesh optical network is,
but if it's one with minimal wavelength conversion, and one in
which LSP connections are likely to be very tied to physics and
implementation, I suspect that (1) these networks should benefit
from extensibility of protocols that will first see production in a
more OEO based TDM network and (2) interoperability in this realm
will be challenging at best.
Q - Restoration of connections in hybrid mesh-ring networks
In what? Is this something real - I've heard the term come up a
lot, is it coming from a broad base of supporters?
Q - Restoration of LSPs in MPLS networks (composed of LSRs overlaid on a
Q transport network, e.g., optical)
In packet networks? Yes. I think this is probably the most
imperative area to foster a small set of interoperable
approaches. For one, we more or less have interoperable
implementations of IGPs and signalling, so a nice solid base to
extend into having interoperable/common restoration approaches.
Q - Any other types of networks?
TDM networks. Things full of DCS and ADMs operating at
narrowband, using TDM for muxing. This would be the next area I'd
place importance on.
Q - Is commonality of approach, or optimization of approach more
Q important?
Hmmm... what was I thinking with this question? I suppose I was
wondering if the same signalling base/approach would be used for
both 1:1 path as well as 1+1 path, as well as 1:N link
protection. I was also wondering if media specific optimizations
(such as interpretation of path AIS in lieu of explicit
signalling) were good. Thinking more about it though, it's likely
that no-one would be interested in a signalling approach that
could be potentially construed in a variety of ways (depending on
configuration along the way), so some amount of information is
going to be necessary in the signalling message.
I think we should hazard that in principle only the minimal amount
of information should be transported, i.e. err towards too sparse
as opposed to too verbose. The reason would be scalability/state.
Q
Q 2. What are the requirements with regard to
Q the protection modes to be supported in each network type covered?
Q (Examples of protection modes include 1+1, M:N, shared mesh,
Q UPSR, BLSR, newly defined modes such as P-cycles, etc.)
Data:
1:1 path, some form of notification to effect a switch. Some
distinction between working and protect (W/P) LSPs is ok (e.g.
different class/priority).
Need to allow head end to create paths which are as failure
disjoint as possible from each other. Need SRGs which can be
assigned to either nodes or links between nodes. There is an
issue of whether protect bandwidth is shared or not. Easiest
approach is that it is not, and that protect capacity is first
come first served. Is there a need to be more efficient (at the
likely expense of scalability)? I'd say no, and that with some
little tricks one can achieve similar results. Examples of tricks
might be distinction (in class) of w/p, ability for unfulfilled
working LSPs to preempt protect capacity and ability for a protect
LSP called into service to promote itself to "work". However, I
think there might be interest in the following...
1:1 shared path, some form of notification to effect a switch, and
some sort of double booking on protect capacity. I'd place the
priority on this behind the others. To be clear, I'm thinking 1:1
LSPs from head end where protect capacity in the network is
actually 1:N where that terminology would read like "only 1 of
these N can use this capacity or bad things would happen".
Local protect: failure repaired at least initially in the
proximity of the failure. Potentially some later re-grooming at
head end. Head end must have some control in whether their LSPs
are candidates for this, or not. In the simplest of cases,
something like swallow-bypass with 1:N around a single link, with
neither bandwidth reservation nor indications in signalling, might
be something to consider. In a more complex approach (which
swallow's can be extended to), gan-fast-reroute might be something
to consider. Need to balance need for post-failure survivability
and scalability. Issues include whether bandwidth is tracked, and
how SRGs might be made use of.
Reroute: After connections fail, try to set something new
up. Nothing new here, nothing to standardize, but a freebie to
mention.
TDM:
1:1 path, same as above, only there needs to be ability to try as
1+1.
1:1 shared path, same as above but obviously something has to
happen after the failure for the network to restore the services.
Options include head-end signalling of activation, or network wide
(flooded) notification of what type of failure has happened with
autonomous reaction by network elements.
Local protect: same as above, but obviously some sort of explicit
or implicit way to avoid overbooking and to ensure proper network
configuration would be required.
Reroute: the same as above.
For both of the above network types, something like SRGs are
either useful or required.
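A minimal sketch of the SRG idea above, assuming each link and node carries a set of shared risk group labels and the head end only accepts a protect path that shares no label with the working path. The topology, the SRG names and the helpers `srgs_of_path` and `failure_disjoint` are hypothetical and do not come from any of the drafts mentioned here.

```python
# Hypothetical SRG assignments: each link or node maps to the set of
# shared risk groups it belongs to. All names are illustrative only.
SRGS = {
    ("A", "B"): {"conduit-1"},
    ("B", "D"): {"conduit-2"},
    ("A", "C"): {"conduit-1"},   # shares a conduit with link A-B
    ("C", "D"): {"conduit-3"},
    ("A", "E"): {"conduit-4"},
    ("E", "D"): {"conduit-5"},
    "B": {"pop-west"},
    "C": {"pop-west"},           # B and C sit in the same building
    "E": {"pop-east"},
}


def srgs_of_path(path):
    """Collect the SRGs of every link and every transit node on a path."""
    groups = set()
    for a, b in zip(path, path[1:]):
        groups |= SRGS.get((a, b), set()) | SRGS.get((b, a), set())
    for node in path[1:-1]:      # end points are common to both paths
        groups |= SRGS.get(node, set())
    return groups


def failure_disjoint(working, protect):
    """True if no single SRG failure can take down both paths."""
    return not (srgs_of_path(working) & srgs_of_path(protect))


working = ["A", "B", "D"]
print(failure_disjoint(working, ["A", "C", "D"]))  # False
print(failure_disjoint(working, ["A", "E", "D"]))  # True
```

The path via C is rejected twice over: its first link shares a conduit with the working path, and C shares a building with B. Treating node and link SRGs uniformly as labels keeps the check to a single set intersection.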
Q
Q 3. What are the requirements on local span (i.e., link by link)
Q protection and end-to-end protection, and the interaction between them?
Q E.g.: what should be the granularity of connections for
Q each type (single connection, bundle of connections, etc).
I usually think of trying not to mix protections (if you protect
something 3 times, it just might cost you 8 times more than
unprotected, yet only be as good as the first time you protected
it). However, I see no reason why these should be mutually exclusive
(should one want to protect, overprotect and then protect once
more). This definitely makes one think that some sort of class
differentiation might be a good thing, or the ability to keep
certain types of traffic on certain LSPs, and for the head end to
exert some control on how the network treats that LSP, and whether
it wants to set up its own path based backup.
Q
Q C. Hierarchy
Q
Q 1. Vertical (between two network layers):
Q What are the requirements for the interaction between restoration
Q procedures across two network layers, when these features are
Q offered in both layers?
Q (Example, MPLS network realized over pt-to-pt
Q optical connections.) Under such a case,
Q
Q (a) Are there any criteria to choose which layer should provide
Q protection?
Cost, then service impact, which includes both the speed of
protection and the latencies in the network in the minutes after
the protection.
Q
Q (b) If both layers provide survivability features, what are the
Q requirements to coordinate these mechanisms?
That they be distinct, and that reliance on one or both is a
separate choice of network architecture (e.g. they don't have to be
commingled in a common routing/signalling space, and in fact
probably shouldn't be).
Q
Q (c) How is lack of current functionality of cross-layer
Q coordination currently hampering operations?
it isn't.
Q
Q (d) Would the benefits be worth additional complexity associated
Q with routing isolation (e.g. VPN, areas), security, address
Q isolation and policy / authentication processes?
Not anytime soon. There are other areas to focus efforts (e.g.
just getting multi-vendor TDM routing domains working).
Q
Q
Q
Q 2. Horizontal (between two areas or administrative subdivisions within
Q the same network layer):
Q
Q (a) What are the criteria that trigger the creation of protocol or
Q administrative boundaries pertaining to restoration? (e.g.,
Q scalability? multi-vendor interoperability? what are the
Q practical issues?) multi-provider? Should multi-vendor
Q necessitate hierarchical separation?
scalability: perceived or actual, yes.
multi-vendor: nope, there should be no need for this in core
networks, or even most access networks. The only exception is
where "toy" network elements (e.g. enterprise focussed IP
solutions) act as edges on networks which carry too much
information for them to maintain (e.g. no BGP, or even no OSPF!).
Nothing to worry about in this forum though.
multi-provider: BGP - policy a must.
Q: Is there anything in the RSVP-based LSP specs which prevents
an LSP from being set up across routing areas, or even multi-AS?
I'd bet that any such limitations today are merely artifacts of
implementation, and not limitations of specification.
Q
Q When such boundaries are defined:
Q
Q (b) What are the requirements on how protection/restoration is
Q performed end-to-end across such boundaries?
Q
Q (c) If different restoration mechanisms are implemented on two
Q sides of a boundary, what are the requirements on their
Q interaction?
Boundaries are either clean, or are of minimal value. So less
information propagation is better (or just don't put up a
boundary if you must have the information). However, the
concept of network elements that play on both sides of a
boundary might be workable (e.g. OSPF ABRs). That'd allow for
devices on either side to do an intra-area thing within their
region of knowledge, and for the ABR to do this in both areas,
and splice the two protected connections together at a common
point (granted it's a common point of failure now). If the
limitations of this approach start to appear in operational
settings, then perhaps it would then be time to start thinking
about deity-route-servers and signalling propagated directives.
But I don't see a need to dive into that just yet.
Q
Q What is the primary driver of horizontal hierarchy? (select one)
Q - functionality (e.g. metro -v- backbone)
Q - routing scalability
Q - signalling scalability
+-- this one, esp in the context of multi-area ISPs interested
| in VPNs.
V
Q - current network architecture, trying to layer TE on top of an
Q already hierarchical network architecture
Q - routing and signalling
Q
Q For signalling scalability, is it
Q - manageability
Q - processing/state of network
+-- this one, esp in the context of really large (and often
| flat) ISPs interested in VPNs (there it is again, VPN VPN)
V
Q - edge-to-edge N^2 type issue
Q
Q For routing scalability, is it
Q - processing/state of network
Q - are you flat and want to go hierarchical
+-- this one, under perception of processing/state concern.
| not aware of flat networkers that want/need to go to areas
V
Q - or already hierarchical?
Q - data or TDM application?
data.
Q
Q D. Policy
Q
Q 1. What are the requirements for policy support during
Q protection/restoration,
Q e.g., restoration priority, preemption, etc.
Unfulfilled working traffic should be able to claim protection
resources that are idle (waiting for a failure) in order to be
fulfilled, even if that means tearing down some LSPs. A protect
LSP that is "in-use" should be able to be promoted somehow to a
working LSP class.
Head ends should be able to place different classes of traffic
into different LSPs, and to let the network know that the LSPs
are in different classes with some relation, however I think
this should be generalized, and not tied to certain class
monikers. (akin to the affinities)
Head ends should have control on whether an LSP will be a
candidate for local protection along its path, and should be
able to set up their own path based protect LSPs.
Policy in the form of credentials, and policy server based
arbitration are not required at this time.
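The preemption and promotion rules above can be sketched for a single link as follows. This is one possible reading, assuming that only idle protect LSPs are eligible to be torn down; the `Lsp` and `Link` classes, the bandwidth units and the role strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Lsp:
    name: str
    bw: int
    role: str              # "work" or "protect"
    in_use: bool = False   # only meaningful for protect LSPs


@dataclass
class Link:
    capacity: int
    lsps: list = field(default_factory=list)

    def free(self):
        return self.capacity - sum(l.bw for l in self.lsps)

    def admit_working(self, lsp):
        """Admit a working LSP, preempting idle protect LSPs if needed.

        Returns the list of protect LSPs torn down, or None if the
        request cannot be met even after preemption.
        """
        idle = [l for l in self.lsps if l.role == "protect" and not l.in_use]
        if self.free() + sum(l.bw for l in idle) < lsp.bw:
            return None
        torn_down = []
        while self.free() < lsp.bw:
            victim = idle.pop()
            self.lsps.remove(victim)
            torn_down.append(victim)
        self.lsps.append(lsp)
        return torn_down

    def promote(self, lsp):
        """A protect LSP called into service becomes a working LSP."""
        if lsp.role == "protect" and lsp.in_use:
            lsp.role = "work"


link = Link(capacity=10)
p1 = Lsp("p1", 4, "protect")
p2 = Lsp("p2", 4, "protect", in_use=True)
link.lsps += [p1, p2]

torn = link.admit_working(Lsp("w1", 5, "work"))
print([l.name for l in torn])   # ['p1']
link.promote(p2)
print(p2.role)                  # work
```

Note that p2 is left alone because it is carrying traffic, and once promoted it is no longer a preemption candidate at all.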
Q
Q E. Signaling Mechanisms
Q
Q 1. What are the requirements on the signaling transport mechanism
Q (e.g., in-band over sonet/sdh overhead bytes, out-of-band over
Q an IP network, etc.) used to communicate restoration protocol
Q messages between network elements. What are the bandwidth and
Q other requirements on the signaling channels?
For data applications, in-band would be sufficient. I don't see
any new problems that restoration signalling has over other
signalling and route propagation issues in TDM/optical networks.
OT: I'm a bit wary of out-of-band approaches, because they're
strange and foreign to me, and thus suspect :) But again,
that's a general issue with optical/tdm networks.
Q
Q 2. What are the requirements on fault detection/localization mechanisms
Q (which is the prelude to performing restoration procedures)
Q in the case of opaque and transparent optical networks?
Q What are the requirements in the case of MPLS restoration?
See B.2 answer above.
Q
Q 3. What are the requirements on signaling protocols to be used in
Q restoration procedures (e.g., high priority processing, security, etc).
They should be robust, scalable, secure and fast too.
Q
Q 4. Are there any requirements on the operation of restoration protocols?
Q
As in what, as in control of reversion, grooming, etc..??
Sure.
Q E. Quantitative
Q
Q 1. What are the quantitative requirements (e.g., latency) for completing
Q restoration under different protection modes (for both local and
Q end-to-end protection)?
Subject to speed of light, 10s of ms for local repair and
< 200 ms for path based approach (assuming a path from new
york to los angeles for instance).
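A back-of-the-envelope check on the path based figure, counting propagation delay only. The numbers are assumptions rather than measurements: roughly 4000 km of great-circle distance between New York and Los Angeles, a fiber route 1.5 times longer than that, about 5 microseconds per km in fiber, and three one-way traversals (notification to the head end, switch request to the tail, acknowledgement back).

```python
GREAT_CIRCLE_KM = 4000   # assumed, approximate New York - Los Angeles
ROUTE_FACTOR = 1.5       # assumed: fiber rarely follows the great circle
US_PER_KM = 5.0          # assumed: roughly the speed of light in glass


def one_way_ms(distance_km=GREAT_CIRCLE_KM, route_factor=ROUTE_FACTOR):
    """One-way propagation delay in milliseconds."""
    return distance_km * route_factor * US_PER_KM / 1000.0


def path_restore_floor_ms(trips=3):
    """Propagation-only floor for head-end driven restoration.

    trips is an assumed number of one-way traversals of the path in
    the worst case, where the failure sits next to the far end.
    Processing and queueing time are ignored.
    """
    return trips * one_way_ms()


print(one_way_ms())             # 30.0
print(path_restore_floor_ms())  # 90.0
```

Under these assumptions propagation alone uses about 90 ms of the 200 ms, which leaves on the order of 110 ms for detection and processing at every hop combined.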
Q
Q F. Management
Q
Q 1. What information should be measured/maintained by the control plane at
Q each network element pertaining to restoration events?
They should propagate correlation of resources to SRGs. Not
sure how interwoven CAC/signalling and SRGs should be though.
Q
Q 2. What are the requirements for the correlation between control plane
Q and data plane failures from the restoration point of view?
Q
Permuting:
Control Down / Data Down -> protect, an obvious case.
Control Up / Data Down -> is OAM needed? This happens
sometimes, not sure if it needs to be fixed, as many bad
things happen sometimes. Is there a need for OAM, and for
OAM signalling switches? I'd vote no now.
Control Down / Data Up -> there seems to be a requirement for
persistence of connections during control failure in TDM
networks. I think that failure of control planes should
activate switching in the data plane (as happens in IP
networks).
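The permutations above amount to a small decision table, sketched here. The action strings and the `persist_on_control_failure` flag are hypothetical: the flag stands in for the TDM-style requirement that connections persist through a control plane failure, and it is off by default to match the position taken above.

```python
# Reaction table keyed by (control_up, data_up). The action strings are
# hypothetical labels for this sketch, not protocol terms.
REACTIONS = {
    (True, True): "no action",
    (False, False): "protect",
    (True, False): "no action for now (no OAM-driven switch)",
    (False, True): "protect",
}


def reaction(control_up, data_up, persist_on_control_failure=False):
    """Return the reaction for one control/data plane state.

    persist_on_control_failure is an assumed knob: when set, a failure
    of the control plane alone leaves existing connections in place.
    """
    if not control_up and data_up and persist_on_control_failure:
        return "hold connections"
    return REACTIONS[(control_up, data_up)]


for state in [(False, False), (True, False), (False, True)]:
    print(state, "->", reaction(*state))
```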
Q
Q
Q