
my answers to the questions




Q A. Definitions
Q 
Q 1. In determining the specific requirements, the design team should
Q    precisely define  the concepts "survivability", "restoration",
Q    "protection", "protection switching", "recovery", "re-routing"
Q    etc. and their relations. This would enable the requirements doc to
Q    describe precisely which of these will be addressed. 
Q 
Q    In the following, the term "restoration" is used to indicate the broad
Q    set of policies and mechanisms used to ensure survivability.

   I had thought that the terms were already well defined in the
   variety of drafts, especially
   draft-ietf-mpls-recovery-frmwrk-02.txt.  While the recovery-frmwrk
   draft does have a definitions section, it doesn't cover all the
   above, and probably an expanded definitions list is needed.  In
   looking at some different drafts, I did see some arbitrary
   distinctions in terminology that might be confusing (e.g. that of
   protection switching = reserved bw and restoration = shared bw in
   draft-lang-ccamp-recovery-00.txt).

   So I propose we take what is in the recovery-frmwrk draft as a
   base and expand/refine out a definitions list.

Q 
Q B. Network types and protection modes
Q 
Q 1. What is the scope of the requirements with regard to the types
Q     of networks covered? Specifically, are the following in scope:
Q 
Q     -  Restoration of connections in mesh optical networks
Q        (opaque or transparent)

   No.  Maybe I'm not exactly clear on what a mesh optical network is,
   but if it's one with minimal wavelength conversion, and one in
   which LSP connections are likely to be very tied to physics and
   implementation, I suspect that (1) these networks should benefit
   from extensibility of protocols that will first see production in a
   more OEO based TDM network and (2) interoperability in this realm
   will be challenging at best.

Q     -  Restoration of connections in hybrid mesh-ring networks

   In what?  Is this something real - I've heard the term come up a
   lot, is it coming from a broad base of supporters?

Q     -  Restoration of LSPs in MPLS networks (composed of LSRs overlaid on a
Q        transport network, e.g., optical)

    In packet networks?  Yes.  I think this is probably the most
    imperative area to foster a small set of interoperable
    approaches.  For one, we more or less have interoperable
    implementations of IGPs and signalling, so there is a nice solid
    base to extend into having interoperable/common restoration
    approaches.

Q     -  Any other types of networks?

    TDM networks.  Things full of DCS and ADMs operating at
    narrowband, using TDM for muxing.  This would be the next area I'd
    place importance on.


Q     -  Is commonality of approach, or optimization of approach more
Q        important?

    Hmmm... what was I thinking with this question?  I suppose I was
    wondering if the same signalling base/approach would be used for
    both 1:1 path as well as 1+1 path, as well as 1:N link
    protection.  I was also wondering if media specific optimizations
    (such as interpretation of path AIS in lieu of explicit
    signalling) were good.  Thinking more about it though, it's likely
    that no-one would be interested in a signalling approach that
    could be potentially construed in a variety of ways (depending on
    configuration along the way), so some amount of information is
    going to be necessary in the signalling message.

    I think we should hazard that in principle only the minimal amount
    of information should be transported, i.e. err towards too sparse
    as opposed to too verbose.  The reason would be scalability/state.

Q 
Q 2.  What are the requirements with regard to
Q      the protection modes to be supported in each network type covered?
Q      (Examples of protection modes include 1+1, M:N, shared mesh,
Q      UPSR, BLSR, newly defined modes such as P-cycles, etc.)

    Data:

    1:1 path, some form of notification to effect a switch.  Some
    distinction between W/P LSP ok (e.g. different class/priority).
    Need to allow head end to create paths which are as failure
    disjoint as possible from each other.  Need SRGs which can be
    assigned to either nodes or links between nodes.  There is an
    issue of whether protect bandwidth is shared or not.  Easiest
    approach is that it is not, and that protect capacity is first
    come first served.  Is there a need to be more efficient (at the
    likely expense of scalability)?  I'd say no, and that with some
    little tricks one can achieve similar results.  Examples of tricks
    might be distinction (in class) of w/p, ability for unfulfilled
    working LSPs to preempt protect capacity and ability for a protect
    LSP called into service to promote itself to "work".  However, I
    think there might be interest in the following...

    1:1 shared path, some form of notification to effect a switch, and
    some sort of double booking on protect capacity.  I'd place the
    priority on this behind the others.  To be clear, I'm thinking 1:1
    LSPs from head end where protect capacity in the network is
    actually 1:N where that terminology would read like "only 1 of
    these N can use this capacity or bad things would happen".
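
    A minimal sketch of that "only 1 of these N" rule, assuming a
    hypothetical SharedProtectPool object held by whichever node owns
    the protect capacity.  The class and method names are illustrative
    and are not taken from any draft.

```python
class SharedProtectPool:
    """One unit of protect capacity that N protect LSPs have double-booked."""

    def __init__(self):
        self.booked = set()   # LSP ids that reserved this capacity
        self.active = None    # the single LSP currently using it, if any

    def book(self, lsp_id):
        # Booking always succeeds: this is the deliberate overbooking.
        self.booked.add(lsp_id)

    def activate(self, lsp_id):
        # Only a booked LSP may try to use the capacity.
        if lsp_id not in self.booked:
            raise KeyError(f"{lsp_id} never booked this capacity")
        # Refuse a second simultaneous user: "bad things would happen".
        if self.active is not None and self.active != lsp_id:
            return False
        self.active = lsp_id
        return True

    def release(self, lsp_id):
        # Hand the capacity back once the working LSP has recovered.
        if self.active == lsp_id:
            self.active = None
```

    With two LSPs booked, the first activation succeeds and the second
    is refused until the first releases the capacity.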

    Local protect: failure repaired at least initially in the
    proximity of the failure.  Potentially some later re-grooming at
    head end.  Head end must have some control in whether their LSPs
    are candidates for this, or not.  In the simplest of cases,
    something like swallow-bypass with 1:N around a single link w/ no
    bandwidth reservation nor indications in signalling might be
    something to consider.  In a more complex approach (which
    swallow-bypass can be extended to), gan-fast-reroute might be
    something to consider.  Need to balance need for post-failure
    survivability and scalability.  Issues include whether bandwidth
    is tracked, and how SRGs might be made use of.

    Reroute:  After connections fail, try to set something new
    up. Nothing new here, nothing to standardize, but a freebie to
    mention. 

    TDM: 

    1:1 path, same as above, only there needs to be ability to try as
    1+1.

    1:1 shared path, same as above but obviously something has to
    happen after the failure for the network to restore the services.
    Options include head-end signalling of activation, or network wide
    (flooded) notification of what type of failure has happened with
    autonomous reaction by network elements.

    Local protect:  same as above, but obviously some sort of explicit
    or implicit way to avoid overbooking, and assure proper network
    configuration would be required.

    Reroute:  the same as above.

    For both of the above network types, something like SRGs are
    either useful or required.
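
    To illustrate how SRGs assigned to nodes or links could let a head
    end pick failure-disjoint paths, here is a minimal sketch.  The
    SRGS table, the resource names and the helper functions are all
    hypothetical; a real head end would learn SRG membership from
    routing rather than from a static table.

```python
from typing import Dict, List, Set

# Hypothetical mapping from a resource (node or link) to the shared
# risk groups it belongs to.
SRGS: Dict[str, Set[str]] = {
    "A-B": {"conduit-1"},
    "B-Z": {"conduit-2"},
    "A-C": {"conduit-1"},   # shares a conduit with A-B
    "C-Z": {"conduit-3"},
    "A-D": {"conduit-4"},
    "D-Z": {"conduit-5"},
}

def path_srgs(path: List[str]) -> Set[str]:
    """Union of the SRGs of every resource on the path."""
    groups: Set[str] = set()
    for resource in path:
        groups |= SRGS.get(resource, set())
    return groups

def shared_risks(working: List[str], protect: List[str]) -> Set[str]:
    """SRGs that a single failure could use to take down both paths."""
    return path_srgs(working) & path_srgs(protect)

def most_disjoint(working: List[str],
                  candidates: List[List[str]]) -> List[str]:
    """Pick the candidate protect path sharing the fewest SRGs."""
    return min(candidates, key=lambda p: len(shared_risks(working, p)))
```

    Here a protect path via C looks link-disjoint from the working
    path via B but still shares conduit-1, so the path via D is the
    better choice.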


Q 
Q 3.  What are the requirements on local span (i.e., link by link)
Q      protection and end-to-end protection, and the interaction between them?
Q      E.g.: what should be the granularity of connections for
Q      each type (single connection, bundle of connections, etc).

    I usually think of trying not to mix protections (if you protect
    something 3 times, it just might cost you 8 times more than
    unprotected, yet only be as good as the first time you protected
    it).  However, I see no reason why these should be mutually exclusive
    (should one want to protect, overprotect and then protect once
    more).  This definitely makes one think that some sort of class
    differentiation might be a good thing, or the ability to keep
    certain types of traffic on certain LSPs, and for the head end to
    exert some control on how the network treats that LSP, and whether
    it wants to set up its own path based backup.

Q 
Q C. Hierarchy
Q 
Q 1. Vertical (between two network layers):
Q     What are the requirements for the interaction between restoration
Q     procedures across two network layers, when these features are
Q     offered in both layers? 
Q     (Example, MPLS network realized over pt-to-pt
Q     optical connections.) Under such a case,
Q 
Q     (a) Are there any criteria to choose which layer should provide
Q           protection?

    Cost, then service impact, which includes both speed of protection
    and the latencies in the network minutes after the protection.
Q 
Q     (b) If both layers provide survivability features, what are the
Q           requirements to coordinate these mechanisms?

    That they be distinct, and that reliance on one or both is a
    separate choice of network architecture (e.g. they don't have to
    be commingled in a common routing/signalling space, and in fact
    probably shouldn't be).
Q 
Q     (c) How is lack of current functionality of cross-layer
Q           coordination currently hampering operations?

    it isn't.

Q 
Q     (d) Would the benefits be worth additional complexity associated
Q           with routing isolation (e.g. VPN, areas), security, address
Q           isolation and policy / authentication processes?

    Not anytime soon.  There are other areas to focus efforts (e.g.
    just getting multi-vendor TDM routing domains working).

Q 
Q 
Q 
Q 2. Horizontal (between two areas or administrative subdivisions within
Q     the same network layer):
Q 
Q     (a) What are the criteria that trigger the creation of protocol or
Q           administrative boundaries pertaining to restoration? (e.g.,
Q           scalability?  multi-vendor interoperability? what are the
Q           practical issues?)  multi-provider? Should multi-vendor
Q           necessitate hierarchical separation?

      scalability:  perceived or actual, yes.

      multi-vendor:  nope, there should be no need for this in core
      networks, or even most access networks.  The only exception is
      where "toy" network elements (e.g. enterprise focussed IP
      solutions) act as edges on networks which carry too much
      information for them to maintain (e.g. no BGP, or even no OSPF!).
      Nothing to worry about in this forum though.

      multi-provider:  BGP - policy a must.

      Q: Is there anything in the RSVP-based LSP specs which limits
      an LSP from being set up across routing areas, or even multi-AS?
      I'd bet that any such limitations today are merely artifacts of
      implementation, and not limitations of specification.

Q 
Q     When such boundaries are defined:
Q 
Q     (b) What are the requirements on how protection/restoration is
Q           performed end-to-end across such boundaries?
Q 
Q     (c) If different restoration mechanisms are implemented on two
Q           sides of a boundary, what are the requirements on their
Q           interaction?

      Boundaries are either clean, or are of minimal value.  So less
      information propagation is better (or just don't put up a
      boundary if you must have the information).  However, the
      concept of network elements that play on both sides of a
      boundary might be workable (e.g. OSPF ABRs).  That'd allow for
      devices on either side to do an intra-area thing within their
      region of knowledge, and for the ABR to do this in both areas,
      and splice the two protected connections together at a common
      point (granted it's a common point of failure now).  If the
      limitations of this approach start to appear in operational
      settings, then perhaps it would then be time to start thinking
      about deity-route-servers and signalling propagated directives.
      But I don't see a need to dive into that just yet.

Q 
Q    What is the primary driver of horizontal hierarchy? (select one)
Q     - functionality (e.g. metro -v- backbone)
Q     - routing scalability
Q     - signalling scalability

      +-- this one, esp in the context of multi-area ISPs interested
      |      in VPNs.
      V
Q     - current network architecture, trying to layer on TE on top of
Q       already hierarchical network architecture

Q     - routing and signalling
Q 
Q    For signalling scalability, is it
Q     - manageability
Q     - processing/state of network


      +-- this one, esp in the context of really large (and often
      |	     flat) ISPs interested in VPNs (there it is again, VPN VPN)
      V
Q     - edge-to-edge N^2 type issue
Q 
Q     For routing scalability, is it
Q     - processing/state of network
Q     - are you flat and want to go hierarchical

      +-- this one, under perception of processing/state concern.
      |      not aware of flat networkers that want/need to go to areas
      V
Q     - or already hierarchical?
Q     - data or TDM application?

      data.

Q 
Q D. Policy
Q 
Q 1. What are the requirements for policy support during
Q protection/restoration,
Q     e.g., restoration priority, preemption, etc.

      Unfulfilled working traffic should be able to claim protection
      resources that are sitting in wait in order to be fulfilled,
      even if that means tearing down some LSPs.  A protect LSP that
      is "in-use" should be able to be promoted somehow to a working
      LSP class.
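
      A minimal sketch of those two behaviours (working traffic
      preempting idle protect capacity, and an in-use protect LSP
      being promoted), assuming hypothetical Lsp and Link records
      with bandwidth in arbitrary units.  None of these names come
      from a specification.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Lsp:
    name: str
    role: str              # "working" or "protect"
    bandwidth: int
    in_use: bool = False   # True once a protect LSP carries traffic

@dataclass
class Link:
    capacity: int
    lsps: List[Lsp] = field(default_factory=list)

    def free(self) -> int:
        return self.capacity - sum(l.bandwidth for l in self.lsps)

    def admit_working(self, new: Lsp) -> List[Lsp]:
        """Admit a working LSP, tearing down idle protect LSPs if needed.

        Returns the preempted LSPs; raises if there is still no room.
        """
        idle = [l for l in self.lsps
                if l.role == "protect" and not l.in_use]
        victims: List[Lsp] = []
        available = self.free()
        for lsp in idle:
            if available >= new.bandwidth:
                break
            victims.append(lsp)
            available += lsp.bandwidth
        if available < new.bandwidth:
            raise RuntimeError("not enough capacity even after preemption")
        for victim in victims:
            self.lsps.remove(victim)
        self.lsps.append(new)
        return victims

def promote(lsp: Lsp) -> None:
    """Promote a protect LSP that is carrying traffic to working class."""
    if lsp.role == "protect" and lsp.in_use:
        lsp.role = "working"
```

      Only idle protect LSPs are candidates for teardown here; a
      protect LSP that is in use is left alone and can be promoted so
      that later working requests no longer see it as preemptable.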

      Head ends should be able to place different classes of traffic
      into different LSPs, and to let the network know that the LSPs
      are in different classes with some relation, however I think
      this should be generalized, and not tied to certain class
      monikers. (akin to the affinities)

      Head ends should have control on whether an LSP will be a
      candidate for local protection along its path, and should be
      able to set up their own path based protect LSPs.

      Policy in the form of credentials, and policy server based
      arbitration are not required at this time.

Q 
Q E. Signaling Mechanisms
Q 
Q 1. What are the requirements on the signaling transport mechanism
Q    (e.g., in-band over sonet/sdh overhead bytes, out-of-band over
Q    an IP network, etc.) used to communicate restoration protocol
Q    messages between network elements. What are the bandwidth and
Q    other requirements on the signaling channels?

      For data applications, in-band would be sufficient.  I don't see
      any new problems that restoration signalling has over other
      signalling and route propagation issues in TDM/optical networks.
      OT:  I'm a bit wary of out-of-band approaches, because they're
      strange and foreign to me, and thus suspect :)  But again,
      that's a general issue with optical/TDM networks.

Q 
Q 2. What are the requirements on fault detection/localization mechanisms
Q    (which is the prelude to performing restoration procedures)
Q    in the case of opaque and transparent optical networks?
Q    What are the requirements in the case of MPLS restoration?

       See B.2 answer above.
Q 
Q 3. What are the requirements on signaling protocols to be used in
Q    restoration procedures (e.g., high priority processing, security, etc).

       They should be robust, scalable, secure and fast too.

Q 
Q 4. Are there any requirements on the operation of restoration protocols?
Q 

	As in what, as in control of reversion, grooming, etc..??
	Sure.

Q E. Quantitative
Q 
Q 1. What are the quantitative requirements (e.g., latency) for completing
Q    restoration under different protection modes (for both local and
Q    end-to-end protection)?

	Subject to speed of light, 10s of ms for local repair and
	< 200 ms for path based approach (assuming a path from New
	York to Los Angeles for instance).  
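
	A rough back-of-the-envelope check of the path based figure.
	The route length and the propagation speed below are
	assumptions for illustration (roughly 200 km per ms in fiber,
	and an assumed 4500 km coast-to-coast fiber route); neither
	number comes from the questions or from measurement.

```python
# Assumptions (not from the original message): light in fiber covers
# roughly 200 km per millisecond, and the fiber route is about 4500 km.
FIBER_KM_PER_MS = 200.0
ROUTE_KM = 4500.0
PATH_BUDGET_MS = 200.0   # the "< 200 ms" path based target above

def one_way_ms(route_km: float,
               km_per_ms: float = FIBER_KM_PER_MS) -> float:
    """One-way propagation delay along the route."""
    return route_km / km_per_ms

def propagation_floor_ms(route_km: float) -> float:
    """Worst-case propagation cost of a head-end switch.

    The failure is next to the tail end, so the notification crosses
    the whole path to the head end, and the switched traffic then
    crosses a protect path of similar length.
    """
    return 2 * one_way_ms(route_km)

def processing_headroom_ms(route_km: float,
                           budget_ms: float = PATH_BUDGET_MS) -> float:
    """Time left for detection, signalling and switching."""
    return budget_ms - propagation_floor_ms(route_km)
```

	Under these assumed numbers propagation alone costs 45 ms,
	leaving about 155 ms of the 200 ms budget for detection and
	processing.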
Q 
Q F. Management
Q 
Q 1. What information should be measured/maintained by the control plane at
Q     each network element pertaining to restoration events?

       They should propagate correlation of resources to SRGs.  Not
       sure how interwoven CAC/signalling and SRGs should be though.

Q 
Q 2. What are the requirements for the correlation between control plane
Q     and data plane failures from the restoration point of view?
Q 

       Permuting:

       Control Down / Data Down -> protect, an obvious case.

       Control Up / Data Down -> is OAM needed?  This happens
             sometimes, not sure if it needs to be fixed, as many bad
             things happen sometimes.  Is there a need for OAM, and for
             OAM-signalled switches?  I'd vote no for now.

       Control Down / Data Up -> there seems to be a requirement for
             persistence of connections during control failure in TDM
             networks. I think that failure of control planes should
             activate switching in the data plane (as happens in IP
             networks). 
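
       The permutations above can be written down as a small decision
       function.  This is only a sketch: the function name, the
       string results and the persist_on_control_failure flag are
       hypothetical, the flag standing in for the TDM persistence
       requirement mentioned in the last case.

```python
def restoration_action(control_up: bool, data_up: bool,
                       persist_on_control_failure: bool = False) -> str:
    """Map control/data plane state to a restoration action."""
    if control_up and data_up:
        # Nothing has failed.
        return "none"
    if not control_up and not data_up:
        # The obvious case: everything is down, so protect.
        return "protect"
    if control_up and not data_up:
        # Silent data plane failure.  Detecting it would need OAM,
        # which the answer above votes against requiring for now.
        return "none"
    # Control down, data up: either keep the connection alive (the
    # TDM persistence requirement) or switch, as IP networks do.
    return "persist" if persist_on_control_failure else "protect"
```

       The default reflects the preference stated above, that a
       control plane failure should trigger a switch; setting the
       flag models the TDM behaviour instead.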

Q 
Q 
Q