Internet Draft Submission
Hi,
I am submitting herewith an Internet Draft for the Repository.
Thanks, Wai Sum.
Traffic Engineering Working Group                      Wai Sum Lai, AT&T
Internet Draft                                    Dave McDysan, WorldCom
<draft-team-tewg-restore-hierarchy-00.txt>                  (Co-Editors)
Category: Informational
Expiration Date: January 2002                                  Jim Boyle
                                                           Malin Carlzon
                                                     Rob Coltun, Redback
                                                       Tim Griffin, AT&T
                                                         Ed Kern, Cogent
                                                  Tom Reddington, Lucent
                                                               July 2001
Network Hierarchy and Multilayer Survivability
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026 [1].
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
1. Abstract
This document is the deliverable out of the Network Hierarchy and
Survivability Techniques Design Team established within the Traffic
Engineering Working Group. This team was requested to try to
determine what the current and near term requirements are for
survivability and hierarchy in MPLS networks. The team determined
that there appears to be a need for common, interoperable
survivability approaches in packet and non-packet networks.
Suggested approaches include path-based mechanisms as well as one that repairs
connections in proximity to the network fault. For clarity, an
expanded set of definitions is included. As for hierarchy, there
did not appear to be as much need for work on "vertical hierarchy,"
defined as communication between network layers such as TDM/optical
and MPLS. In particular, instead of direct exchange of signaling
and routing between vertical layers, some looser form of
coordination and communication is a nearer term need. For
Lai, et al Category - Expiration [1]
Network Hierarchy and Multilayer Survivability July 2001
"horizontal hierarchy" in data networks, there does appear to be a
pressing need. This requirement is often presented in the context
of layer 2 and layer 3 VPN services where SLAs would appear to
necessitate signaling from the edges into the core of a network.
Issues include potential limitations of current protocols in networks
that are hierarchical (e.g., multi-area OSPF) and scalability
concerns about potentially O(N^2) connection growth in larger networks.
Please send comments to te-wg@ops.ietf.org
2. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC-2119 [2].
3. Introduction
This document presents a proposal of the tangible requirements for
network survivability and hierarchy in current service provider
environments. With feedback from the working group solicited, the
objective is to help focus the work that is being addressed in the
traffic engineering, CCAMP and other working groups. A main goal of
this work is to expedite the delivery of required functionality
in multi-vendor service provider networks. The initial focus is
primarily on intra-domain operations. However, to maintain
consistency in the provision of end-to-end service in a multi-
provider environment, rules governing the operations of
survivability mechanisms at domain boundaries must also be
specified. While such issues are raised and discussed, where
appropriate, they will not be treated in depth in the initial
release of this document.
The document first develops a set of definitions to be used later in
this document and potentially in other documents as well. It then
addresses the requirements and issues associated with service
restoration and hierarchy, and closes with a short discussion of
survivability in a hierarchical context.
4. Definitions
4.1 Hierarchy Terminology
Network hierarchy is an abstraction of part of a network's topology
and the routing and signaling mechanism needed to support the
topological abstraction. Abstraction may be used as a mechanism to
build large networks or as a technique for enforcing administrative,
topological or geographic boundaries. For example, network
hierarchy might be used to separate the metropolitan and long-haul
regions of a network or to separate the regional and backbone
sections of a network [Bert Wijnen], or to interconnect service
provider networks (with BGP which reduces a network to an Autonomous
System). In this document, network hierarchy is considered from two
perspectives:
(1) Horizontally oriented: between two areas or administrative
subdivisions within the same network layer
(2) Vertically oriented: between two network layers
Horizontal hierarchy is the abstraction necessary to allow a network
at one network layer, for instance a packet network, to grow.
Examples of horizontal hierarchy include BGP and multi-area OSPF.
Vertical hierarchy is the abstraction, or reduction in information,
which would be of benefit when communicating information across
network layers, as in propagating information between optical and
router networks.
4.2 Survivability Terminology
Extra traffic is the traffic carried over the protection entity
while the working entity is active. Extra traffic is not protected,
i.e., when the protection entity is required to protect the traffic
that is being carried over the working entity (e.g., due to a
failure affecting the working entity), the extra traffic is
preempted.
Normalization is the return to the normal state of a network upon
completing the repair of the network failure. This could include
the rerouting of affected traffic to the original working entities
or new routes. The term revertive mode is used when traffic is
returned to the working entity (switch back).
Protection, also called protection switching, is a survivability
technique based on predetermined failure recovery: as the working
entity is established, resources are reserved for the protection
entity. These resources may be used by low-priority traffic
(referred to as extra traffic) if traffic preemption is allowed.
Depending on the amount of reserved resources, not all of the
affected traffic may be protected. (For further discussion of
concepts related to protection, see the Sub-section below on
Survivability Concepts.)
Protection entity (also called back-up entity or recovery entity) is
the entity that is used to carry protected traffic in protection
operation mode, i.e., when the working entity is in error or has
failed.
Recovery is the sequence of actions taken by a network after the
detection of a failure to maintain the required performance level
for existing services (e.g., according to service level agreements)
and to allow normalization of the network. The actions include
notification of the failure followed by two parallel processes: (1)
a repair process with fault isolation and repair of the failed
components, and (2) a reconfiguration process with path selection
and rerouting for the affected traffic.
Rerouting is placement of affected traffic from the working entity
to the protection entity, when the path for the protection entity
has been selected after the detection of a fault on the working
entity. This is synonymous with switch-over in protection
techniques. (In [3], rerouting is synonymous with restoration.)
Restoration is a survivability technique that dynamically discovers
the alternate path from spare resources in the network, or establishes
new paths on demand, for affected traffic once the failure is
detected and the affected traffic is identified for rerouting. The
new path may be based on preplanned configurations or current
network status. Thus, restoration involves a path selection process
followed by traffic rerouting. (In [3], restoration is referred to
as recovery by rerouting.)
Restoration, or more specifically, service restoration, refers to
the actions taken by a network to maintain service continuity after
the detection of a failure. In this second usage, restoration has a
meaning very similar to recovery, except that restoration covers
only the reconfiguration process and not the repair process. Also,
in this usage, it should be clear from the context that it is
irrelevant whether the survivability technique used to achieve
service continuity is based on protection or restoration techniques.
Restoration time is the time interval from the occurrence of a
network impairment to the instant when either the affected traffic
is completely rerouted, or spare resources are exhausted and/or no
preemptable traffic remains to make room.
Revertive mode is a procedure in which revertive action, i.e.,
switch back from the protection entity to the working entity, is
taken once the failed working entity has been repaired. In non-
revertive mode, such action is not taken. To minimize service
interruption, switch-back in revertive mode should be performed at a
time when there is the least impact on the traffic concerned, or by
using the make-before-break concept.
Shared risk group (SRG) is a set of network elements that are
collectively impacted by a specific fault or fault type. For
example, a shared risk link group (SRLG) is the union of all the
links on those fibers that are routed in the same physical conduit
in a fiber-span network. This concept includes, besides shared
conduit, other types of compromise such as shared fiber cable,
shared right of way, shared optical ring, shared office without
power sharing, etc. The span of an SRG, such as the length of the
sharing for compromised outside plant, needs to be considered on a
per fault basis.
Survivability is the capability of a network to maintain service
continuity in the presence of faults within the network [4].
Survivability techniques such as protection and restoration are
implemented either on a per-link basis, on a per-path basis, or
throughout an entire network to alleviate service disruption at
affordable costs. The degree of survivability is determined by the
network's capability to survive single failures, multiple failures,
and equipment failures.
Working entity is the entity that is used to carry traffic in normal
operation mode. Depending on the context, an entity can be, e.g., a
channel or a transmission link in the physical layer, an LSP in
MPLS, or a logical bundle of one or more LSPs.
4.3 Survivability Concepts
In a survivable network design, spare capacity and diversity must be
built into the network from the beginning to support some degree of
self-healing whenever failures occur. A common strategy is to
associate each working entity with a protection entity having either
dedicated resources or shared resources that are pre-reserved or
reserved-on-demand. According to the methods of setting up a
protection entity, different approaches to providing survivability
can be classified. Generally, protection techniques are based on
having a dedicated protection entity set up prior to failure. Such
is not the case in restoration techniques, which mainly rely on the
use of spare capacity in the network. Hence, in terms of trade-
offs, protection techniques usually offer fast recovery from failure
with enhanced availability, while restoration techniques usually
achieve better resource utilization.
Protection techniques can be implemented by several architectures:
1+1, 1:1, 1:n, and m:n. In the context of SDH/SONET, they are
referred to as Automatic Protection Switching (APS).
In the 1+1 protection architecture, a protection entity is dedicated
to each working entity. The dual-feed mechanism is used whereby the
working entity is permanently bridged onto the protection entity at
the source of the protected domain. In normal operation mode,
identical traffic is transmitted simultaneously on both the working
and protection entities. At the sink of the protected domain, both
feeds are monitored for alarms and maintenance signals. A selection
between the working and protection entity is made based on some
predetermined criteria, such as the transmission performance
requirements or defect indication. This architecture is rather
expensive since resource duplication is required. It is generally
used for specific services that need a very high availability.
In the 1:1 protection architecture, a protection entity is also
dedicated to each working entity. The protected traffic is normally
transmitted by the working entity. If the working entity has
failed, the protected traffic is rerouted to the protection entity.
This architecture is inherently slower in recovering from failure
than a 1+1 architecture since communication between both ends of the
protection domain is required to perform the switch-over operation.
An advantage is that the protection entity can optionally be used to
carry preemptable "extra traffic" in normal operation. Also, in
packet networks, a protection path can be pre-established for later
use with pre-planned but not pre-reserved capacity. (If no packets
are sent into a link, no bandwidth is consumed.) This is not the
case in channelized transport networks.
In the 1:n protection architecture, a dedicated protection entity is
shared by n working entities. Traffic is normally sent on the
working entities. When multiple working entities have failed
simultaneously, only one of them can be restored by the common
protection entity. This contention is resolved by assigning a
different preemptive priority to each working entity. As in the 1:1
case, the protection entity can optionally be used to carry
preemptable "extra traffic" in normal operation.
The m:n architecture is a generalization of the 1:n architecture.
Typically m <= n, where m dedicated protection entities are shared by n
working entities. While this architecture can improve system
availability with small cost increases, it has rarely been
implemented or standardized.
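The contention rule described for the 1:n case can be sketched as
follows. This is a minimal illustration, not a definition of any
standardized behavior. The function name assign_protection, the
entity names, and the convention that a lower number means a higher
preemptive priority are all assumptions made for the example. The
m:n case is modeled here simply by letting up to m failed working
entities be served, which is also an assumption.

```python
from typing import Dict, List, Tuple


def assign_protection(
    failed: List[str], priority: Dict[str, int], m: int = 1
) -> Tuple[List[str], List[str]]:
    """Pick which failed working entities get a protection entity.

    Assumption: a lower number means a higher preemptive priority,
    and every working entity has a distinct priority.
    Returns (protected, unprotected).
    """
    ranked = sorted(failed, key=lambda entity: priority[entity])
    return ranked[:m], ranked[m:]


# Hypothetical 1:3 group: three working entities share one protection entity.
priority = {"W1": 2, "W2": 0, "W3": 1}

protected, unprotected = assign_protection(["W1", "W3"], priority, m=1)
print(protected, unprotected)
```

With m=1 and simultaneous failures of W1 and W3, only W3 is
switched to the protection entity because it has the higher
priority in this example.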
5. Survivability
5.1 Scope
Interoperable approaches to network survivability were determined to
be an immediate requirement in packet networks as well as in
SDH/SONET framed TDM networks. Not as pressing at this time were
techniques which would cover all-optical networks (e.g., where
framing is unknown), as the control of these networks in a multi-
vendor environment appeared to have some other hurdles to first deal
with. Also, not of immediate interest were approaches to coordinate
or explicitly communicate survivability mechanisms across network
layers (such as from a TDM or optical network to/from an IP
network). However, a capability should be provided for a network
operator to control the operation of survivability mechanisms among
different layers. Such issues and those related to OAM are
currently outside the scope of this document. (For proposed MPLS
OAM requirements, see [5]).
The types of network failures that cause a restoration to be
performed include link/span and node failures (which might include
span failures at lower layers). Other more complex failure
mechanisms such as systematic control-plane failure or breach of
security are not within the scope of the survivability mechanisms
discussed in this document.
5.2 Required initial set of survivability mechanisms
5.2.1 1:1 Path Protection with Pre-Established Capacity
In this protection mode, the head end of a working connection
establishes a protection connection to the destination. In normal
operation, traffic is only sent on the working connection, though
the ability to signal that traffic will be sent on both connections
(1+1 Path for signaling purposes) would be valuable in non-packet
networks. Some distinction between working and protection
connections is likely, either through explicit objects, or
preferably through implicit methods such as general classes or
priorities. Head ends need the ability to create connections that
are as failure disjoint as possible from each other. This would
require SRG information that can be generally assigned to either
nodes or links and propagated through the control or management
plane. In this mechanism, capacity in the protection connection is
pre-established; however, it can be used to carry preemptable extra
traffic. Protect capacity is allocated first come, first served. When protect
capacity is called into service during restoration, there should be
the ability to promote the protection connection to working status
(for non-revertive mode operation) with some form of make-before-
break capability.
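One way a head end might use SRG information to pick a protection
path can be sketched as below. This is an illustrative sketch under
several assumptions: the topology, the SRG identifiers, and the
function names path_srgs and disjoint_path are hypothetical; SRGs are
attached to links only (the text also allows node SRGs); and the
search insists on strict SRG disjointness rather than "as disjoint
as possible."

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical topology: (node, node) -> SRG identifiers of that link.
Topology = Dict[Tuple[str, str], FrozenSet[str]]


def path_srgs(topology: Topology, path: List[str]) -> Set[str]:
    """Union of the SRGs of every link along a path given as a node list."""
    srgs: Set[str] = set()
    for a, b in zip(path, path[1:]):
        srgs |= topology.get((a, b)) or topology.get((b, a)) or frozenset()
    return srgs


def disjoint_path(
    topology: Topology, src: str, dst: str, avoid_srgs: Set[str]
) -> Optional[List[str]]:
    """Breadth-first search over links that share no SRG with avoid_srgs."""
    neighbors: Dict[str, List[str]] = {}
    for (a, b), srgs in topology.items():
        if srgs & avoid_srgs:
            continue  # link shares a risk with the working path
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    previous: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no SRG-disjoint path exists


topology: Topology = {
    ("A", "B"): frozenset({"c1"}),
    ("B", "D"): frozenset({"c2"}),
    ("A", "C"): frozenset({"c1"}),  # different link, same conduit as A-B
    ("C", "D"): frozenset({"c3"}),
    ("A", "E"): frozenset({"c4"}),
    ("E", "D"): frozenset({"c5"}),
}
working = ["A", "B", "D"]
protection = disjoint_path(topology, "A", "D", path_srgs(topology, working))
print(protection)
```

In this example the route via C is link-disjoint from the working
path but shares conduit c1 with it, so the search selects the route
via E instead.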
5.2.2 1:1 Path Protection with Pre-Planned Capacity
Similar to the above 1:1 protection with pre-established capacity,
the protection connection in this case is also pre-signaled. The
difference is in the way protect capacity is assigned. With pre-
planned capacity, the mechanism supports the ability for the protect
capacity to be shared, or "double-booked." The expectation is that,
should operator-predicted failures occur (which could potentially
rely on enumeration in SRGs), only a limited set of protect
connections would be put into service, and the protect capacity
available in the network would be able to carry this traffic (given
proper sizing and planning of the network). In a sense, this is 1:1
from a path perspective; however, the protect capacity in the
network (on a link-by-link basis) is shared in a 1:n fashion. Some
form of information propagation could be required
before traffic may be sent on protection connections, especially in
TDM networks. In data networks, a desirable operating approach for
this mechanism might be one in which the protect capacity is not
precisely booked against SRGs (e.g., non-predictive).
The use of this approach improves network resource utilization, but
may require more careful planning. So, initial deployment might be
based on 1:1 path protection with pre-established capacity and the
local restoration mechanism to be described next.
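The utilization gain from sharing protect capacity can be
illustrated with a small calculation. The sketch below is only an
illustration under stated assumptions: the Connection record, the
link and SRG names, and the bandwidth figures are hypothetical, and
the planning rule assumes that only one SRG fails at a time. With
pre-established capacity, each link needs the sum of all protection
bandwidth routed over it; with pre-planned (shared) capacity, it
needs only the worst case over the individual SRG failures.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Connection(NamedTuple):
    name: str
    bandwidth: int             # arbitrary units
    working_srgs: Set[str]     # SRGs traversed by the working path
    protect_links: List[str]   # links traversed by the protection path


def dedicated_capacity(conns: List[Connection]) -> Dict[str, int]:
    """Protect capacity per link when nothing is shared."""
    need: Dict[str, int] = defaultdict(int)
    for conn in conns:
        for link in conn.protect_links:
            need[link] += conn.bandwidth
    return dict(need)


def shared_capacity(conns: List[Connection]) -> Dict[str, int]:
    """Protect capacity per link, sized for the worst single-SRG failure."""
    per_failure: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for conn in conns:
        for srg in conn.working_srgs:
            for link in conn.protect_links:
                per_failure[link][srg] += conn.bandwidth
    return {link: max(by_srg.values()) for link, by_srg in per_failure.items()}


conns = [
    Connection("c1", 10, {"s1"}, ["X-Y"]),
    Connection("c2", 10, {"s2"}, ["X-Y"]),
    Connection("c3", 5, {"s1"}, ["X-Y", "Y-Z"]),
]
print(dedicated_capacity(conns))
print(shared_capacity(conns))
```

On link X-Y the dedicated figure is 25 units, while the shared
figure is 15 units, since c1 and c3 fail together under s1 but c2
fails only under s2.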
5.2.3 Local Restoration
Due to the time impact of signal propagation, path-based approaches
may not be able to meet the service requirements desired in some
networks. The solution to this is to restore connectivity in
immediate proximity to the fault. At a minimum, this approach
should be able to protect against connectivity-type SRGs, though
protecting against node-based SRGs might be worthwhile. After local
restoration is in place, it is likely that head end systems would
later perform some path-level re-grooming. Head end systems must
have some control as to whether their connections are candidates for
or excluded from local restoration.
5.2.4 Path Restoration
In this approach, connections that are impacted by a fault are
rerouted by the originating network element upon notification of
connection failure. This approach does not involve any new
mechanisms. It is mentioned here merely as another common approach
to protecting against faults in a network.
5.3 Applications Supported
With service continuity under failure as a goal, a network is
"survivable" if, in the face of a network failure, connectivity is
interrupted for a brief period and then restored before the network
failure ends. The length of this interrupted period is dependent on
the application supported. Here are some typical applications that
need to be considered:
- Best-effort data: restoration of network connectivity by rerouting
at the IP layer would be sufficient
- Premium data service: need to meet TCP or application protocol
timer requirements
- Voice: call cutoff is in the range of 140 msec to 2 sec
- Other real-time service (e.g., streaming, fax)
- Mission-critical applications
5.4 Timing Bounds for Service Restoration
The approach to picking the types of survivability mechanisms
recommended was to consider a spectrum of mechanisms that can be
used to protect traffic with varying characteristics of
survivability and speed of restoration, and then attempt to select a
few general points which provide some coverage across that spectrum.
The focus of this work is to provide requirements to which a small
set of detailed proposals may be developed, allowing the operator
some (limited) flexibility in approaches to meeting their design
goals in engineering multi-vendor networks. Requirements of
different applications as listed in the previous sub-section were
discussed generally; however, none on the team would likely attest to
the scientific merit of the ability of the timing bounds below to
meet any specific application’s needs. A few assumptions include:
- Approaches that perform protection switching without propagation of
  information are likely to be faster than those that require some
  form of fault notification to some or all elements in a network.
- Approaches that require some form of signaling after a fault will
  also likely suffer some timing impact.
Proposed timing bounds for service restoration for different
mechanisms are as follows (all bounds are exclusive of signal
propagation):
1:1 path protection with pre-established capacity: 100-500 ms
1:1 path protection with pre-planned capacity: 100-750 ms
Local restoration: 50 ms
Path restoration: 1-5 seconds
To ensure that the service requirements for different applications
can be met within the above timing bounds, restoration priority is
used to determine the order in which connections are restored (to
minimize service restoration time as well as to gain access to
available spare capacity). For example, mission critical
applications may require high restoration priority. Preemption
priority should only be used in the event that all connections
cannot be restored, in which case connections with lower preemption
priority should be released. Depending on a service provider's
strategy in provisioning network resources for backup, preemption
may or may not be needed in the network.
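The interplay of restoration priority and preemption priority
described above can be sketched as follows. This is a simplified
illustration on a single pool of capacity, not a proposed algorithm.
The Conn record, the function name restore, and the numeric
conventions (a lower number means a higher priority for both
restoration and preemption) are assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class Conn(NamedTuple):
    name: str
    bandwidth: int
    restoration_priority: int  # lower value = restored earlier (assumption)
    preemption_priority: int   # lower value = harder to preempt (assumption)


def restore(
    failed: List[Conn], carried: List[Conn], spare: int
) -> Tuple[List[str], List[str], List[str]]:
    """Restore failed connections in restoration-priority order.

    Spare capacity is used first. Carried connections are released only
    when spare capacity is insufficient, and only if their preemption
    priority is lower than that of the connection being restored.
    Returns (restored, released, unrestored) connection names.
    """
    restored: List[str] = []
    released: List[str] = []
    unrestored: List[str] = []
    # Easiest-to-preempt connections come first.
    victims = sorted(carried, key=lambda c: c.preemption_priority, reverse=True)

    for conn in sorted(failed, key=lambda c: c.restoration_priority):
        eligible = [v for v in victims
                    if v.preemption_priority > conn.preemption_priority]
        if spare + sum(v.bandwidth for v in eligible) < conn.bandwidth:
            unrestored.append(conn.name)  # cannot fit even with preemption
            continue
        while spare < conn.bandwidth:
            victim = eligible.pop(0)
            victims.remove(victim)
            spare += victim.bandwidth
            released.append(victim.name)
        spare -= conn.bandwidth
        restored.append(conn.name)
    return restored, released, unrestored


carried = [Conn("extra", 10, 9, 9)]
failed = [
    Conn("bulk", 10, 8, 9),
    Conn("voice", 10, 0, 0),
    Conn("data", 10, 5, 5),
]
result = restore(failed, carried, spare=10)
print(result)
```

In this example the voice connection is restored first from spare
capacity, the data connection is restored by releasing the extra
traffic, and the bulk connection remains unrestored.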
5.5 Coordination Among Layers
A common design goal for multi-layered networks is to provide the
desired level of service in the most cost-effective manner. The use
of multilayer survivability might allow the optimization of spare
resources through the improvement of resource utilization by sharing
spare capacity across different layers, though further
investigations are needed. Coordination during service restoration
among different network layers (e.g. IP, SDH/SONET, optical layer)
might necessitate development of vertical hierarchy. The benefits
of providing survivability mechanisms at multiple layers, and the
optimization of the overall approach, must be weighed with the
associated cost and service impacts.
A default coordination mechanism for inter-layer interaction could
be the use of nested timers and current SDH/SONET fault monitoring,
as has been done traditionally for backward compatibility. Thus,
when lower-layer restoration happens in a longer time period than
higher-layer restoration, a hold-off timer is utilized to avoid
contention between the different single-layer recovery schemes. In
other words, multilayer interaction is addressed by having
successively higher multiplexing levels operate at a restoration time
scale greater than that of the next lower layer. Currently, if SDH/SONET
protection switching is used, MPLS recovery timers must wait until
SDH/SONET has had time to switch.
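The nested-timer idea can be sketched as a small calculation. This
is an illustrative sketch only: the function names, the guard
interval, and the per-layer recovery times used in the example are
assumptions, not values taken from this document. Each layer waits
for the layers below it to have had time to recover before it acts.

```python
from typing import List


def nested_hold_offs(recovery_ms: List[float], guard_ms: float = 0.0) -> List[float]:
    """Hold-off timer per layer, lowest layer first.

    Each layer waits for the sum of the recovery times (plus a guard
    interval) of all layers below it.
    """
    hold_offs: List[float] = []
    wait = 0.0
    for layer_time in recovery_ms:
        hold_offs.append(wait)
        wait += layer_time + guard_ms
    return hold_offs


def should_start_recovery(
    elapsed_ms: float, hold_off_ms: float, lower_layer_restored: bool
) -> bool:
    """A layer acts only if its hold-off expired and the fault persists."""
    return elapsed_ms >= hold_off_ms and not lower_layer_restored


# Hypothetical recovery times for three stacked layers, lowest first.
timers = nested_hold_offs([50.0, 100.0, 1000.0], guard_ms=10.0)
print(timers)
```

The sketch also makes the cost visible: the hold-off of a higher
layer is a floor on its recovery time, however fast its own
mechanism may be.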
It was felt that current approaches to the coordination of
survivability mechanisms did not have significant
operational shortfalls. These approaches include protecting traffic
solely at one layer (e.g. at the IP layer over linear WDM, or at the
SDH/SONET layer). Where survivability mechanisms might be deployed
at several layers, such as when a routed network rides a SDH/SONET
protected network, it was felt that current coordination approaches
were sufficient in many cases. One exception is the hold-off of
MPLS recovery until the completion of SDH/SONET protection switching
as described above. This limits the recovery time of fast MPLS
restoration. Also, note that failures within a layer can be guarded
against by techniques either in that layer or at a higher layer, but
not in reverse. Thus, the optical layer cannot guard against
failures in the IP layer such as router system failures or line card
failures.
5.6 Evolution Toward IP Over Optical
As more pressing requirements for survivability and horizontal
hierarchy for edge-to-edge signaling are met with technical
proposals, it is believed that the benefits of merging (in some
manner) the control planes of multiple layers will be outlined.
When these benefits are self-evident, it would then seem to be the
right time to review if vertical hierarchy mechanisms are needed,
and what the requirements might be.
6. Hierarchy Requirements
Efforts in the area of network hierarchy should focus on mechanisms
that would allow more scalable edge-to-edge signaling, or signaling
across networks with existing network hierarchy (such as multi-area
OSPF). This would appear to be a more immediate need than
mechanisms that might be needed to interconnect networks at
different layers.
6.1 Historical Context
One reason for horizontal hierarchy is functionality (e.g., metro
versus backbone). Geographic “islands” reduce the need for
interoperability and make administration and operations less
complex. Using a simpler, more interoperable, survivability scheme
at metro/backbone boundaries is natural for many provider network
architectures. In transmission networks, creating geographic
islands of different vendor equipment has been done for a long time
because multi-vendor interoperability has been difficult to achieve.
Traditionally, providers have to coordinate the equipment on either
end of a "connection," and making this interoperable reduces
complexity. A provider should be able to concatenate survivability
mechanisms in order to provide a "protected link" to the next higher
level. Think of SDH/SONET rings connecting to TDM DXCs with 1+1
line-layer protection between the ADM and the DXC port. The TDM
connection (e.g., a DS3) is protected, but usually all equipment on
each SDH/SONET ring is from a single vendor. The DXC cross
connections are controlled by the provider and the ports are
physically protected resulting in a highly available design. Thus,
concatenation of survivability approaches can be used to cascade
across horizontal hierarchy. While not perfect, it is workable in
the near- to mid-term until multi-vendor interoperability is
achieved.
While the problems associated with multi-vendor interoperability may
necessitate horizontal hierarchy as a practical matter (at least
this has been the case in TDM networks), there may be no technical
reason for it. Members of the team with more experience on IP
networks felt there should be no need for this in core networks, or
even most access networks.
Some of the largest service provider networks currently run a single
area/level IGP. Some service providers, as well as many large
enterprise networks, run multi-area OSPF to gain increases in
scalability. Often, this was part of the original design, so it is
difficult to say if the network truly required the hierarchy to
reach its current size.
Some proposals on improved mechanisms to address network hierarchy
have been suggested [6, 7, 8]. This document aims to provide the
concrete requirements so that these and other proposals can first
aim to meet some limited objectives.
6.2 Applications for Horizontal Hierarchy
A primary driver for intra-domain horizontal hierarchy is signaling
scalability in the context of edge-to-edge VPNs, potentially across
traffic-engineered data networks. There are a number of different
approaches to VPNs and they are currently being addressed by
different emerging protocols: RFC 2547bis BGP/MPLS VPNs, provider-
provisioned VPNs based upon MPLS tunnels (e.g., virtual routers),
Pseudo Wire Emulation Edge-to-Edge (PWE3), etc. These may or may not
need explicit signaling from edge to edge, but it is a common
perception that in order to meet SLAs, some form of edge-to-edge
signaling is required.
For signaling scalability, there are probably two types of network
scenarios to consider:
- Large SP networks with flat routing domains where edge-to-edge
(MPLS) signaling as implemented today would probably not scale.
- Networks which would like to signal edge-to-edge, and might even
scale in a limited application. However, they are hierarchically
routed (e.g., OSPF areas), and current implementations, and
potentially standards, prevent signaling across areas. This
requires the development of signaling standards that support
dynamic establishment and potentially restoration of LSPs across a
2-level IGP hierarchy.
Scalability is concerned with the O(N^2) properties of edge-to-edge
signaling. For a large network, maintaining a "connection" between
every pair of edge devices is simply not scalable. Even if establishing and
maintaining connections is feasible, there might be an impact on
core survivability mechanisms which would cause restoration times to
grow with N^2, which would be undesirable. While some value of N
may be inevitable, approaches to reduce N (e.g. to pull in from the
edge to aggregation points) might be of value.
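The growth in question is easy to quantify. The short sketch below
counts the connections in a full mesh, assuming unidirectional
connections (as with LSPs) so that N devices need N(N-1) of them. The
function name and the device counts are illustrative assumptions.

```python
def full_mesh_connections(n: int, unidirectional: bool = True) -> int:
    """Connections needed to mesh n devices edge to edge."""
    pairs = n * (n - 1) // 2
    return 2 * pairs if unidirectional else pairs


# Hypothetical network sizes.
print(full_mesh_connections(100))    # mesh between 100 edge devices
print(full_mesh_connections(1000))   # mesh between 1000 edge devices
print(full_mesh_connections(50))     # mesh between 50 aggregation points
```

Meshing 1000 edge devices directly takes 999000 connections, whereas
pulling in to 50 aggregation points reduces the core mesh to 2450.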
For routing scalability, especially in data applications, a major
concern is the amount of processing/state that is required in the
variety of network elements. If some nodes might not be able to
communicate and process the state of every other node, it might be
preferable to limit the information. One school of thought holds
that the amount of information contained by a horizontal
barrier should be significant, and that the impacts this might have
on optimality in route selection and on the ability to provide
global survivability are accepted tradeoffs.
6.3 Horizontal Hierarchy Requirements
Mechanisms are required to allow for edge-to-edge signaling of
connections through a network. The types of network scenarios
include large networks with a large number of edge devices and flat
interior routing, as well as medium to large networks which
currently have hierarchical interior routing such as multi-area OSPF
or multi-level IS-IS. The primary context of this is edge-to-edge
signaling which is thought to be required to assure the SLAs for the
layer 2 and layer 3 VPNs that are being carried across the network.
Another possible context would be edge-to-edge signaling in TDM
SDH/SONET networks, where metro and core networks again might either
be in a flat or hierarchical interior routing domain.
7. Survivability and Hierarchy
When horizontal hierarchy exists in a network layer, a question
arises as to how survivability can be provided along a connection
which crosses hierarchical boundaries.
In designing protocols to meet the requirements of hierarchy, an
approach to consider is that boundaries are either clean, or are of
minimal value. However, the concept of network elements that
participate on both sides of a boundary might be a consideration
(e.g. OSPF ABRs). That would allow for devices on either side to
take an intra-area approach within their region of knowledge, and
for the ABR to do this in both areas, and splice the two protected
connections together at a common point (granted it is a common point
of failure now). If the limitations of this approach start to
appear in operational settings, then perhaps it would be time to
start thinking about route-servers and signaling propagated
directives. However, one initial approach might be to signal
through a common border router, and to consider the service as
protected as it consists of a concatenated set of connections which
are each protected within their area. Another approach might be to
have a least common denominator mechanism at the boundary, e.g., 1+1
port protection. There should also be some standardized means for a
survivability scheme on one side of such a boundary to communicate
with the scheme on the other side regarding the success or failure
of the service restoration action. For example, if a part of a
"connection" is down on one side of such a boundary, there is no
need for the other side to recover from failures.
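The concatenation approach, together with the exchange of restoration
results across a boundary, might be sketched as follows. This is a
minimal model under stated assumptions, not a protocol definition:
the class names, the three segment states, and the area labels are
all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class SegmentState(Enum):
    UP = "up"              # working path is carrying traffic
    RESTORED = "restored"  # the local survivability scheme recovered the segment
    FAILED = "failed"      # the local scheme could not recover the segment


@dataclass
class Segment:
    area: str
    state: SegmentState = SegmentState.UP


def service_is_up(segments: Sequence[Segment]) -> bool:
    """The concatenated service is up only if no segment has failed."""
    return all(s.state is not SegmentState.FAILED for s in segments)


def should_attempt_recovery(segments: Sequence[Segment], index: int) -> bool:
    """A segment need not recover if another segment has reported failure."""
    return not any(
        s.state is SegmentState.FAILED
        for i, s in enumerate(segments)
        if i != index
    )


# Hypothetical connection crossing three areas through two border routers.
path = [Segment("area 1"), Segment("area 0"), Segment("area 2")]
path[0].state = SegmentState.RESTORED
print(service_is_up(path))               # True
path[2].state = SegmentState.FAILED
print(service_is_up(path))               # False
print(should_attempt_recovery(path, 1))  # False
```

The last call reflects the example in the text: once the segment in
one area is known to be down, the scheme in another area has no
reason to spend resources recovering its own part of the connection.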
In summary, at this time, approaches that allow concatenation of
survivability schemes across hierarchical boundaries should prove
sufficient.
8. Security Considerations
Security is not considered in this initial version.
9. References
1 Bradner, S., "The Internet Standards Process -- Revision 3", BCP
9, RFC 2026, October 1996.
2 Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
3 V. Sharma, B. Crane, K. Owens, C. Huang, F. Hellstrand, J. Weil,
L. Andersson, B. Jamoussi, B. Cain, S. Civanlar, and A. Chiu,
"Framework for MPLS-based Recovery," Internet-Draft, Work in
Progress, March 2001.
4 D.O. Awduche, A. Chiu, A. Elwalid, I. Widjaja, and X. Xiao, "A
Framework for Internet Traffic Engineering," Internet-Draft, Work
in Progress, May 2001.
5 N. Harrison, et al, "Requirements for OAM in MPLS Networks,"
Internet-Draft, Work in Progress, May 2001.
6 K. Kompella and Y. Rekhter, "Multi-area MPLS Traffic
Engineering," Internet-Draft, Work in Progress, March 2001.
7 G. Ash, et al, "Requirements for Multi-Area TE," Internet-Draft,
Work in Progress, March 2001.
8 A. Iwata, N. Fujita, G.R. Ash, and A. Farrel, "Crankback Routing
Extensions for MPLS Signaling," Internet-Draft, Work in Progress,
July 2001.
10. Acknowledgments
A lot of the direction taken in this document, and by the team, was
steered by the insightful questions provided by Bala Rajagopalan,
Greg Bernstein, Yangguang Xu, and Avri Doria. The set of questions
is attached as Appendix A in this document.
11. Authors' Addresses
Wai Sum Lai
AT&T
200 Laurel Avenue
Middletown, NJ 07748, USA
Tel: +1 732-420-3712
wlai@att.com
Dave McDysan
WorldCom
22001 Loudoun County Pkwy
Ashburn, VA 20147, USA
dave.mcdysan@wcom.com
Jim Boyle
jimpb@nc.rr.com
Malin Carlzon
malin@sunet.se
Rob Coltun
rcoltun@redback.com
Tim Griffin
AT&T
180 Park Avenue
Florham Park, NJ 07932, USA
Tel: +1 973-360-7238
griffin@research.att.com
Ed Kern
Cogent Communications
3413 Metzerott Rd
College Park, MD 20740, USA
Tel: +1 703-852-0522
ejk@tech.org
Tom Reddington
Lucent Technologies
67 Whippany Rd
Whippany, NJ 07981, USA
Tel: +1 973-386-7291
treddington@bell-labs.com
Appendix A: Questions used to help develop requirements
A. Definitions
1. In determining the specific requirements, the design team should
precisely define the concepts "survivability", "restoration",
"protection", "protection switching", "recovery", "re-routing" etc.
and their relations. This would enable the requirements doc to
describe precisely which of these will be addressed.
In the following, the term "restoration" is used to indicate the
broad set of policies and mechanisms used to ensure survivability.
B. Network types and protection modes
1. What is the scope of the requirements with regard to the types of
networks covered? Specifically, are the following in scope:
Restoration of connections in mesh optical networks (opaque or
transparent)
Restoration of connections in hybrid mesh-ring networks
Restoration of LSPs in MPLS networks (composed of LSRs overlaid on a
transport network, e.g., optical)
Any other types of networks?
Is commonality of approach, or optimization of approach more
important?
2. What are the requirements with regard to the protection modes to
be supported in each network type covered? (Examples of protection
modes include 1+1, M:N, shared mesh, UPSR, BLSR, newly defined modes
such as P-cycles, etc.)
3. What are the requirements on local span (i.e., link by link)
protection and end-to-end protection, and the interaction between
them? E.g., what should be the granularity of connections for each
type (single connection, bundle of connections, etc.)?
C. Hierarchy
1. Vertical (between two network layers):
What are the requirements for the interaction between
restoration procedures across two network layers, when these
features are offered in both layers? (For example, an MPLS network
realized over point-to-point optical connections.) In such a case,
(a) Are there any criteria to choose which layer should provide
protection?
(b) If both layers provide survivability features, what are the
requirements to coordinate these mechanisms?
(c) How is the current lack of cross-layer coordination
functionality hampering operations?
(d) Would the benefits be worth additional complexity associated
with routing isolation (e.g. VPN, areas), security, address
isolation and policy / authentication processes?
2. Horizontal (between two areas or administrative subdivisions
within the same network layer):
(a) What are the criteria that trigger the creation of protocol
or administrative boundaries pertaining to restoration? (e.g.,
scalability? multi-vendor interoperability? multi-provider? what
are the practical issues?) Should multi-vendor necessitate
hierarchical separation?
When such boundaries are defined:
(b) What are the requirements on how protection/restoration is
performed end-to-end across such boundaries?
(c) If different restoration mechanisms are implemented on two
sides of a boundary, what are the requirements on their interaction?
What is the primary driver of horizontal hierarchy? (select one)
- functionality (e.g. metro -v- backbone)
- routing scalability
- signaling scalability
- current network architecture, trying to layer TE on top of an
already hierarchical network architecture
- routing and signaling
For signaling scalability, is it
- manageability
- processing/state of network
- edge-to-edge N^2 type issue
For routing scalability, is it
- processing/state of network
- are you flat and want to go hierarchical
- or already hierarchical?
- data or TDM application?
D. Policy
1. What are the requirements for policy support during
protection/restoration, e.g., restoration priority, preemption, etc.?
E. Signaling Mechanisms
1. What are the requirements on the signaling transport mechanism
(e.g., in-band over SONET/SDH overhead bytes, out-of-band over an IP
network, etc.) used to communicate restoration protocol
messages between network elements? What are the bandwidth and
other requirements on the signaling channels?
2. What are the requirements on fault detection/localization
mechanisms (which are the prelude to performing restoration
procedures) in the case of opaque and transparent optical networks?
What are the requirements in the case of MPLS restoration?
3. What are the requirements on signaling protocols to be used in
restoration procedures (e.g., high priority processing, security,
etc.)?
4. Are there any requirements on the operation of restoration
protocols?
F. Quantitative
1. What are the quantitative requirements (e.g., latency) for
completing restoration under different protection modes (for both
local and end-to-end protection)?
G. Management
1. What information should be measured/maintained by the control
plane at each network element pertaining to restoration events?
2. What are the requirements for the correlation between control
plane and data plane failures from the restoration point of view?
Full Copyright Statement
Copyright (C) The Internet Society (date). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.