
Re: [TE-wg] TE use in today's networks



jrex> Depending on which link fails, the network load after the failure
jrex> isn't all that bad.  Although some failures can cause problems, often
jrex> one or two weight changes after the failure is enough to bring the
jrex> network back to a happy place (analogous to the need to fail over to
jrex> backup paths in MPLS).

ben> The difference in the MPLS case is that the backup paths can be
ben> configured in advance.  With TE using link metrics, you may be forced
ben> into changing the configuration when you least want to: during a
ben> failure situation.

[sorry for the slow reply]

You are absolutely right, but...

In both the MPLS and OSPF cases, there are two main issues:

  - detecting the failed/restored link
  - changing to new paths

The first issue can be handled by the interface itself detecting the
failure or by lost heartbeats between the neighboring routers. Careful
selection of the heartbeat frequency can reduce the detection delay
(see http://search.ietf.org/internet-drafts/draft-alaettinoglu-isis-convergence-00.txt).
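To make the detection-delay point concrete, here is a tiny sketch of how
the heartbeat interval bounds the time to notice a failure.  The function
and parameter names are mine, not from any RFC; this is just the
back-of-the-envelope arithmetic, in Python:

```python
# Hypothetical sketch of heartbeat-based failure detection: a router
# declares its neighbor down after `dead_multiplier` consecutive hello
# intervals pass with no hello received.

def worst_case_detection_delay(hello_interval_s, dead_multiplier):
    """Upper bound on the time between a link failing and the router
    noticing, when detection relies on lost heartbeats alone."""
    # The failure can occur just after a hello arrives, so nearly one
    # full extra interval can elapse before the counting starts.
    return hello_interval_s * dead_multiplier + hello_interval_s

# Classic OSPF defaults: 10 s hellos, dead interval = 4 x hello.
print(worst_case_detection_delay(10.0, 4))   # -> 50.0 (tens of seconds)

# Aggressive sub-second hellos shrink detection to about a second.
print(worst_case_detection_delay(0.25, 3))   # -> 1.0
```

The point of the draft cited above is exactly this trade-off: shorter
hello intervals cut detection delay, at the cost of more protocol
overhead and a greater risk of false failure declarations.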

In the OSPF case, the second issue could involve basic recomputation
of paths using the existing weights.  In many cases, the resulting
paths are not all that bad.  In some cases, some weights may need to
be changed to move to a better set of paths for the prevailing
traffic; in practice, it is sufficient to change one or two weights,
and these changes can be precomputed.  Yes, this does require a
network management system (or person) to effect the changes.  That is
definitely a disadvantage, as you mention.  But, how significant is
this disadvantage?
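As a toy illustration of the "recompute with existing weights, then
change one weight" idea -- the topology, node names, and weights below
are all invented for the example, and real networks are of course much
larger -- a sketch in Python:

```python
import heapq

def shortest_path(graph, src, dst):
    """Plain Dijkstra over {node: {neighbor: weight}}; returns the node list."""
    dist, prev = {src: 0}, {}
    pq, done = [(0, src)], set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(pq, (d + w, v))
    path, node = [dst], dst
    while node != src:          # assumes dst is reachable
        node = prev[node]
        path.append(node)
    return path[::-1]

# Invented five-node topology with symmetric link weights.
g = {
    "A": {"B": 1, "C": 2, "E": 3},
    "B": {"A": 1, "D": 1},
    "C": {"A": 2, "D": 2},
    "D": {"B": 1, "C": 2, "E": 3},
    "E": {"A": 3, "D": 3},
}

print(shortest_path(g, "A", "D"))   # normal operation: ['A', 'B', 'D']

# Link B-D fails: remove it and recompute with the *existing* weights.
del g["B"]["D"], g["D"]["B"]
print(shortest_path(g, "A", "D"))   # falls back to ['A', 'C', 'D']

# Suppose C-D is now carrying too much traffic.  A single precomputed
# weight change (raise C-D from 2 to 5) moves the traffic onto the
# spare A-E-D path.
g["C"]["D"] = g["D"]["C"] = 5
print(shortest_path(g, "A", "D"))   # -> ['A', 'E', 'D']
```

The first recomputation needs no configuration change at all; only the
last step requires the management system (or person) to push the
precomputed weight change.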

The details of the MPLS case really depend on what MPLS features are
in use.  

  * On one extreme, you could imagine having precomputed backup
    paths (or even backup subpaths to replace individual link failures).
    Something has to trigger the failover to these backup paths.  In 
    theory, this can all be done without intervention by a network
    management system.  But, are MPLS implementations and deployments
    this mature yet?  Are the backup paths precomputed -- by the routers
    themselves or by an external management system?  

  * Another possibility is dynamic recomputation of routes based on
    dynamic link weights (set based on the prevailing traffic).  This
    requires the edge router to learn about the failure, compute a new
    path, and signal this path.  In fact, a single failure may cause a
    bunch of edge routers to compute a bunch of new paths.  What kind
    of signaling load does this introduce?  Are the local decisions
    made by each router (as the system moves to the new state) good 
    for the global traffic load?
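To illustrate the first option, here is a toy sketch of precomputing one
backup path per primary link before any failure occurs.  The topology
and names are invented, and real MPLS fast-reroute machinery (signaling,
label stacks, bandwidth reservation) is far more involved; this only
shows the precomputation itself:

```python
from collections import deque

def bfs_path(adj, src, dst, banned=frozenset()):
    """Shortest hop-count path from src to dst, avoiding the links in
    `banned` (each link given as a frozenset of its two endpoints)."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        u = path[-1]
        if u == dst:
            return path
        for v in adj.get(u, ()):
            if v not in seen and frozenset((u, v)) not in banned:
                seen.add(v)
                queue.append(path + [v])
    return None

# Invented topology: a two-hop primary path plus a longer bypass.
adj = {
    "in":  ["p1", "b1"],
    "p1":  ["in", "out"],
    "b1":  ["in", "b2"],
    "b2":  ["b1", "out"],
    "out": ["p1", "b2"],
}

primary = bfs_path(adj, "in", "out")
print(primary)                                 # ['in', 'p1', 'out']

# Precompute, *before* any failure, a backup path for each primary link,
# so that failover is a table lookup rather than a fresh computation.
backups = {
    (u, v): bfs_path(adj, "in", "out", banned={frozenset((u, v))})
    for u, v in zip(primary, primary[1:])
}
print(backups[("in", "p1")])                   # ['in', 'b1', 'b2', 'out']
```

Whether this precomputation runs on the routers themselves or in an
external management system is exactly the open question above.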

In either case, the key question is whether or not this complexity is
warranted.  I don't doubt at all that the full-fledged deployment of
MPLS opens up additional possibilities (in addition to the existing
IGP mechanisms) for more efficient failover (both in terms of routing
efficiency and convergence delay).  But, how much, and at what added
complexity?  Some analysis or experiments to quantify these trade-offs
would be really useful to help folks operating networks make these
difficult decisions about how to approach traffic engineering and
fault tolerance.

-- Jen