
Re: Comments on draft-li-ccamp-multinodes-gr-proc-00.txt,



Dimitri,

I understand your concern about over-engineering. There is a tendency for "protocol engineers" to sit and look for corner cases in the protocol that they might be able to write specifications for. It might be more useful if they wrote implementations and tested the current protocols.

But, I don't think Dan is guilty. His draft proposes no protocol extensions and no new procedures. In fact, it is a clarification that no new protocol elements are required - the existing work already covers all cases without any further work being necessary. I think this is a useful thing to do, but not something I would expect to spend a lot of CCAMP cycles on.

Perhaps most useful would be to have a review from the authors of the original Graceful Restart draft to confirm that the procedures have been correctly interpreted and applied.

As to whether this is a common problem, that might be a question for software stack implementers, chip manufacturers, and power supply vendors. But clearly we thought it was possible for a single control plane instance to fail.

A
----- Original Message -----
From: <Dimitri.Papadimitriou@alcatel.be>
To: "Dan Li" <danli@huawei.com>
Cc: "ccamp" <ccamp@ops.ietf.org>; "Olufemi Komolafe" <femi@dcs.gla.ac.uk>; <gjhhit@huawei.com>; <owner-ccamp@ops.ietf.org>
Sent: Sunday, October 01, 2006 3:05 PM
Subject: Re: Comments on draft-li-ccamp-multinodes-gr-proc-00.txt,


dan -

ok - do we have a sense of the probability of multiple independent node
failures - is that a common failure scenario operators are facing (and not
a marginal case) ?

or are we addressing common bugs affecting common equipment - in which case
this topic is outside the scope of CCAMP ?

pls read between the lines: i am starting to be seriously concerned by GMPLS
protocol over-engineering - if the set of currently available techniques is
still not enough for base deployment then it means the issue(s) is/are
sitting somewhere else -

thanks,
- d.




Dan Li <danli@huawei.com>
30/09/2006 07:42

       To:     Dimitri PAPADIMITRIOU/BE/ALCATEL@ALCATEL
       cc:     ccamp <ccamp@ops.ietf.org>, Olufemi Komolafe <femi@dcs.gla.ac.uk>, gjhhit@huawei.com, owner-ccamp@ops.ietf.org
       Subject:        Re: Comments on draft-li-ccamp-multinodes-gr-proc-00.txt,


Hi Dimitri,

There is nothing wrong with the GR draft. The mechanism described in that
draft doesn't need any remedial patch, and it is applied in this draft.
The reason we started working on this draft is to clarify the control
plane procedures for a GMPLS network when there are multiple node
failures, and to describe how full control plane state can be recovered
depending on the order in which the nodes restart.

Thanks,

Dan

----- Original Message -----
From: <Dimitri.Papadimitriou@alcatel.be>
To: "Dan Li" <danli@huawei.com>
Cc: "ccamp" <ccamp@ops.ietf.org>; "Olufemi Komolafe" <femi@dcs.gla.ac.uk>; <gjhhit@huawei.com>; <owner-ccamp@ops.ietf.org>
Sent: Wednesday, September 27, 2006 12:50 AM
Subject: Re: Comments on draft-li-ccamp-multinodes-gr-proc-00.txt,


hi dan

yes i have a very practical issue - the document would then deal with
remedial to remedial - ... but do we have any feedback on initial
remedials ?

thanks,
- d.





Dan Li <danli@huawei.com>
Sent by: owner-ccamp@ops.ietf.org
25/09/2006 09:27

        To:     Olufemi Komolafe <femi@dcs.gla.ac.uk>, gjhhit@huawei.com
        cc:     ccamp <ccamp@ops.ietf.org>
        Subject:        Re: Comments on draft-li-ccamp-multinodes-gr-proc-00.txt,


Hi Femi,

Thanks for your valuable suggestion!

The intention of this draft is to clarify the procedures for multi-node
restart, so five typical scenarios have been listed in the draft. I think
most of the failure cases are covered by these five scenarios. As you have
pointed out, these scenarios may arise due to multiple node failures or
a subsequent control channel failure. Yes, you're right! It may be more
generic to classify these scenarios, such as: "What should happen if a
restarting node fails to get a RecoveryPath/Path message from its
downstream/upstream neighbor?", but I think it may be clearer to
address the specific cases in the draft, at least during the initial
stage.

Any comments are very welcome!

Best regards,

Dan




----- Original Message -----
From: Olufemi Komolafe
To: danli@huawei.com ; gjhhit@huawei.com
Cc: ccamp
Sent: Tuesday, September 19, 2006 8:45 PM
Subject: RE: Comments on draft-li-ccamp-multinodes-gr-proc-00.txt,


Hi,

While reading this draft it occurred to me that perhaps it might be more
useful to approach this topic from the perspective of "What can go wrong
during graceful restart, what are the consequences and how can it be
fixed?" rather than focusing on the narrower topic of multiple
simultaneous nodal faults.

For example, Scenario 1 in the draft may be interpreted as "What should
happen if a (non-ingress) restarting node fails to get a RecoveryPath
message from its downstream neighbour?", Scenario 2 is "What should
happen if a (non-ingress) restarting node fails to get a Path message
from its upstream neighbour?" and so on.  Whether each of these scenarios
arises due to multiple simultaneous nodal faults (as in the draft) or any
other reason (e.g. a subsequent control channel failure, restarting node
being inundated with messages etc.) is, in my opinion, secondary.  I think
the key thing is to identify the potential problems and suggest appropriate
remedial actions, where the authors think existing documentation is
insufficient, rather than focusing on 5 different permutations of
multiple node graceful restart.

Regards,
Femi





From: owner-ccamp@ops.ietf.org [mailto:owner-ccamp@ops.ietf.org] On Behalf Of Zafar Ali (zali)
Sent: 10 July 2006 04:04
To: danli@huawei.com; gjhhit@huawei.com
Cc: ccamp
Subject: Comments on draft-li-ccamp-multinodes-gr-proc-00.txt,

Dear Authors,

This is Deja-vu to me....

Draft draft-ietf-ccamp-rsvp-restart-ext-05.txt actually had a section on
the multiple node restart case, and it was rejected by the WG because
addressing the multiple node restart case is NOT a goal (it suffers from
the law of diminishing returns). In other words, the following statement
in the ID:

   "[GR-EXT] also extends the Hello message to exchange information about
   the ability to support the RecoveryPath message.
   The examples and procedures in [GR-EXT] focus on the description of a
   single node restart when adjacent network nodes are operative.
   Although the procedures are equally applicable to multi-node restarts,
   no detailed explanation is provided."

is not accurate. Please see Section 4 of the earlier version of
[GR-EXT]:

http://www.faqs.org/ftp/pub/internet-drafts/draft-rahman-ccamp-rsvp-restart-extensions-00.txt

Thanks

Regards... Zafar