
Re: High level comment on draft-li-ccamp-confirm-data-channel-status-00.txt



Dimitri,

I think you are focusing in too closely on an example of how the problem can arise. The problem is simply that it is possible that the resources are not available in the same way at both ends of the link. The discussion then went down the path of how this might arise, and we have become stuck in a detailed example. Perhaps a better example would be the failure of a hardware component that represents a label on the TE link, where the failure is known at the local node, but cannot be communicated to the remote node.

But related to your specific discussion...

i am not sure to capture your point - let's say states are self-refreshed
for some time, if the signaling channel is not up before pre-negotiated
time - and you didn't fix this timer as being infinite - resources will be
released when the cleanup elapses

So you are saying that there is a timer that can be set on the Hello such that if the Hello adjacency is not re-established in that time, all LSPs should be torn down. Yes. This is the Restart Time field of the Restart_Cap object.

But you qualify this with "you didn't fix this timer as being infinite". Let's think about how operators actually like to use this timer. Sure, they can say, "I expect the CP to restart after a fault in 10 minutes", but what about hardware faults? The CPU running the CP could fry; that is likely to require a truck roll, and the user traffic must not be torn down just because of this fault.

So the Restart Time is typically set quite large. You will hear many people say that the user traffic is absolute (it must never be torn down because of a CP failure), and these people set the Restart Time to all-f (0xffffffff).

Now, once the time has been set large, we don't have automatic cleanup. So the cross-connects can get into a mess.
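
To make the consequence concrete, here is a minimal sketch of the cleanup decision described above. The function names are hypothetical, and the use of milliseconds and of 0xffffffff as the "never clean up" value are assumptions for illustration rather than a statement of any implementation:

```python
from typing import Optional

# Assumption: an all-f Restart Time means the restart may take
# an indeterminate time, so no automatic cleanup is ever scheduled.
INFINITE_RESTART_TIME = 0xFFFFFFFF


def cleanup_deadline_ms(restart_time_ms: int, hello_lost_at_ms: int) -> Optional[int]:
    """Return when LSP state would be cleaned up, or None if never."""
    if restart_time_ms == INFINITE_RESTART_TIME:
        return None
    return hello_lost_at_ms + restart_time_ms


def should_tear_down(restart_time_ms: int, hello_lost_at_ms: int, now_ms: int) -> bool:
    """True once the Hello adjacency has been down for the whole Restart Time."""
    deadline = cleanup_deadline_ms(restart_time_ms, hello_lost_at_ms)
    return deadline is not None and now_ms >= deadline
```

With a 10 minute Restart Time the state is cleaned up once the 10 minutes have elapsed; with all-f the function never returns True, which is exactly the case where stale cross-connects can accumulate.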

in case one requires larger maintenance
windows one provides such configuration beforehand

But note that this is not planned maintenance.

mis-using one
mechanism and then require another to reconstruct consistent states is the
first point that needs to be clarified

So, are you saying that my description, above, represents a mis-use?
If so, how should I configure the Restart Time so that I can survive long CP outages?

(it is also important to clarify
that LMP does not "reserve resources" it just mediates data link states so
if the error needs to be corrected it will any interfere with the entity
that is making use of it

Completely agree with this point. LMP can report problems, but cannot fix them.

what you are trying to achieve is detect mismatch but detect these mismach
is not a solution to the problem

Agree completely.
What you have suggested before is that prevention is sufficient.
What I believe is that prevention of all cases is probably not possible, therefore detection and cure will be needed.

Detection (you are right) is not the same as cure. But cure without detection is improbable.

Note that cure is not necessarily the objective! There is a separate problem (sending Path with bad choice of label) that can be prevented after detection, without the need for cure.
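
As a rough illustration of prevention after detection without cure, a node could simply skip labels it knows to be mismatched when choosing a label for a Path message. The function name and the lowest-label-first policy are hypothetical choices made only for this sketch:

```python
from typing import Iterable, Optional


def choose_label(locally_free: Iterable[int],
                 known_mismatched: Iterable[int]) -> Optional[int]:
    """Pick a label for the Path message, avoiding detected mismatches.

    Nothing is repaired here: the mismatched labels stay mismatched,
    they are just never offered to the neighbor.
    """
    bad = set(known_mismatched)
    candidates = sorted(label for label in locally_free if label not in bad)
    # Lowest-first is an arbitrary policy chosen for the sketch.
    return candidates[0] if candidates else None
```

If every locally free label is known to be mismatched, the function returns None and the setup attempt would have to fail locally or use another link.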

Cure probably requires operator intervention.

but more importantly what is the trigger
to the detection itself? in brief how does one either of these sides
knows it should start such "verification" process ... was it not at the
end the reason for possibility of maintaining the "available" link on
permanent verification state within LMP ?

Understanding the trigger for the detection process is important.

Section 4 of the I-D currently says:
  The data channel status confirmation related LMP messages are sent
  between adjacent nodes periodically or driven by some events

and this is not enough. We would need to know more about the "period" and the "events" to assess whether this is going to flood the control channel or, conversely, fail to provide the information in a timely manner.

For example, one could use the receipt of an Acceptable Label Set object (and some parsing thereof) as a trigger event.
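
A minimal sketch of what "some parsing thereof" might look like, assuming labels are represented as plain integers and that the local node keeps its own view of which labels are available (both assumptions, as are the function names):

```python
from typing import Iterable, List


def labels_needing_confirmation(believed_available: Iterable[int],
                                acceptable_label_set: Iterable[int]) -> List[int]:
    """Labels the local node thinks are free but the neighbor did not list."""
    return sorted(set(believed_available) - set(acceptable_label_set))


def should_trigger_confirmation(believed_available: Iterable[int],
                                acceptable_label_set: Iterable[int]) -> bool:
    """Trigger the status confirmation only when the two views disagree."""
    return bool(labels_needing_confirmation(believed_available,
                                            acceptable_label_set))
```

An event-driven trigger of this kind would only generate LMP traffic when a discrepancy is actually observed, rather than on every period.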

Cheers,
Adrian