
Re: More confirmed-commit issues



Rob Enns wrote:

How does this replacement text sound?

----
8.4  Confirmed Commit Capability

8.4.1  Description

  The #confirmed-commit capability indicates that the server will
  support the <confirmed> and <confirm-timeout> parameters for the
  <commit> protocol operation.  See Section 8.3 for further
  details on the <commit> operation.

A confirmed commit operation MUST be reverted if a follow-up commit
(called the "confirming commit") is not issued within 600 seconds (10
minutes). The timeout period can be adjusted with the
<confirm-timeout> element. The confirming commit can itself include a
<confirmed> parameter.


This last sentence is confusing to me. It makes sense if the <candidate> contains
new changes and the 2nd confirmed commit starts a new "revert timeout" for these
new changes.


I really don't like the possible side effects from this confirmed commit, especially with our
shared <candidate> and global locking. If you don't maintain the session and hold the
lock throughout the entire double commit, really bad things can happen.


(NEW ISSUE: What happens to a confirmed commit in progress if the session is lost
or the agent reboots?)


T0 - boot with baseline config
Tc - Manager A issues a confirmed commit, w/ revert to baseline at Tc+i
Tc+1 - Manager A loses its connection and session
Tc+10 - Manager B has no idea Manager A did this, comes along, gets the lock,
        and starts writing to the <candidate> config, which starts with the
        contents of <running> at time Tc
Tc+20 - Then Manager A comes back and can't get a lock
Tc+i - Manager A's revert timer pops before Manager B is done.
       The agent reverts the state of <running> to T0. (But B thinks the
       state of <running> is Tc.)


At this point, it depends on the difference between config T0 and Tc, and
what Manager B is doing, as to whether benign or devastating effects will follow.
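
To make the timeline concrete, here is a toy Python model of it. This is
only a sketch: the Agent class, its method names, and the sample config
values are invented for illustration and are not part of the draft. It
also assumes that the revert timer keeps running after Manager A's session
is lost and that the timeout leaves <candidate> untouched, both of which
are exactly the kind of thing the draft does not currently say.

```python
class Agent:
    """Toy agent: <running>, a shared <candidate>, one pending revert timer."""

    def __init__(self, baseline):
        self.running = dict(baseline)
        self.candidate = dict(baseline)
        self.revert_to = None   # <running> snapshot to restore on timeout
        self.revert_at = None   # time at which the revert timer pops

    def confirmed_commit(self, now, timeout=600):
        # Snapshot <running>, arm the timer, then apply <candidate>.
        self.revert_to = dict(self.running)
        self.revert_at = now + timeout
        self.running = dict(self.candidate)

    def tick(self, now):
        """Advance the clock; revert <running> if the timer has popped."""
        if self.revert_at is not None and now >= self.revert_at:
            self.running = self.revert_to
            self.revert_to = self.revert_at = None


T0_CONFIG = {"mtu": 1500}   # hypothetical baseline config
Tc, i = 100, 600            # arbitrary commit time and timeout

agent = Agent(T0_CONFIG)

# Tc: Manager A edits <candidate> and issues a confirmed commit.
agent.candidate["mtu"] = 9000
agent.confirmed_commit(now=Tc, timeout=i)

# Tc+1: Manager A's session drops. Assumption: the timer keeps running.

# Tc+10: Manager B takes the lock and reads <running> as its starting point.
b_view = dict(agent.running)
agent.candidate["ospf-area"] = "0.0.0.0"   # B's work in progress

# Tc+i: the revert timer pops while B is still editing.
agent.tick(now=Tc + i)

print("running  :", agent.running)
print("B's view :", b_view)
print("candidate:", agent.candidate)
```

In this model, <running> ends up back at the T0 contents, while Manager B's
view and the shared <candidate> still carry Manager A's change. If B now
commits, A's unconfirmed change goes back into <running> along with B's own.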


It's never a good thing to design this much "astonishment" into routing products.
At a minimum, we need to document what happens in as many corner cases as
we can think of, but we should also try to respect the principle of least astonishment.


If a confirming commit is not issued, the device will revert it's
configuration to the state prior to the issuance of the confirmed
commit. Note that any commit operation, including a commit which
introduces additional changes to the configuration, will serve as a
confirming commit. Thus to cancel a confirmed commit and revert
changes without waiting for the confirm timeout to expire, the
confirming commit can explicitly restore the configuration to it's
state before the confirmed commit was issued.


I don't understand this last sentence, or this revert operation at all.
BTW, s/it's/its/ in both paragraphs above.
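
One literal reading of that last sentence, sketched as a toy Python model.
The Agent class and its method names are hypothetical, and the rule that
every commit replaces any pending revert is only an assumption drawn from
the "any commit operation ... will serve as a confirming commit" wording.
Under that reading, the manager has to keep its own copy of the old config
and push it back through <candidate> in order to "cancel".

```python
class Agent:
    """Toy agent with <running>, <candidate>, and one pending revert."""

    def __init__(self, baseline):
        self.running = dict(baseline)
        self.candidate = dict(baseline)
        self.pending_revert = None   # snapshot restored if the timer expires

    def commit(self, confirmed=False):
        # Assumption: any commit confirms an earlier confirmed commit.
        # A confirmed commit then arms a fresh revert to the current <running>.
        self.pending_revert = dict(self.running) if confirmed else None
        self.running = dict(self.candidate)

    def timer_expired(self):
        # Revert only if a confirmed commit is still unconfirmed.
        if self.pending_revert is not None:
            self.running = self.pending_revert
            self.pending_revert = None


agent = Agent({"mtu": 1500})      # hypothetical config
saved = dict(agent.running)       # manager's own copy of the old config

agent.candidate["mtu"] = 9000
agent.commit(confirmed=True)      # confirmed commit, revert timer armed

# Manager changes its mind: restore the old config and commit it.
agent.candidate = dict(saved)
agent.commit()                    # plain commit acts as the confirming commit

agent.timer_expired()             # nothing left to revert
print(agent.running == saved, agent.pending_revert)
```

Note that the "cancel" here depends entirely on the manager having saved the
old config itself; the agent's own snapshot is simply discarded.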

  For shared configurations, this feature can cause other configuration
  changes (for example, via other NETCONF sessions) to be inadvertently
  altered or removed, unless the configuration locking feature is used
  (in other words, lock obtained before the edit-config operation is
  started).  Therefore, it is strongly suggested that in order to use
  this feature with shared configuration databases, configuration
  locking should also be used.

8.4.2  Dependencies

  The #confirmed-commit capability is only relevant if the #candidate
  capability is also supported.

8.4.3  Capability and Namespace

  The #confirmed-commit capability is identified by the following
  capability string:

     urn:ietf:params:xml:ns:netconf:base:1.0#confirmed-commit

  The #confirmed-commit capability uses the base NETCONF namespace URN.

8.4.4  New Operations

  None.

8.4.5  Modifications to Existing Operations

8.4.5.1  <commit>

  The #confirmed-commit capability allows 2 additional parameters to
  the <commit> operation.

  Parameters:

     confirmed:

           Perform a confirmed commit operation.

     confirm-timeout:

           Timeout period for confirmed commit, in seconds.  If
           unspecified, the confirm timeout defaults to 600 seconds.

  Example:

    <rpc message-id="101"
         xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
      <commit>
        <confirmed/>
        <confirm-timeout>120</confirm-timeout>
      </commit>
    </rpc>

    <rpc-reply message-id="101"
         xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
      <ok/>
    </rpc-reply>




-----Original Message-----
From: owner-netconf@ops.ietf.org [mailto:owner-netconf@ops.ietf.org] On Behalf Of Andy Bierman
Sent: Sunday, May 08, 2005 10:24 AM
To: netconf
Subject: More confirmed-commit issues


Hi,

IMO, PROT, section 8.4 is not very clear about what happens
if locking is not used, or if the manager doesn't follow
the elements of procedure that the document suggests.

If a confirmed-commit timeout is pending, and the <candidate>
config is modified again before the 2nd <commit> or the timeout
occurs, how does the agent interpret the <commit> that is intended
to be for the newly modified <candidate>?  What exactly are the
contents of <running> after the confirmed-commit timer pops?
What if the 2nd commit is also a confirmed-commit?  What if
time(C2) < timer(C1)? How come a manager cannot cancel a confirmed
commit (after commit-1 but before the timeout)?

Note that this corner-case can occur naturally if locking is not properly used,
or pathologically, if the manager holding the locks writes to the <candidate>
before finishing the first confirmed commit. (E.g., operator forgets a line
of config -- adds it -- commits it.)


The vague warning about "use locks properly" (8.4.1, para 2) is not relevant
to agent implementers who have to make this work even if locking isn't used,
is used incorrectly, or the manager doesn't follow the implied transaction model.


I would also like to note that the #rollback feature was thrown out because
of these same corner-cases, which (IMO) are neither explained nor properly handled
in the current draft as they relate to the #candidate and #confirmed-commit
capabilities.


Andy










-- to unsubscribe send a message to netconf-request@ops.ietf.org with the word 'unsubscribe' in a single line as the message text body. archive: <http://ops.ietf.org/lists/netconf/>








