Re: More confirmed-commit issues

To: Rob Enns <rpe@juniper.net>
Subject: Re: More confirmed-commit issues
From: Andy Bierman <ietf@andybierman.com>
Date: Sat, 14 May 2005 05:12:46 -0700
Cc: netconf <netconf@ops.ietf.org>
In-reply-to: <062B922B6EC55149B5A267ECE78E5D440A003703@photon.jnpr.net>
References: <062B922B6EC55149B5A267ECE78E5D440A003703@photon.jnpr.net>
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)

Rob Enns wrote:

How does this replacement text sound?
----
8.4  Confirmed Commit Capability
8.4.1  Description
  The #confirmed-commit capability indicates that the server will
  support the <confirmed> and <confirm-timeout> parameters for the
  <commit> protocol operation.  See Section 8.3 for further
  details on the <commit> operation.

  A confirmed commit operation MUST be reverted if a follow-up commit
  (called the "confirming commit") is not issued within 600 seconds
  (10 minutes).  The timeout period can be adjusted with the
  <confirm-timeout> element.  The confirming commit can itself include
  a <confirmed> parameter.

This last sentence is confusing to me. It makes sense if the <candidate> contains new changes and the 2nd confirmed commit starts a new "revert timeout" for these new changes.

I really don't like the possible side effects from this confirmed commit, especially with our shared <candidate> and global locking. If you don't maintain the session and hold the lock throughout the entire double commit, really bad things can happen.

(NEW ISSUE: What happens to a confirmed commit in progress if the session is lost or the agent reboots?)

T0     - boot with baseline config
Tc     - Manager A issues a confirmed commit, w/ revert to baseline at Tc+i
Tc+1   - Manager A loses its connection and session
Tc+10  - Manager B has no idea Manager A did this, comes along, gets the
         lock, and starts writing to the <candidate> config, which starts
         with the contents of <running> at time Tc
Tc+20  - Then Manager A comes back and can't get a lock
Tc+i   - Manager A's revert timer pops before Manager B is done

The agent reverts the state of <running> to T0. (But B thinks the state of <running> is Tc.)

At this point, whether benign or devastating effects follow depends on the difference between configs T0 and Tc, and on what Manager B is doing.

It's never a good thing to design this much "astonishment" into routing products. At a minimum, we need to document what happens in as many corner cases as we can think of, but we should also try to respect the principle of least astonishment.

If a confirming commit is not issued, the device will revert it's configuration to the state prior to the issuance of the confirmed commit. Note that any commit operation, including a commit which introduces additional changes to the configuration, will serve as a confirming commit. Thus to cancel a confirmed commit and revert changes without waiting for the confirm timeout to expire, the confirming commit can explicitly restore the configuration to it's state before the confirmed commit was issued.

I don't understand this last sentence, or this revert operation at all.
BTW, s/it's/its/ in both paragraphs above.

  For shared configurations, this feature can cause other configuration
  changes (for example, via other NETCONF sessions) to be inadvertently
  altered or removed, unless the configuration locking feature is used
  (in other words, lock obtained before the edit-config operation is
  started).  Therefore, it is strongly suggested that in order to use
  this feature with shared configuration databases, configuration
  locking should also be used.
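  A minimal sketch of the locked sequence recommended above, assuming
  the manager locks only the <candidate> datastore and holds both the
  lock and the session until the confirming commit has completed.
  The message-ids and the 120-second timeout are arbitrary, and the
  <edit-config> step is elided because its contents depend on the
  data model:

    <rpc message-id="201"
         xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
      <lock>
        <target><candidate/></target>
      </lock>
    </rpc>

    <!-- <edit-config> operations targeting <candidate> go here -->

    <rpc message-id="202"
         xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
      <commit>
        <confirmed/>
        <confirm-timeout>120</confirm-timeout>
      </commit>
    </rpc>

    <!-- Check the device, then confirm within 120 seconds -->

    <rpc message-id="203"
         xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
      <commit/>
    </rpc>

    <rpc message-id="204"
         xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
      <unlock>
        <target><candidate/></target>
      </unlock>
    </rpc>

  As the thread points out, this only protects other managers while
  the session survives: if it drops between messages 202 and 203, the
  lock is released while the revert is still pending.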

8.4.2  Dependencies

  The #confirmed-commit capability is only relevant if the #candidate
  capability is also supported.

8.4.3  Capability and Namespace

  The #confirmed-commit capability is identified by the following
  capability string:

     urn:ietf:params:xml:ns:netconf:base:1.0#confirmed-commit

  The #confirmed-commit capability uses the base NETCONF namespace URN.

8.4.4  New Operations

  None.

8.4.5  Modifications to Existing Operations

8.4.5.1  <commit>

  The #confirmed-commit capability adds two parameters to the
  <commit> operation.

  Parameters:

     confirmed:

           Perform a confirmed commit operation.

     confirm-timeout:

           Timeout period for confirmed commit, in seconds.  If
           unspecified, the confirm timeout defaults to 600 seconds.

  Example:

    <rpc message-id="101"
         xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
      <commit>
        <confirmed/>
        <confirm-timeout>120</confirm-timeout>
      </commit>
    </rpc>

    <rpc-reply message-id="101"
         xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
      <ok/>
    </rpc-reply>
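  The example above starts the confirm timer.  A sketch of possible
  follow-ups, assuming the 8.4.1 semantics that any <commit> acts as
  the confirming commit and that a confirming commit carrying
  <confirmed> starts a new timeout (the 300-second value is
  arbitrary):

    <rpc message-id="102"
         xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
      <commit>
        <confirmed/>
        <confirm-timeout>300</confirm-timeout>
      </commit>
    </rpc>

    <rpc message-id="103"
         xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
      <commit/>
    </rpc>

  Message 102 confirms commit 101 and re-arms the timer; message 103
  then confirms 102.  If 103 were never sent, the text does not say
  whether the revert restores the state before 102 or the state
  before 101, which is one of the open questions in this thread.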

-----Original Message-----
From: owner-netconf@ops.ietf.org [mailto:owner-netconf@ops.ietf.org] On Behalf Of Andy Bierman
Sent: Sunday, May 08, 2005 10:24 AM
To: netconf
Subject: More confirmed-commit issues

Hi,
IMO, section 8.4 of PROT is not very clear about what happens
if locking is not used, or if the manager doesn't follow
the elements of procedure that the document suggests.
If a confirmed-commit timeout is pending, and the <candidate>
config is modified again before the 2nd <commit> or the timeout
occurs, how does the agent interpret the <commit> that is intended
to be for the newly modified <candidate>?  What exactly are the
contents of <running> after the confirmed-commit timer pops?
What if the 2nd commit is also a confirmed-commit?  What if
time(C2) < timer(C1)?  Why can't a manager cancel a confirmed
commit (after commit-1 but before the timeout)?
Note that this corner-case can occur naturally if locking is not properly used, or pathologically, if the manager holding the locks writes to the <candidate> before finishing the first confirmed commit. (E.g., operator forgets a line of config -- adds it -- commits it.)

The vague warning about "use locks properly" (8.4.1, para 2) is not relevant to agent implementers, who have to make this work even if locking isn't used or is used incorrectly, or if the manager doesn't follow the implied transaction model.

I would also like to note that the #rollback feature was thrown out because of these same corner cases, which (IMO) are neither explained nor properly handled in the current draft as they relate to the #candidate and #confirmed-commit capabilities.
Andy
--
to unsubscribe send a message to netconf-request@ops.ietf.org with
the word 'unsubscribe' in a single line as the message text body.
archive: <http://ops.ietf.org/lists/netconf/>

Follow-Ups:
- Re: More confirmed-commit issues
  - From: Wes Hardaker <wjhns1@hardakers.net>

References:
- RE: More confirmed-commit issues
  - From: "Rob Enns" <rpe@juniper.net>
