
RE: More confirmed-commit issues



Hi, comments below. 

> -----Original Message-----
> From: Andy Bierman [mailto:ietf@andybierman.com] 
> Sent: Saturday, May 14, 2005 5:13 AM
> To: Rob Enns
> Cc: netconf
> Subject: Re: More confirmed-commit issues
> 
> Rob Enns wrote:
> 
> >How does this replacement text sound?
> >
> >----
> >8.4  Confirmed Commit Capability
> >
> >8.4.1  Description
> >
> >   The #confirmed-commit capability indicates that the server will
> >   support the <confirmed> and <confirm-timeout> parameters for the
> >   <commit> protocol operation.  See Section 8.3 for further
> >   details on the <commit> operation.
> >
> >   A confirmed commit operation MUST be reverted if a follow-up commit
> >   (called the "confirming commit") is not issued within 600 seconds
> >   (10 minutes).  The timeout period can be adjusted with the
> >   <confirm-timeout> element.  The confirming commit can itself
> >   include a <confirmed> parameter.
> >  
> >
> This last sentence is confusing to me.  It makes sense if the
> <candidate> contains new changes and the 2nd confirmed commit starts
> a new "revert timeout" for these new changes.

That's the intent. I mention it here only to indicate that the
confirming commit is not magic: it's a regular commit that could
itself be confirmed or make additional changes.
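
To make the chaining concrete, here is a minimal Python sketch of one
reading of that text. The Agent class, its field names, and the config
values are all hypothetical; the sketch assumes that any commit confirms
a pending confirmed commit and that a confirmed commit snapshots
<running> as it was just before that commit.

```python
class Agent:
    """Toy model of an agent with <running> and <candidate> (illustrative only)."""

    def __init__(self, baseline):
        self.running = dict(baseline)
        self.candidate = dict(baseline)
        self.rollback = None   # config to restore if the timer pops
        self.deadline = None   # absolute time at which the revert fires

    def tick(self, now):
        # Revert <running> if a confirm timeout has expired.
        if self.deadline is not None and now >= self.deadline:
            self.running = self.rollback
            self.rollback = None
            self.deadline = None

    def commit(self, now, confirmed=False, confirm_timeout=600):
        self.tick(now)
        previous = self.running
        # Assumption: any commit acts as the confirming commit.
        self.rollback = None
        self.deadline = None
        self.running = dict(self.candidate)
        if confirmed:
            # Assumption: a new revert timeout covers only this commit.
            self.rollback = previous
            self.deadline = now + confirm_timeout


agent = Agent({"mtu": 1500})
agent.candidate["mtu"] = 9000
agent.commit(now=0, confirmed=True, confirm_timeout=120)

# The confirming commit adds a change and is itself confirmed.
agent.candidate["ospf"] = "area0"
agent.commit(now=60, confirmed=True, confirm_timeout=120)

agent.tick(now=200)    # the second timer (deadline 180) has popped
print(agent.running)   # {'mtu': 9000}
```

Under these assumptions only the second set of changes is rolled back;
the first confirmed commit was made permanent by the second one.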

> I really don't like the possible side effects from this confirmed
> commit, especially with our shared <candidate> and global locking.
> If you don't maintain the session and hold the lock throughout the
> entire double commit, really bad things can happen.
> 
>   (NEW ISSUE: What happens to a confirmed commit in progress if the
>    session is lost or the agent reboots?)
> 
>  T0    - boot with baseline config
>  Tc    - Manager A issues a confirmed commit, w/ revert to baseline
>          at Tc+i
>  Tc+1  - Manager A loses its connection and session
>  Tc+10 - Manager B has no idea Manager A did this, comes along, gets
>          the lock, and starts writing to the <candidate> config,
>          which starts with the contents of <running> at time Tc
>  Tc+20 - Then Manager A comes back and can't get a lock
>  Tc+i  - Manager A's revert timer pops before Manager B is done.
>          The agent reverts the state of <running> to T0.  (But B
>          thinks the state of <running> is Tc.)
> 
> At this point, it depends on the difference between config T0 and Tc,
> and what Manager B is doing, as to whether benign or devastating
> effects will follow.
>
> It's never a good thing to design this much "astonishment" into
> routing products.  At a minimum, we need to document what happens in
> as many corner cases as we can think of, but we should also try to
> respect the principle of least astonishment.

I don't view this as astonishing, or a side effect. It's very
simple: when the timer pops from a confirmed commit, the device
will revert to the T0 configuration. I'd argue that it's the kind
of easy-to-understand basic behavior that operators like.
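
For reference, the timeline above can be replayed as a short Python
sketch. The config keys and values are made up, Tc is taken as 0 and i
as 600, and the sketch assumes that losing Manager A's session has no
effect on the revert timer (the open question raised above) and that
the revert leaves <candidate> untouched.

```python
import copy

T0_CONFIG = {"bgp": "off"}   # baseline config at boot (made-up value)

running = copy.deepcopy(T0_CONFIG)
candidate = copy.deepcopy(T0_CONFIG)

# Tc: Manager A edits <candidate> and issues a confirmed commit.
Tc, i = 0, 600
candidate["bgp"] = "on"
rollback = copy.deepcopy(running)      # what <running> reverts to
deadline = Tc + i
running = copy.deepcopy(candidate)

# Tc+1: Manager A loses its session.
# Assumption: the agent keeps the revert timer running regardless.

# Tc+10: Manager B gets the lock and starts editing <candidate>,
# which still holds the contents of <running> at time Tc.
b_view_of_running = copy.deepcopy(running)
candidate["ntp"] = "10.0.0.1"

# Tc+i: the revert timer pops before Manager B is done.
now = Tc + i
if now >= deadline:
    running = rollback

print(running)             # {'bgp': 'off'}
print(b_view_of_running)   # {'bgp': 'on'}
print(candidate)           # {'bgp': 'on', 'ntp': '10.0.0.1'}
```

In this model <running> is back at T0 while B's view of it is still Tc,
and <candidate> still carries A's unconfirmed change alongside B's
edit, so a later commit by B would re-apply it.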


> >   If a confirming commit is not issued, the device will revert it's
> >   configuration to the state prior to the issuance of the confirmed
> >   commit.  Note that any commit operation, including a commit which
> >   introduces additional changes to the configuration, will serve as
> >   a confirming commit.  Thus to cancel a confirmed commit and revert
> >   changes without waiting for the confirm timeout to expire, the
> >   confirming commit can explicitly restore the configuration to it's
> >   state before the confirmed commit was issued.
> >  
> >
> I don't understand this last sentence, and this revert operation at
> all.

This is in response to your comment about how the configuration would
be reverted before the timer pops. We can't use the rollback operation
to explain it, because netconf doesn't have one at this point. I could
remove that text completely if it's confusing.
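
If it helps, the "cancel by restoring" idea can be sketched in Python
as follows. The commit() helper, the state dictionary, and the config
values are hypothetical; the sketch assumes the manager keeps its own
copy of the earlier configuration and that a plain commit confirms the
pending one. Timer expiry is not modeled because the second commit
arrives well before the deadline.

```python
def commit(state, now, confirmed=False, timeout=600):
    """Any commit confirms a pending one; a confirmed commit arms a new timer."""
    previous = dict(state["running"])
    state["running"] = dict(state["candidate"])
    state["revert_to"] = previous if confirmed else None
    state["deadline"] = now + timeout if confirmed else None


state = {
    "running": {"mtu": 1500},
    "candidate": {"mtu": 1500},
    "revert_to": None,
    "deadline": None,
}

# The manager saves the configuration before changing anything.
saved = dict(state["running"])

state["candidate"]["mtu"] = 9000
commit(state, now=0, confirmed=True)

# Change of mind at t=30: put the saved config back into <candidate>
# and issue an ordinary commit, which also acts as the confirming commit.
state["candidate"] = dict(saved)
commit(state, now=30)

print(state["running"], state["deadline"])   # {'mtu': 1500} None
```

The end state matches what the timeout would have produced, without
waiting for the timer.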

> BTW, s/it's/its/ in both paragraphs above.

Quite right, thanks.

Rob

> >   For shared configurations, this feature can cause other
> >   configuration changes (for example, via other NETCONF sessions) to
> >   be inadvertently altered or removed, unless the configuration
> >   locking feature is used (in other words, lock obtained before the
> >   edit-config operation is started).  Therefore, it is strongly
> >   suggested that in order to use this feature with shared
> >   configuration databases, configuration locking should also be
> >   used.
> >
> >8.4.2  Dependencies
> >
> >   The #confirmed-commit capability is only relevant if the #candidate
> >   capability is also supported.
> >
> >8.4.3  Capability and Namespace
> >
> >   The #confirmed-commit capability is identified by the following
> >   capability string:
> >
> >      urn:ietf:params:xml:ns:netconf:base:1.0#confirmed-commit
> >
> >   The #confirmed-commit capability uses the base NETCONF namespace
> >   URN.
> >
> >8.4.4  New Operations
> >
> >   None.
> >
> >8.4.5  Modifications to Existing Operations
> >
> >8.4.5.1  <commit>
> >
> >   The #confirmed-commit capability allows 2 additional parameters to
> >   the <commit> operation.
> >
> >   Parameters:
> >
> >      confirmed:
> >
> >            Perform a confirmed commit operation.
> >
> >      confirm-timeout:
> >
> >            Timeout period for confirmed commit, in seconds.  If
> >            unspecified, the confirm timeout defaults to 600 seconds.
> >
> >   Example:
> >
> >     <rpc message-id="101"
> >          xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
> >       <commit>
> >         <confirmed/>
> >         <confirm-timeout>120</confirm-timeout>
> >       </commit>
> >     </rpc>
> >
> >     <rpc-reply message-id="101"
> >          xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
> >       <ok/>
> >     </rpc-reply>
> > 
> >
> >  
> >
> >>-----Original Message-----
> >>From: owner-netconf@ops.ietf.org 
> >>[mailto:owner-netconf@ops.ietf.org] On Behalf Of Andy Bierman
> >>Sent: Sunday, May 08, 2005 10:24 AM
> >>To: netconf
> >>Subject: More confirmed-commit issues
> >>
> >>Hi,
> >>
> >>IMO, PROT, section 8.4 is not very clear what happens
> >>if locking is not used, or if the manager doesn't follow
> >>the elements of procedure that the document suggests.
> >>
> >>If a confirmed-commit timeout is pending, and the <candidate>
> >>config is modified again before the 2nd <commit> or the timeout
> >>occurs, how does the agent interpret the <commit> that is intended
> >>to be for the newly modified <candidate>?  What exactly is the
> >>contents of <running> after the confirm-commit timer pops?
> >>What if the 2nd commit is also a confirmed-commit?  What if
> >>time(C2) < timer(C1)? How come a manager cannot cancel a confirmed
> >>commit  (after commit-1 but before the timeout)?
> >>
> >>Note that this corner-case can occur naturally if locking is not
> >>properly used, or pathologically, if the manager holding the locks
> >>writes to the <candidate> before finishing the first confirmed
> >>commit.  (E.g., operator forgets a line of config -- adds it --
> >>commits it.)
> >>
> >>The vague warning about "use locks properly" (8.4.1, para 2) is not
> >>relevant to agent implementers who have to make this work even if
> >>locking isn't used, used wrong, or the manager doesn't follow the
> >>implied transaction model.
> >>
> >>I would also like to note that the #rollback feature was thrown out
> >>because of these same corner-cases, that (IMO) are neither explained
> >>nor properly handled in the current draft as they relate to the
> >>#candidate and #confirmed-commit capabilities.
> >>
> >>Andy
> >>
> >>--
> >>to unsubscribe send a message to netconf-request@ops.ietf.org with
> >>the word 'unsubscribe' in a single line as the message text body.
> >>archive: <http://ops.ietf.org/lists/netconf/>
> >>