
RE: More confirmed-commit issues



> Hi,
> 
> My two cents - if a session/connection breaks before
> a commit, then rollback to previous/current config
> MUST happen as a hard requirement of NetConf.

Thanks Ira. I think this is a good way to address the issue.
If the session is terminated for any reason before the commit
is confirmed, the previous config is restored immediately.
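To make the proposed rule concrete, here is a minimal sketch of an agent that restores the previous config as soon as the session owning an unconfirmed commit goes away. The `Agent` class, its method names, and the dict-based config are all illustrative assumptions; none of them come from the protocol draft.

```python
import copy


class Agent:
    """Toy agent model; every name here is hypothetical, not from the draft."""

    def __init__(self, baseline):
        self.running = copy.deepcopy(baseline)
        self.candidate = copy.deepcopy(baseline)
        # Either None or (owner_session, saved_running, deadline).
        self.pending = None

    def confirmed_commit(self, session, now, timeout):
        # Remember what to restore, then activate the candidate.
        self.pending = (session, copy.deepcopy(self.running), now + timeout)
        self.running = copy.deepcopy(self.candidate)

    def commit(self, session):
        # Any commit acts as the confirming commit and clears the timer.
        self.running = copy.deepcopy(self.candidate)
        self.pending = None

    def _revert(self):
        _, saved, _ = self.pending
        self.running = saved
        self.pending = None

    def tick(self, now):
        # Revert when the confirm timeout expires without a confirming commit.
        if self.pending is not None and now >= self.pending[2]:
            self._revert()

    def session_closed(self, session):
        # Proposed rule: losing the owning session reverts immediately.
        if self.pending is not None and self.pending[0] == session:
            self._revert()


agent = Agent({"mtu": 1500})
agent.candidate["mtu"] = 9000
agent.confirmed_commit("A", now=0, timeout=600)
agent.session_closed("A")   # running is back to the baseline right away
```

With this rule no revert timer outlives the session that started it, so a later manager never inherits one.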

Rob

> The idea that Manager B gets (all unknowing) the
> loose ends from Manager A's previous session will
> _never_ get past IESG Security review.
> 
> Cheers,
> - Ira
> 
> Ira McDonald (Musician / Software Architect)
> Blue Roof Music / High North Inc
> PO Box 221  Grand Marais, MI  49839
> phone: +1-906-494-2434
> email: imcdonald@sharplabs.com
> 
> > -----Original Message-----
> > From: owner-netconf@ops.ietf.org [mailto:owner-netconf@ops.ietf.org]
> > On Behalf Of Rob Enns
> > Sent: Wednesday, May 18, 2005 3:34 PM
> > To: Andy Bierman
> > Cc: netconf
> > Subject: RE: More confirmed-commit issues
> > 
> > 
> > > >>  (NEW ISSUE: What happens to a confirmed commit in progress if the
> > > >>   session is lost or the agent reboots?)
> > > 
> > > To confirm the above issues:
> > > 
> > > If the session doing the confirmed commit is lost, the confirmed
> > > commit continues.
> > >
> > > If the agent reboots in the middle of a confirmed commit, I assume the
> > > box boots with the new config, so an agent reboot acts like a 2nd
> > > commit.  Yuch.  Or does the agent remember that a revert timeout was
> > > pending?  If the timer doesn't survive, and the first commit CAUSED the
> > > reboot, isn't this device in an endless reboot loop?  If the crash
> > > happens in the startup sequence, before the timer can pop, it's in an
> > > endless reboot loop anyway.
> > 
> > I think there are 2 cases here:
> > 1) intentional reboot
> > -> if an operator intentionally reboots the box in the
> > middle of the confirmed commit, I'd say that effectively
> > confirms the commit, and we should explicitly mention 
> > this case in the protocol spec.
> > 
> > 2) unintentional reboot (aka bug)
> > -> we can't standardize what netconf does in this case, right?
> > 
> > 
> > 
> > > >> T0    - boot with baseline config
> > > >> Tc    - Manager A issues a confirmed commit, w/ revert to baseline
> > > >>         at Tc+i
> > > >> Tc+1  - Manager A loses its connection and session
> > > >> Tc+10 - Manager B has no idea Manager A did this, comes along, gets
> > > >>         the lock, and starts writing to the <candidate> config,
> > > >>         which starts with the contents of <running> at time Tc
> > > >> Tc+20 - Then Manager A comes back and can't get a lock
> > > >> Tc+i  - Manager A's revert timer pops before Manager B is done.
> > > >>         The agent reverts the state of <running> to T0.  (But B
> > > >>         thinks the state of <running> is Tc).
> > > >>
> > > >>At this point, it depends on the difference between config T0 and
> > > >>Tc, and what Manager B is doing, as to whether benign or devastating
> > > >>effects will follow.
> > > >>
> > > >>It's never a good thing to design this much "astonishment" into
> > > >>routing products.  At a minimum, we need to document what happens in
> > > >>as many corner cases as we can think of, but we should also try to
> > > >>respect the principle of least astonishment.
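The timeline above can be replayed as a small sketch, assuming the behavior where the confirmed commit continues after Manager A's session is lost. The function name, the config keys, and the time values are made up for illustration only.

```python
def simulate(revert_at, b_finishes_at):
    """Replay the T0/Tc/Tc+i timeline; all values are hypothetical."""
    baseline = {"ospf": "area0"}            # T0: baseline config
    running = {"ospf": "area1"}             # Tc: A's confirmed commit is active

    # Tc+10: B takes the lock; <candidate> starts from <running> as of Tc.
    b_view_of_running = dict(running)
    b_candidate = dict(running)
    b_candidate["bgp"] = "as65000"          # B's own edits on top of Tc

    # Tc+i: A's revert timer pops before B is done.
    if revert_at < b_finishes_at:
        running = dict(baseline)

    return running, b_view_of_running, b_candidate


running, b_view, b_candidate = simulate(revert_at=30, b_finishes_at=60)
# running now holds the T0 config, while B still believes it holds Tc,
# and B's candidate still carries A's unconfirmed change.
```

Whether the mismatch is benign depends on how T0 and Tc differ, which is the point being argued in the thread.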
> > > >>    
> > > >>
> > > >
> > > >I don't view this as astonishing, or a side effect. It's very
> > > >simple: when the timer pops from a confirmed commit, the device
> > > >will revert to the T0 configuration. I'd argue that it's the kind
> > > >of easy to understand basic behavior that operators like.
> > > >  
> > > >
> > > It is astonishing to Mgr B who has no way of knowing a revert
> > > timeout is pending. To me, the whole thing is just fragile. 
> > > IMO, a configuration protocol should be robust, not fragile.
> > 
> > I don't think it's fragile. The problem in this scenario is that
> > Mgr A and Mgr B don't know what each other are doing. We can't
> > standardize a way out of badly managed networks.
> > 
> > > A protocol that can allow the possibility of severely detrimental
> > > config changes (through unintended or malicious acts), by merely
> > > dropping a connection, is fragile.
> > 
> > Why would reverting to the T0 configuration, which is both what Mgr A
> > wanted to do, and what the device was running before, be severely
> > detrimental?
> > 
> > We can sit around making up corner cases where Mgr A and Mgr B don't
> > know what each other are doing, that's easy to do and not very
> > productive. There's no way the device ends up with a sane configuration
> > at the end of the day using _any_ configuration method, if the entities
> > doing the configuration aren't coordinated.
> > 
> > The question is, what's the risk/reward of standardizing a feature
> > like confirmed commit. The risk is that operators that aren't aware
> > that a confirmed commit is underway could lose changes. The reward is
> > that we have a standardized way to protect against devices falling
> > off the network due to a change. IMO there is a very clear benefit which
> > outweighs the risk. And the risk is explicitly identified in the
> > protocol document.
> > 
> > > It's possible the security AD could have a problem with this too,
> > > during the IESG review.
> > >
> > > >>>  If a confirming commit is not issued, the device will revert its
> > > >>>  configuration to the state prior to the issuance of the confirmed
> > > >>>  commit.  Note that any commit operation, including a commit which
> > > >>>  introduces additional changes to the configuration, will serve as
> > > >>>  a confirming commit.  Thus to cancel a confirmed commit and revert
> > > >>>  changes without waiting for the confirm timeout to expire, the
> > > >>>  confirming commit can explicitly restore the configuration to its
> > > >>>  state before the confirmed commit was issued.
> > > >>> 
> > > >>>
> > > >>>      
> > > >>>
> > > >>I don't understand this last sentence, and this revert operation at
> > > >>all.
> > > >>    
> > > >>
> > > >
> > > >This is in response to your comment about how the configuration would
> > > >be reverted before the timer pops. We can't use the rollback operation
> > > >to explain it, because netconf doesn't have one at this point. I could
> > > >remove that text completely if it's confusing.
> > > >  
> > > >
> > > The fact that you have a rollback operation in Junoscript doesn't
> > > really apply to this document.  The sentence doesn't convey the idea
> > > that the confirmed commit can be canceled through proprietary
> > > mechanisms, outside the scope of the standard.
> > 
> > Sorry for the confusion, that's not what this is saying. The point is
> > that one can use netconf as specified to restore the configuration
> > using edit-config.
> > 
> > It has nothing to do with a proprietary mechanism.
> > 
> > > IMO, we need to remove this sentence since there is no rollback
> > > operation in netconf.  In fact, we should say instead that netconf
> > > provides no mechanism to force the agent to cancel the confirmed
> > > commit and revert the <running> configuration.  The manager has to
> > > wait for the timeout interval to pass.
> > 
> > That's not true. Either way the cancel/revert is a manager initiated
> > action. It's a little clunky for the manager to restore the
> > configuration using edit-config, but it works. This seems to be causing
> > confusion so I can remove it, but I don't think it's accurate to say
> > that netconf provides no mechanism to revert the running configuration.
> > It provides edit-config, which is awkward (because the manager has to
> > have the configuration in hand) but works.
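As a rough sketch of the edit-config approach described above: the manager keeps its own copy of the configuration before the confirmed commit, and to cancel it writes that copy back and commits again. `FakeDevice` is an in-memory stand-in invented for this example, not a real NETCONF client API; only the operation names mirror the protocol.

```python
class FakeDevice:
    """Hypothetical in-memory stand-in for an agent, for illustration only."""

    def __init__(self, running):
        self.running = dict(running)
        self.candidate = dict(running)

    def get_config(self):
        # Return a copy of <running>.
        return dict(self.running)

    def edit_config(self, config):
        # Simplification: replace the whole <candidate> with the given config.
        self.candidate = dict(config)

    def commit(self):
        # Any commit, confirmed or not, activates <candidate>.
        self.running = dict(self.candidate)


def cancel_confirmed_commit(device, saved_config):
    """Restore the saved config; the commit also serves as the confirming commit."""
    device.edit_config(saved_config)
    device.commit()


device = FakeDevice({"mtu": 1500})
saved = device.get_config()          # manager must have the config in hand
device.edit_config({"mtu": 9000})
device.commit()                      # stands in for the confirmed commit
cancel_confirmed_commit(device, saved)
```

The awkward part is visible in the sketch: the revert only works if the manager fetched and kept `saved` before making the change.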
> > 
> > > BTW, the phrase "the confirming commit can explicitly restore the
> > > configuration" doesn't really make sense. s/confirming commit/manager/
> > > and it does.
> > 
> > Yes, that sounds good, thanks.
> > 
> > Rob
> > 
> > --
> > to unsubscribe send a message to netconf-request@ops.ietf.org with
> > the word 'unsubscribe' in a single line as the message text body.
> > archive: <http://ops.ietf.org/lists/netconf/>
> > 
> 
