RE: More confirmed-commit issues
> Hi,
>
> My two cents - if a session/connection breaks before
> a commit, then rollback to previous/current config
> MUST happen as a hard requirement of NetConf.
Thanks Ira. I think this is a good way to address the issue.
If the session is terminated for any reason before the commit
is confirmed, the previous config is restored immediately.
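
To make the rule concrete, here is a minimal sketch of it as a toy agent model. Nothing here comes from the draft: the class, method names, the session identifiers, and the integer clock are all assumptions made for illustration. It only shows the three ways a pending confirmed commit can end: a confirming commit, a timer pop, or (the rule above) loss of the session that issued it.

```python
import copy


class ConfirmedCommitAgent:
    """Toy model of an agent; every name here is hypothetical."""

    def __init__(self, baseline):
        self.running = copy.deepcopy(baseline)
        self.candidate = copy.deepcopy(baseline)
        self._saved = None     # config to restore if the commit is never confirmed
        self._owner = None     # session that issued the confirmed commit
        self._deadline = None  # time at which the revert timer pops

    def confirmed_commit(self, session, now, timeout):
        # Remember the pre-commit config, then activate the candidate.
        self._saved = copy.deepcopy(self.running)
        self.running = copy.deepcopy(self.candidate)
        self._owner = session
        self._deadline = now + timeout

    def commit(self):
        # Any commit acts as the confirming commit and cancels the revert.
        self.running = copy.deepcopy(self.candidate)
        self._clear()

    def session_closed(self, session):
        # Proposed rule: losing the owning session reverts immediately.
        if self._saved is not None and session == self._owner:
            self._revert()

    def tick(self, now):
        # Timer expiry reverts as well.
        if self._saved is not None and now >= self._deadline:
            self._revert()

    def _revert(self):
        self.running = self._saved
        self._clear()

    def _clear(self):
        self._saved = self._owner = self._deadline = None


agent = ConfirmedCommitAgent({"mtu": 1500})
agent.candidate["mtu"] = 9000
agent.confirmed_commit("A", now=0, timeout=600)
agent.session_closed("A")  # Manager A's session drops before confirming
```

With this rule, <running> is already back at the baseline by the time another manager could get the lock, so no revert timer is left pending from a dead session.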
Rob
> The idea that Manager B gets (all unknowing) the
> loose ends from Manager A's previous session will
> _never_ get past IESG Security review.
>
> Cheers,
> - Ira
>
> Ira McDonald (Musician / Software Architect)
> Blue Roof Music / High North Inc
> PO Box 221 Grand Marais, MI 49839
> phone: +1-906-494-2434
> email: imcdonald@sharplabs.com
>
> > -----Original Message-----
> > From: owner-netconf@ops.ietf.org [mailto:owner-netconf@ops.ietf.org]
> > On Behalf Of Rob Enns
> > Sent: Wednesday, May 18, 2005 3:34 PM
> > To: Andy Bierman
> > Cc: netconf
> > Subject: RE: More confirmed-commit issues
> >
> >
> > > >> (NEW ISSUE: What happens to a confirmed commit in progress if the
> > > >> session is lost or the agent reboots?)
> > >
> > > To confirm the above issues:
> > >
> > > If the session doing the confirmed commit is lost, the confirmed
> > > commit continues.
> > >
> > > If the agent reboots in the middle of a confirmed commit, I assume
> > > the box boots with the new config, so an agent reboot acts like a
> > > 2nd commit. Yuch. Or does the agent remember that a revert timeout
> > > was pending? If the timer doesn't survive, and the first commit
> > > CAUSED the reboot, isn't this device in an endless reboot loop? If
> > > the crash happens in the startup sequence, before the timer can
> > > pop, it's in an endless reboot loop anyway.
> >
> > I think there are 2 cases here:
> > 1) intentional reboot
> > -> if an operator intentionally reboots the box in the
> > middle of the confirmed commit, I'd say that effectively
> > confirms the commit, and we should explicitly mention
> > this case in the protocol spec.
> >
> > 2) unintentional reboot (aka bug)
> > -> we can't standardize what netconf does in this case, right?
> >
> >
> >
> > > >> T0    - boot with baseline config
> > > >> Tc    - Manager A issues a confirmed commit, w/ revert to
> > > >>         baseline at Tc+i
> > > >> Tc+1  - Manager A loses its connection and session
> > > >> Tc+10 - Manager B has no idea Manager A did this, comes along,
> > > >>         gets the lock, and starts writing to the <candidate>
> > > >>         config, which starts with the contents of <running>
> > > >>         at time Tc
> > > >> Tc+20 - Then Manager A comes back and can't get a lock
> > > >> Tc+i  - Manager A's revert timer pops before Manager B is done.
> > > >>         The agent reverts the state of <running> to T0. (But B
> > > >>         thinks the state of <running> is Tc).
> > > >>
> > > >>At this point, it depends on the difference between config T0
> > > >>and Tc, and what Manager B is doing, as to whether benign or
> > > >>devastating effects will follow.
> > > >>
> > > >>It's never a good thing to design this much "astonishment"
> > > >>into routing products.
> > > >>At a minimum, we need to document what happens in as many
> > > >>corner cases as we can think of, but we should also try to
> > > >>respect the principle of least astonishment.
> > > >>
> > > >>
> > > >
> > > >I don't view this as astonishing, or a side effect. It's very
> > > >simple: when the timer pops from a confirmed commit, the device
> > > >will revert to the T0 configuration. I'd argue that it's the kind
> > > >of easy to understand basic behavior that operators like.
> > > >
> > > >
> > > It is astonishing to Mgr B who has no way of knowing a revert
> > > timeout is pending. To me, the whole thing is just fragile.
> > > IMO, a configuration protocol should be robust, not fragile.
> >
> > I don't think it's fragile. The problem in this scenario is that
> > Mgr A and Mgr B don't know what each other are doing. We can't
> > standardize a way out of badly managed networks.
> >
> > > A protocol that can allow the possibility of severely detrimental
> > > config changes (through unintended or malicious acts), by merely
> > > dropping a connection, is fragile.
> >
> > Why would reverting to the T0 configuration, which is both what
> > Mgr A wanted to do, and what the device was running before, be
> > severely detrimental?
> >
> > We can sit around making up corner cases where Mgr A and Mgr B don't
> > know what each other are doing; that's easy to do and not very
> > productive. There's no way the device ends up with a sane
> > configuration at the end of the day using _any_ configuration
> > method, if the entities doing the configuration aren't coordinated.
> >
> > The question is, what's the risk/reward of standardizing a feature
> > like confirmed commit. The risk is that operators that aren't aware
> > that a confirmed commit is underway could lose changes. The reward is
> > that we have a standardized way to protect against devices falling
> > off the network due to a change. IMO there is a very clear benefit
> > which outweighs the risk. And the risk is explicitly identified in
> > the protocol document.
> >
> > > It's possible the security AD could have a problem with this too,
> > > during the IESG review.
> > >
> > > >>> If a confirming commit is not issued, the device will revert its
> > > >>> configuration to the state prior to the issuance of the confirmed
> > > >>> commit. Note that any commit operation, including a commit which
> > > >>> introduces additional changes to the configuration, will serve as a
> > > >>> confirming commit. Thus to cancel a confirmed commit and revert
> > > >>> changes without waiting for the confirm timeout to expire, the
> > > >>> confirming commit can explicitly restore the configuration to its
> > > >>> state before the confirmed commit was issued.
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>I don't understand this last sentence, and this revert
> > > >>operation at all.
> > > >>
> > > >>
> > > >
> > > >This is in response to your comment about how the configuration
> > > >would be reverted before the timer pops. We can't use the rollback
> > > >operation to explain it, because netconf doesn't have one at this
> > > >point. I could remove that text completely if it's confusing.
> > > >
> > > >
> > > The fact that you have a rollback operation in Junoscript doesn't
> > > really apply to this document. The sentence doesn't convey the idea
> > > that the confirmed commit can be canceled through proprietary
> > > mechanisms, outside the scope of the standard.
> >
> > Sorry for the confusion, that's not what this is saying. The point is
> > that one can use netconf as specified to restore the configuration
> > using edit-config.
> >
> > It has nothing to do with a proprietary mechanism.
> >
> > > IMO, we need to remove this sentence since there is no rollback
> > > operation in netconf.
> > > In fact, we should say instead that netconf provides no mechanism to
> > > force the agent to cancel the confirmed commit and revert the
> > > <running> configuration.
> > > The manager has to wait for the timeout interval to pass.
> >
> > That's not true. Either way the cancel/revert is a manager-initiated
> > action. It's a little clunky for the manager to restore the
> > configuration using edit-config, but it works. This seems to be
> > causing confusion so I can remove it, but I don't think it's accurate
> > to say that netconf provides no mechanism to revert the running
> > configuration. It provides edit-config, which is awkward (because the
> > manager has to have the configuration in hand) but works.
> >
> > > BTW, the phrase "the confirming commit can explicitly restore the
> > > configuration" doesn't really make sense. s/confirming commit/manager/
> > > and it does.
> >
> > Yes, that sounds good, thanks.
> >
> > Rob
> >
> > --
> > to unsubscribe send a message to netconf-request@ops.ietf.org with
> > the word 'unsubscribe' in a single line as the message text body.
> > archive: <http://ops.ietf.org/lists/netconf/>
> >
>