
Re: More confirmed-commit issues

To: Rob Enns <rpe@juniper.net>
Subject: Re: More confirmed-commit issues
From: Andy Bierman <ietf@andybierman.com>
Date: Thu, 19 May 2005 08:43:37 -0700
Cc: netconf <netconf@ops.ietf.org>
In-reply-to: <062B922B6EC55149B5A267ECE78E5D440A003995@photon.jnpr.net>
References: <062B922B6EC55149B5A267ECE78E5D440A003995@photon.jnpr.net>
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)

Rob Enns wrote:

> (NEW ISSUE: What happens to a confirmed commit in progress if the session is lost or the agent reboots?)
> To confirm the above issues:
> If the session doing the confirmed commit is lost, the confirmed commit continues.
>
> If the agent reboots in the middle of a confirmed commit, I assume the box boots with the new config, so an agent reboot acts like a 2nd commit. Yuch. Or does the agent remember that a revert timeout was pending? If the timer doesn't survive, and the first commit CAUSED the reboot, isn't this device in an endless reboot loop? If the crash happens in the startup sequence, before the timer can pop, it's in an endless reboot loop anyway.
> I think there are 2 cases here:
> 1) intentional reboot -> if an operator intentionally reboots the box in the middle of the confirmed commit, I'd say that effectively confirms the commit, and we should explicitly mention this case in the protocol spec.
>
> 2) unintentional reboot (aka bug) -> we can't standardize what netconf does in this case, right?

No, we can standardize what happens for a reboot for any reason.
I don't agree with (1) above, because an attacker without any
account access at all can effectively issue a <commit> operation
by getting the device to reboot at the right time.

IMO, the device should be required to save a boolean flag in NV-store that tells the agent, when it boots, to revert to the config that was in place before the confirmed commit. That is what we decided should happen if the session that did the first commit is lost. BTW, this also helps fix the new-config-causes-a-reboot looping problem.
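To make the NV-store proposal concrete, here is a minimal Python sketch. It assumes a JSON file stands in for non-volatile storage, and the names `NVStore`, `start_confirmed_commit`, `confirm_commit` and `boot` are hypothetical, not part of any NETCONF draft. The point it illustrates is ordering: the revert flag and the pre-commit config are written before the new config takes effect, so a reboot at any later moment (including one caused by the new config itself) comes back up with the old config instead of looping.

```python
import json
import os


class NVStore:
    """Hypothetical stand-in for non-volatile storage: a JSON file on disk."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def save(self, state):
        # Write to a temp file, then rename, so a crash mid-write
        # never leaves a half-written state behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self.path)


def start_confirmed_commit(nv, running, candidate):
    # Persist the revert flag and the pre-commit config BEFORE
    # activating the candidate, so a reboot at any later point
    # can still be undone.
    nv.save({"revert_pending": True, "previous": running})
    return dict(candidate)


def confirm_commit(nv, running):
    # Confirmation clears the flag; the new config becomes permanent.
    nv.save({"revert_pending": False, "previous": None})
    return running


def boot(nv, saved_running):
    # At startup, a pending revert means the confirmed commit was never
    # confirmed, so the agent loads the previous config instead of the
    # saved one. This is what breaks the reboot loop.
    state = nv.load()
    if state.get("revert_pending"):
        nv.save({"revert_pending": False, "previous": None})
        return state["previous"]
    return saved_running
```

Under these assumptions an intentional reboot, a crash, and an attacker-induced reboot all behave the same way: none of them confirms the commit.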

>....
> It is astonishing to Mgr B, who has no way of knowing a revert timeout is pending. To me, the whole thing is just fragile. IMO, a configuration protocol should be robust, not fragile.
>
> I don't think it's fragile. The problem in this scenario is that Mgr A and Mgr B don't know what the other is doing. We can't standardize a way out of badly managed networks.

I totally disagree. Managers can use locks correctly, as designed, and
config changes can at least be properly serialized. We have a special
case here.

IMO, this confirmed-commit is the only part of the NETCONF protocol that can cause really bad scenarios even when Mgr A and Mgr B are both using locks correctly. (Wes will now tell me otherwise :-)

No other NETCONF operation causes the agent to make "silent" changes to the
<running> config in the background. We didn't add <rollback> because of the
same serious multi-manager issues that plague confirmed-commit.

Another concern: NETCONF locks are supposed to be short-lived, but the default timeout for this operation is 10 minutes! By design, Mgr A is supposed to stay logged in and hold a lock on <running> for 10+ minutes in order for this operation to work in a multi-manager environment.

I am strongly opposed to this feature "as-is" unless we make the following changes:

1) add the "confirmed-commit-in-progress" warning I proposed (or something like it) so other managers can at least know their changes may be clobbered by the agent at time T
2) automatically revert the config if the "committing" session is lost (WG consensus for this already)
3) automatically revert the config upon startup if the agent reboots for any reason while a revert timeout is pending
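The three changes can be modelled as a small in-memory state machine. This is a hedged sketch, not proposed spec text: `Agent`, its methods, and the warning string are hypothetical; the 600-second default only mirrors the 10-minute timeout mentioned above; and `reboot()` assumes the pending-revert state survives in NV-store.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    running: dict
    previous: Optional[dict] = None   # config to restore on revert
    owner: Optional[str] = None       # session that issued the confirmed commit
    deadline: Optional[float] = None  # time T at which the revert fires

    def confirmed_commit(self, session, candidate, now, timeout=600.0):
        self.previous = dict(self.running)
        self.running = dict(candidate)
        self.owner = session
        self.deadline = now + timeout

    def edit_running(self, session, changes) -> Optional[str]:
        # Change 1: warn other managers that this edit may be clobbered at T.
        warning = None
        if self.deadline is not None and session != self.owner:
            warning = f"confirmed-commit-in-progress: revert at T={self.deadline}"
        self.running.update(changes)
        return warning

    def _revert(self):
        # Restores the pre-commit config; any edits made since are lost.
        self.running = self.previous
        self.previous = self.owner = self.deadline = None

    def tick(self, now):
        # The revert timer pops.
        if self.deadline is not None and now >= self.deadline:
            self._revert()

    def session_lost(self, session):
        # Change 2: losing the committing session reverts immediately.
        if session == self.owner:
            self._revert()

    def reboot(self):
        # Change 3: a pending revert is carried out on startup, whatever
        # caused the reboot. Assumes owner/deadline/previous were persisted.
        if self.deadline is not None:
            self._revert()
```

Note that the sketch also shows the clobbering problem: Mgr B gets a warning, but B's edit is still discarded when the revert fires.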

I think we should have had a "manual revert" operation all along (not a generalized rollback, just the ability to make the revert timer pop immediately), but we don't. I suspect it will show up in various proprietary forms, and we can standardize it later.

Andy

