Re: More confirmed-commit issues
Rob Enns wrote:
(NEW ISSUE: What happens to a confirmed commit in progress if the
session is lost or the agent reboots?)
To confirm the above issues:
If the session doing the confirmed commit is lost, the confirmed
commit continues.
If the agent reboots in the middle of a confirmed commit, I assume the
box boots with the new config, so an agent reboot acts like a 2nd
commit. Yuch. Or does the agent remember that a revert timeout was
pending? If the timer doesn't survive, and the first commit CAUSED the
reboot, isn't this device in an endless reboot loop? If the crash
happens in the startup sequence, before the timer can pop, it's in an
endless reboot loop anyway.
I think there are 2 cases here:
1) intentional reboot
-> if an operator intentionally reboots the box in the
middle of the confirmed commit, I'd say that effectively
confirms the commit, and we should explicitly mention
this case in the protocol spec.
2) unintentional reboot (aka bug)
-> we can't standardize what netconf does in this case, right?
No, we can standardize what happens for a reboot for any reason.
I don't agree with (1) above, because an attacker without any
account access at all can effectively issue a <commit> operation
by getting the device to reboot at the right time.
IMO, the device should be required to save a boolean flag in NV-store
that tells the agent to revert to the previous config as it boots,
which is what we decided should happen if the session that did the
first commit is lost. BTW, this also helps fix the
new-config-causes-a-reboot looping problem.
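To make the NV-store proposal concrete, here is a minimal Python sketch, assuming the agent can write a small record to non-volatile storage (a temp file stands in for it here) and that configs can be modeled as plain dicts. The names `NV_PATH`, `confirmed_commit`, `confirm`, and `boot` are hypothetical and not part of any NETCONF spec.

```python
import json
import os
import tempfile

# Hypothetical file standing in for the device's non-volatile store.
NV_PATH = os.path.join(tempfile.gettempdir(), "netconf_pending_revert.json")


def write_nv(state):
    """Persist the pending-revert record atomically (temp file, fsync, rename)."""
    tmp = NV_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, NV_PATH)


def clear_nv():
    """Remove the pending-revert record, if any."""
    if os.path.exists(NV_PATH):
        os.remove(NV_PATH)


def confirmed_commit(running, candidate):
    """Record the config to fall back to BEFORE applying the candidate."""
    write_nv({"revert_pending": True, "rollback_config": running})
    return dict(candidate)


def confirm():
    """The confirming <commit> arrived: the new config becomes permanent."""
    clear_nv()


def boot(saved_running):
    """At startup, restore the old config if a revert was still pending."""
    if os.path.exists(NV_PATH):
        with open(NV_PATH) as f:
            state = json.load(f)
        if state.get("revert_pending"):
            clear_nv()  # one-shot: the revert itself is not retried forever
            return state["rollback_config"]
    return saved_running
```

If `boot()` runs before the committed config is applied, a reboot inside the confirm window (intentional or a crash caused by the new config) brings the box up on the old config, so the reboot no longer acts as an implicit confirm and a bad config cannot keep the device in a reboot loop.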
>....
It is astonishing to Mgr B, who has no way of knowing a revert
timeout is pending. To me, the whole thing is just fragile.
IMO, a configuration protocol should be robust, not fragile.
I don't think it's fragile. The problem in this scenario is that
Mgr A and Mgr B don't know what the other is doing. We can't
standardize a way out of badly managed networks.
I totally disagree. Managers can use locks correctly, as designed, and
config changes can at least be properly serialized. We have a special
case here.
IMO, confirmed commit is the only part of the NETCONF protocol
that can cause really bad scenarios even when Mgr A and Mgr B are both
using locks correctly. (Wes will now tell me otherwise :-)
No other NETCONF operation causes the agent to make "silent" changes to the
<running> config in the background. We didn't add <rollback> because of the
same serious multi-manager issues that plague confirmed-commit.
Another concern:
NETCONF locks are supposed to be short-lived, but the default timeout
for this operation is 10 minutes! By design, Mgr A is supposed to stay
logged in and hold a lock on <running> for 10+ minutes in order for
this operation to work in a multi-manager environment.
I am strongly opposed to this feature "as-is" unless we make the
following changes:
1) add the "confirmed-commit-in-progress" warning I proposed (or
   something like it) so other managers can at least know their
   changes may be clobbered by the agent at time T
2) automatically revert the config if the "committing" session is
   lost (WG consensus for this already)
3) automatically revert the config upon startup if the agent reboots
   for any reason while a revert-timeout is pending
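Changes 1) and 2) can be modeled as a small agent state machine. This is a sketch under stated assumptions: the `Agent` class, its method names, the warning text, and the use of a caller-supplied clock (`now`, in seconds) instead of a real timer are all hypothetical; only the 600-second (10-minute) default comes from the discussion above.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_CONFIRM_TIMEOUT = 600  # seconds (the 10-minute default)


@dataclass
class Agent:
    running: dict
    rollback: Optional[dict] = None   # config restored if never confirmed
    owner: Optional[str] = None       # session that issued the confirmed commit
    deadline: Optional[float] = None  # time at which the revert timer pops

    def confirmed_commit(self, session: str, candidate: dict, now: float,
                         timeout: float = DEFAULT_CONFIRM_TIMEOUT) -> None:
        # Save the pre-commit config only once, so a repeated confirmed
        # commit still falls back to the original config.
        if self.rollback is None:
            self.rollback = dict(self.running)
        self.running = dict(candidate)
        self.owner = session
        self.deadline = now + timeout

    def confirm(self) -> None:
        # Confirming commit: keep the new config, drop all revert state.
        self.rollback = self.owner = self.deadline = None

    def edit(self, changes: dict) -> List[str]:
        # Change 1): warn any manager who edits while a revert is pending.
        warnings = []
        if self.deadline is not None:
            warnings.append("confirmed-commit-in-progress: changes may be "
                            f"reverted at t={self.deadline}")
        self.running.update(changes)
        return warnings

    def session_lost(self, session: str) -> None:
        # Change 2): losing the committing session reverts immediately.
        if session == self.owner:
            self._revert()

    def tick(self, now: float) -> None:
        # Normal expiry of the revert timer.
        if self.deadline is not None and now >= self.deadline:
            self._revert()

    def _revert(self) -> None:
        self.running = self.rollback
        self.rollback = self.owner = self.deadline = None
```

For example, if Mgr A does `confirmed_commit("A", ...)` and Mgr B then calls `edit({"vlan": 10})`, B gets the warning; when A's session drops, `session_lost("A")` restores the pre-commit config and B's `vlan` change silently disappears. The warning is the only thing that tells B this can happen.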
I think we should have had a "manual revert" operation all along (not
a generalized rollback, just the ability to cause the revert-timer to
pop), but we don't. I suspect it will show up in various proprietary
forms, and we can standardize it later.
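The "manual revert" idea (making the revert timer pop early, not general rollback) could look like the following Python sketch built on the standard library's `threading.Timer`. `RevertTimer`, `pop_now`, and `cancel` are hypothetical names, and `on_revert` stands in for whatever restores the pre-commit config.

```python
import threading


class RevertTimer:
    """Hypothetical confirm-timeout timer with a manual 'pop now' operation."""

    def __init__(self, timeout, on_revert):
        self._on_revert = on_revert
        self._lock = threading.Lock()
        self._done = False  # set once the timer has either reverted or been cancelled
        self._timer = threading.Timer(timeout, self._fire)
        self._timer.daemon = True

    def start(self):
        self._timer.start()

    def _fire(self):
        # The lock guarantees the revert runs at most once, even if the
        # timer thread and a manual pop race each other.
        with self._lock:
            if self._done:
                return
            self._done = True
        self._on_revert()

    def cancel(self):
        """Confirming commit: stop the timer without reverting."""
        with self._lock:
            self._done = True
        self._timer.cancel()

    def pop_now(self):
        """Manual revert: run the revert immediately instead of waiting."""
        self._timer.cancel()
        self._fire()
```

Because `pop_now()` only forces the existing timer to fire, it reverts to the same config the timeout would have restored, which keeps it much narrower than a general <rollback>.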
Andy