
Re: More confirmed-commit issues

To: Rob Enns <rpe@juniper.net>
Subject: Re: More confirmed-commit issues
From: Andy Bierman <ietf@andybierman.com>
Date: Wed, 18 May 2005 07:29:37 -0700
Cc: netconf <netconf@ops.ietf.org>
In-reply-to: <062B922B6EC55149B5A267ECE78E5D440A0038C4@photon.jnpr.net>
References: <062B922B6EC55149B5A267ECE78E5D440A0038C4@photon.jnpr.net>
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)

Rob Enns wrote:

Hi, comments below.

ditto

-----Original Message-----
From: Andy Bierman [mailto:ietf@andybierman.com]
Sent: Saturday, May 14, 2005 5:13 AM
To: Rob Enns
Cc: netconf
Subject: Re: More confirmed-commit issues
Rob Enns wrote:
How does this replacement text sound?
----
8.4  Confirmed Commit Capability
8.4.1  Description
 The #confirmed-commit capability indicates that the server will
 support the <confirmed> and <confirm-timeout> parameters for the
 <commit> protocol operation.  See Section 8.3 for further
 details on the <commit> operation.
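For concreteness, a minimal sketch of how a manager might serialize a <commit> carrying the <confirmed> and <confirm-timeout> parameters, using Python's standard xml.etree.ElementTree. The base namespace URI, the message-id values, and the helper name build_commit are illustrative assumptions, not text from the draft.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed base namespace for NETCONF RPCs; check it against the draft being edited.
NETCONF_BASE_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"


def build_commit(message_id: str, confirmed: bool = False,
                 confirm_timeout: Optional[int] = None) -> str:
    """Serialize a <commit> RPC (hypothetical helper).

    <confirm-timeout> is only emitted together with <confirmed>.
    """
    rpc = ET.Element("rpc", {"message-id": message_id, "xmlns": NETCONF_BASE_NS})
    commit = ET.SubElement(rpc, "commit")
    if confirmed:
        ET.SubElement(commit, "confirmed")
        if confirm_timeout is not None:
            # Without this element the agent falls back to the 600-second default.
            ET.SubElement(commit, "confirm-timeout").text = str(confirm_timeout)
    return ET.tostring(rpc, encoding="unicode")


print(build_commit("101", confirmed=True, confirm_timeout=120))  # confirmed commit
print(build_commit("102"))  # plain commit, usable as the confirming commit
```

Note that the confirming commit is just an ordinary <commit> with no extra parameters.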

 A confirmed commit operation MUST be reverted if a follow-up commit
 (called the "confirming commit") is not issued within 600 seconds (10
 minutes).  The timeout period can be adjusted with the <confirm-timeout>
 element.  The confirming commit can itself include a <confirmed>
 parameter.
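The timer semantics in the proposed text, together with Rob's point that the confirming commit is just a regular commit, can be modelled as a small state machine. This is a hypothetical sketch, not agent code: configurations are plain dicts, the clock is simulated, and it assumes one reading of the thread, namely that every commit confirms whatever was pending and a confirmed confirming commit arms a new timer that reverts only its own changes.

```python
from typing import Dict, Optional

Config = Dict[str, object]
DEFAULT_CONFIRM_TIMEOUT = 600  # seconds, per the proposed 8.4.1 text


class ConfirmedCommitAgent:
    """Hypothetical model of <running> plus the revert timer (simulated clock)."""

    def __init__(self, running: Config) -> None:
        self.running: Config = dict(running)
        self.saved: Optional[Config] = None    # restored if the timer pops
        self.deadline: Optional[float] = None  # absolute time the timer pops

    def tick(self, now: float) -> None:
        """Fire the revert if the confirm timeout has elapsed."""
        if self.saved is not None and self.deadline is not None and now >= self.deadline:
            self.running = self.saved
            self.saved, self.deadline = None, None

    def commit(self, candidate: Config, now: float, confirmed: bool = False,
               confirm_timeout: float = DEFAULT_CONFIRM_TIMEOUT) -> None:
        """Apply <candidate>; any commit confirms the pending one (assumed reading)."""
        self.tick(now)  # an already-expired timer fires before this commit applies
        if confirmed:
            # Arm a fresh timer that reverts only this commit's changes.
            self.saved, self.deadline = dict(self.running), now + confirm_timeout
        else:
            # A plain commit is the confirming commit: cancel any pending revert.
            self.saved, self.deadline = None, None
        self.running = dict(candidate)


agent = ConfirmedCommitAgent({"mtu": 1500})
agent.commit({"mtu": 9000}, now=0, confirmed=True)
agent.tick(now=600)
print(agent.running)  # {'mtu': 1500}: no confirming commit arrived in time
```

Whether a second confirmed commit should instead revert all the way to the pre-T0 state is one of the corner cases the thread argues needs to be documented.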
This last sentence is confusing to me. It makes sense if the <candidate> contains new changes and the 2nd confirmed commit starts a new "revert timeout" for these new changes.
That's the intent. I mention it here only to indicate that the confirming commit is not magic; it's a regular commit that could itself be confirmed or make additional changes.

ok

I really don't like the possible side effects from this confirmed commit, especially with our shared <candidate> and global locking. If you don't maintain the session and hold the lock throughout the entire double commit, really bad things can happen.

(NEW ISSUE: What happens to a confirmed commit in progress if the session is lost or the agent reboots?)


To confirm the above issues:

If the session doing the confirmed commit is lost, the confirmed commit continues.

If the agent reboots in the middle of a confirmed commit, I assume the box boots with the new config, so an agent reboot acts like a 2nd commit. Yuch. Or does the agent remember that a revert timeout was pending? If the timer doesn't survive, and the first commit CAUSED the reboot, isn't this device in an endless reboot loop? If the crash happens in the startup sequence, before the timer can pop, it's in an endless reboot loop anyway.

T0     - boot with baseline config
Tc     - Manager A issues a confirmed commit, w/ revert to baseline at Tc+i
Tc+1   - Manager A loses its connection and session
Tc+10  - Manager B has no idea Manager A did this, comes along, gets the
         lock, and starts writing to the <candidate> config, which starts
         with the contents of <running> at time Tc
Tc+20  - Then Manager A comes back and can't get a lock
Tc+i   - Manager A's revert timer pops before Manager B is done

The agent reverts the state of <running> to T0. (But B thinks the state of <running> is Tc.)
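The timeline above can be replayed in a few lines to make the divergence concrete. A hypothetical sketch with Tc = 0 and i = 600 seconds (the default timeout); configurations are just labels, and b_done_at (when Manager B would have finished) is an invented parameter.

```python
from typing import List, Optional, Tuple

# (time since Tc, what happened, label of <running>, what Manager B believes)
Event = Tuple[int, str, str, Optional[str]]


def replay(i: int = 600, b_done_at: int = 900) -> List[Event]:
    """Replay the scenario with Tc = 0 (hypothetical model, not agent code)."""
    running = "T0"                   # label of the active configuration
    saved: Optional[str] = None      # what the agent restores when the timer pops
    revert_at: Optional[int] = None  # absolute time of the pending revert
    lock: Optional[str] = None       # holder of the global lock
    b_view: Optional[str] = None     # what Manager B thinks <running> is
    trace: List[Event] = []

    # Tc: A's confirmed commit; the agent remembers T0 and arms the timer.
    saved, running, revert_at = running, "Tc", i
    trace.append((0, "A: confirmed commit", running, b_view))
    # Tc+1: A's session drops; the pending revert outlives the session.
    trace.append((1, "A: session lost", running, b_view))
    # Tc+10: B takes the lock and seeds <candidate> from <running>.
    lock, b_view = "B", running
    trace.append((10, "B: lock, edit <candidate>", running, b_view))
    # Tc+20: A reconnects but cannot get the lock B is holding.
    outcome = "granted" if lock is None else "denied"
    trace.append((20, "A: lock " + outcome, running, b_view))
    # Tc+i: the timer pops before B is done, and nothing tells B.
    if saved is not None and revert_at is not None and revert_at < b_done_at:
        running, saved, revert_at = saved, None, None
        trace.append((i, "agent: revert", running, b_view))
    return trace


for t, what, run, belief in replay():
    print(f"Tc+{t:<4} {what:<26} running={run:<3} B thinks {belief}")
```

The last line of output shows <running> back at T0 while B still believes Tc; nothing in the model gives B a way to observe revert_at.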

At this point, it depends on the difference between config T0 and Tc, and what Manager B is doing, as to whether benign or devastating effects will follow.

It's never a good thing to design this much "astonishment" into routing products. At a minimum, we need to document what happens in as many corner cases as we can think of, but we should also try to respect the principle of least astonishment.

I don't view this as astonishing, or a side effect. It's very simple: when the timer pops from a confirmed commit, the device will revert to the T0 configuration. I'd argue that it's the kind of easy-to-understand, basic behavior that operators like.

It is astonishing to Mgr B who has no way of knowing a revert timeout is pending. To me, the whole thing is just fragile. IMO, a configuration protocol should be robust, not fragile.

A protocol that can allow the possibility of severely detrimental
config changes (through unintended or malicious acts), by merely
dropping a connection, is fragile.

It's possible the security AD could have a problem with this too,
during the IESG review.

 If a confirming commit is not issued, the device will revert it's
 configuration to the state prior to the issuance of the confirmed
 commit.  Note that any commit operation, including a commit which
 introduces additional changes to the configuration, will serve as a
 confirming commit.  Thus to cancel a confirmed commit and revert
 changes without waiting for the confirm timeout to expire, the
 confirming commit can explicitly restore the configuration to it's
 state before the confirmed commit was issued.
I don't understand this last sentence, or this revert operation at all.
This is in response to your comment about how the configuration would be reverted before the timer pops. We can't use the rollback operation to explain it, because netconf doesn't have one at this point. I could remove that text completely if it's confusing.

The fact that you have a rollback operation in Junoscript doesn't really apply to this document. The sentence doesn't convey the idea that the confirmed commit can be canceled through proprietary mechanisms, outside the scope of the standard.

IMO, we need to remove this sentence since there is no rollback operation in netconf. In fact, we should say instead that netconf provides no mechanism to force the agent to cancel the confirmed commit and revert the <running> configuration. The manager has to wait for the timeout interval to pass.

BTW, the phrase "the confirming commit can explicitly restore the configuration" doesn't really make sense. s/confirming commit/manager/ and it does.

BTW, s/it's/its/ in both paragraphs above.
Quite right, thanks.
Rob

Andy



Follow-Ups:
- Re: More confirmed-commit issues
  - From: Wes Hardaker <wjhns1@hardakers.net>

References:
- RE: More confirmed-commit issues
  - From: "Rob Enns" <rpe@juniper.net>
