[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: comments on draft-ietf-shim6-failure-detection-06.txt {4}

To: "Henderson, Thomas R" <thomas.r.henderson@boeing.com>
Subject: Re: comments on draft-ietf-shim6-failure-detection-06.txt {4}
From: Jari Arkko <jari.arkko@piuha.net>
Date: Thu, 09 Nov 2006 12:39:45 -0800
Cc: shim6@psg.com
In-reply-to: <77F357662F8BFA4CA7074B0410171B6D01A2F809@XCH-NW-5V1.nw.nos.boeing.com>
References: <77F357662F8BFA4CA7074B0410171B6D01A2F809@XCH-NW-5V1.nw.nos.boeing.com>
User-agent: Thunderbird 1.5.0.7 (X11/20060922)

Many thanks for your review Thomas. Some further
discussion inline:
> I reviewed the failure detection draft today and have the following
> comments.
>
> Clarity issues
> ==============
> 1. The state machine could be clarified a bit.  There is an
> "Operational" state but associated with this state are two timers, and
> depending on which timers are active at a given time, there are actually
> three substates, which one might call "Idle" (no timers running),
> "Operational-In" (KeepAlive running), and "Operational-Out" (Send
> running).  The draft implies that Send and Keepalive are not running
> simultaneously ("Upon the reception of a payload packet in the
> Operational state, the node starts the Keepalive timer if it is not yet
> running, and stops the Send timer if it was running."); this implies
> that a bidirectional data flow will (conceptually) be toggling
> continuously between Operational-In and Operational-Out.  It could be
> better clarified that if a host moves to a quiescent state upon sending
> a single keepalive message, unless woken up by either its own send event
> or the reception of another peer data packet.     
>
> It might be worth considering to break out the Operational state into
> substates to see what the implications are for the state machine, and
> only then decide whether one wants to collapse the substates into a
> single state.  Perhaps this has already been considered and rejected
> (I'm new to this discussion). 
>   
We have had variants of the same design that had
more states, but people found this confusing. We can
indeed model the timers with states too, but I fear that
the end result is not any clearer. In some sense the
current model allows you to consider only three
states if you do not look at the timers at all; with the
substates expanded you would have to look at everything
simultaneously.

But I will clarify what the relationship between the
two timers is.
> 2.  The linkage between these failure detection mechanisms and return
> routability checks is fuzzy.  If REAP detects through probing that a
> locator pair works, does this suffice for reachability, or does separate
> shim6 process need to send Update Request?  The last paragraph of 3.3
> seems to imply that shim6 must separately verify return routability.
> But the shim6 draft says that the Probe message exchange handles this
> routability check.    This really ought to be clarified somewhere in
> this draft.
>   
I removed this text and edited the explanation in the security
considerations section to say:

"Other parts of the SHIM6 protocol ensure that the set of addresses
we are switching between actually belong together. REAP itself
provides no such assurances. Similarly, REAP provides some protection
against third party flooding attacks [ref]; when
REAP is run its Probe nonces can be used as a return routability check
that the claimed address is indeed willing to receive
traffic. However, this needs to be complemented with another mechanism
to ensure that the claimed address is also the correct host. SHIM6
does this by performing binding of all operations to context tags.
> 3. There seems to be a mismatch with the shim6-proto draft, where it
> says that there are "Probe Option", "Reachability Option", and "Payload
> Reception Report Option" TLVs defined in this draft.  I couldn't find
> any such Option definitions.
>   
Indeed. We updated the failure detection draft as a result
of comments that called for simplification of the protocol.
The shim6-proto draft needs to reflect the changes, too,
but it has not yet been updated.
> 4.   There could be more clarity in the terminology for address pairs.
> The protocol draft talks of ULID-pairs and current locator pair, while
> this draft talks about primary address pair and current address pair.
>   
ULID term is not actually needed here and it is removed.
> Second, it probably needs to be distinguished that there is an inbound
> address pair and an outbound address pair that may be the same or
> different.  I suggest to consistently use the terms "ULID-pair",
> "outbound locator pair", "inbound locator pair" and possibly to clarify
> that when "locator pair" is used without reference to inbound or
> outbound, it refers to the case where they are the same pair.
>
> 5.  In general, sections 6 and 7 seem drafty to me.  Section 7
> (examples) probably should be an appendix, and section 6 should be
> expanded into more detailed elements of procedure, such as how to fill
> in parameter values and what are the legal values, that do not require
> reading section 7 to find the details.
>   
I will move Section 7 to an appendix.

And add parameter values. The legal values are in the format
descriptions, and I don't think they should be added here.
> 6.  In section 7, examples 4 and 5, where are the keepalives preceding
> the 10 second timeout.  Why won't peer A be the one to first have a Send
> timeout in example 4?
>
>   
It just the one that happens to notice it first. I will add
text about this.
> Suggested technical change
> ==========================
> I disagree with fixing the KeepaliveTimeout to 10 seconds and making the
> possible negotiation an area for future study.  I don't see why each
> node couldn't set a parameter in the initial exchange that directed a
> peer to use a particular timeout value.   There doesn't have to be any
> negotiation; each side can configure the KeepaliveTimeout that the peer
> must use for setting Keepalive timer, and optionally include the
> parameter in the I2, R2, and possibly I2-bis packets.  Send timer can be
> aligned with the Keepalive timer interval given to the peer.  The
> default (if no such parameter is present) could be 10 seconds.  I
> suggest to define the parameter now, as well as instructing
> implementations to honor it if they receive such a parameter, and say
> that the heuristics for sending it are for further study.
I'm fine doing this either way. But there was some prior
discussion where we wanted to limit the size of the
protocol and were unsure if we ever needed to provide
a negotiation mechanism. Does anyone else care about this?
> Minor
> ====
> title of section 4.2 "Alternate Address Pair Exploration" should
> probably be "Full Reachability Exploration" to match other references in
> the text to this procedure
>
> in section 3.5, s/multible/multiple
>
> in section 3.4, the acronym "ULID" is used for the first time without
> any expansion or reference.
>
> ExploringOK state could have a name that is more indicative of the state
> of the exploration, such as InboundOK
>
> In section 6, the explanatory text for each event precedes the numbered,
> underlined event definition, and it is somewhat confusing to read
> because the numbered lines seem to delineate the sections but they in
> fact do not.
>   
All taken into account.

--Jari

Follow-Ups:
- option for setting non-default Keepalive Timeout in REAP
  - From: "Henderson, Thomas R" <thomas.r.henderson@boeing.com>

References:
- comments on draft-ietf-shim6-failure-detection-06.txt {4}
  - From: "Henderson, Thomas R" <thomas.r.henderson@boeing.com>

Prev by Date: draft-ietf-shim6-proto-06 : Sending R2 in I2-SENT state
Next by Date: Re: comments on draft-ietf-shim6-proto-05.txt
Previous by thread: comments on draft-ietf-shim6-failure-detection-06.txt {4}
Next by thread: option for setting non-default Keepalive Timeout in REAP
Index(es):
- Date
- Thread