[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: review of draft-ietf-shim6-failure-detection-03.txt

To: Iljitsch van Beijnum <iljitsch@muada.com>
Subject: Re: review of draft-ietf-shim6-failure-detection-03.txt
From: marcelo bagnulo braun <marcelo@it.uc3m.es>
Date: Thu, 22 Jun 2006 17:59:23 +0300
Cc: Jari Arkko <jari.arkko@piuha.net>, shim6-wg <shim6@psg.com>
In-reply-to: <099F9743-9BD2-43B8-B53F-A1644D1D81C9@muada.com>
References: <615BD9B54CB01B41838C323DB9E91AAB4075EB@esebe100.NOE.Nokia.com> <446DC7E4.3090501@piuha.net> <F9F7CB0B-567C-4CB0-9775-63098CEBBD22@muada.com> <4497FBEA.7020908@piuha.net> <099F9743-9BD2-43B8-B53F-A1644D1D81C9@muada.com>

Hi Iljitsch,

some comments below...

El 21/06/2006, a las 13:13, Iljitsch van Beijnum escribió:

On 20-jun-2006, at 15:45, Jari Arkko wrote:
I'm fairly critical of the current version but I don't want to  waste
time with an in-depth review if there's going to be a new  version
within a week anyway.
Don't hold back - fire away if you have known problems. You saw
the discussion with John; I'm addressing those comments but will
not rewrite other parts. So it could well be that your comments
are still valid even for -04.
Ok. First of all, have a look at my message from october 24th, somecomments are still valid.
This draft works per-context exclusively. So if there are 2, 5 or 10contexts between two hosts, this means 2, 5 or 10 times the amount ofwork is done.

i agree that this would be a nice feature. the problem with this is howdo you identify the peer in such a way that you can probe all theexisting contexts. I mean, as of today, the probe/keepalive messageshave the context tag, so the peer can identify the context thanks tothe context tag. the drawback as you say is that we need as manypackets as etablished contexts.

The other option would be to use a single probe/keepalive for all thecontexts between two peers. In order to do that we need a mean toidentify the peer so that the receiver of the packet can identify allthe contexts corresponding to the same hosts and apply the receivedpacket to all the contexts.BAsically this would introduce the notion of endpoint in the shimcontext/protocol (which is not present today), since today thegranularity is ulid pairs (as oposed to endpoint pairs)

in order to do that we need to choose a namespace for identifying theendpoints and include the endpoint name in the context/contextestablishment messages.

this would be a considerable change in the protocol i guess, but may beexplored if people deem it relevant.

As a general comment, i am kind of worried about the complexity of theresulting protocol, including shim protoc and the failure detectionprotocol and i would really preffer to try to simplify the protocolrather than making it more complex, even if this means loosing someoptimization for some cases. I am concerned about having a complexprotocol that may become error prone (we already have feedbackexpressing this concern BTW)

I think it makes sense to see whether the mechanism can be made to bemore general, so that it can be used for other applications as well,ranging from MIP (as indicated in the applicability draft) to tunnelmaintenance and maybe even peer-to-peer applications.

i fully agree with this, but i am not sure why do you think this is notthe case already...

However, it's important that there is fate sharing between thereachability protocol and the user protocol (shim in our case). Ithink this can be solved by having the quick reachability verificationstuff (= FBD) encapsulated in the user protocol, but let the full pathexploration be a protocol of its own or live under ICMPv6 or somesuch.

not sure why do you think this is needed. Defining the protocolmessages in a way that they can be included in the shim6 header as wellas in the mobility header or the hip header would be good enough toallow using the failure detection protocol in other protocols.... whatam i missing?

However, this introduces some issues with knowing whether the samecorrespondent lives at different addresses.
In any event, I think just ignoring the possibility that there aremultiple contexts between two hosts isn't good enough. The issue ofavoiding "reachability probe storms" is somewhat related to this, andalso not addressed in the draft.
Another thing that's missing completely from this draft is adiscussion of how to use address pair preference information. Thismakes it impossible to address traffic engineering needs.

well, i have been working on this and i have submitted a draft abouthow to perform locator pair selection, including reachabilityinformation and also preference information from the shim protocol


you can find it at:

http://www.ietf.org/internet-drafts/draft-ietf-shim6-locator-pair-selection-00.txt


of course your feedback would be very welcome

I believe section 3 (related work) is at best unnecessary and may evenconfuse the reader. The same is true of much of section 4(definitions). Implementers know which addresses are usable and whicharen't, at least for the most part, so covering this in such detail issuperfluous. Things like "Both types of operational pairs could bediscovered and monitored through the following mechanisms:" have noplace in a protocol specification document, in my opinion. These needto specify what IS, not what COULD be.


i somehow disagree...

i think that the definition section is very useful, because the insightit provides about the different states of an address and address pairsare very important. I mean, it makes several very good points, likethat an address is not reachable, but an address pair is and the like

with respect to the related work, i agree that this could be moved tothe end of the document (perhaps an appendix)

I suggest tightening the use of words like "operational", "work","reachable". They're mostly used interchangably in the draft.


i don't think this is the case.
i find this differences relevant imho

4.5.  Miscellaneous

   Addresses can become deprecated [3].  When other operational
   addresses exist, nodes generally wish to move their communications
   away from the deprecated addresses.

   Similarly, IPv6 source address selection [6] may guide the selection
   of a particular source address - destination address pair.
This doesn't say what shim6 implementers should do. In my opinion:keep using deprecated addresses as the ULID/primary locator as long aspossible, but prefer non-deprecated addresses when selectingalternative locators.


i think this should belong to the locator selection document...

"A "path" is defined as" should probably be in the "definitions"section.


"when link layer" -> when a link layer

   1.  The base timing unit for this mechanism is named Keepalive
       Timeout.  Until a negotiation mechanism to negotiate different
       values for this timer becomes available, the value (3 seconds)
       specified in Section 6.5 SHOULD be used.

Sending a keepalive every 3 seconds? That's way too often, IMO.

   2.  Whenever outgoing data packets are generated

Data packets as opposed to what other types of packets?


signalling packets, such as keeplives or probes (is my understanding)

   4.  The reception of a REAP keepalive packet leads to stopping the
       timer associated with the return traffic from the peer.
So when we receive a keepalive from the other side, _we_ stop sendingkeepalives?

as i understand it, this means that we are not expecting another packet(until we send a new packet, of course)

This may be the right thing to do, but it's not obvious to me why.Some explanation would help.
What if an attacker generates spurious keepalives in order to get usto stop sending keepalives on our side?
   6.  Send Timeout seconds (10 s; see Section 6.5) after the
       transmission of a data packet with no return traffic on this
       context, a full reachability exploration is started.  This
       timeout period is larger than the Keepalive Timeout to
       accommodate for lost keepalives and regular variations in round
       trip times.
The keepalives are sent at an interval of 3 seconds (or shorter, Iimagine that an implementation isn't going to keep an exact timer foreach context, any rounding must obviously be in the down direction)and the timeout is 10 seconds. In these 10 seconds you'd normallyreceive 3 keepalives, while 1 is enough to indicate that the otherside is still alive. The other 2 are only there in case of packetloss. I think that's excessive.


would you suggest it to reduce it to 2 packets every 10 secs?

I mean, i think this protocol will require quite a lot of fine tunningbased on experience and simulations of the load... i guess that what'sin the current spec are resonable values for the time being (i have noproblem with changing them a bit, but as i said i guess in depth finetunning will be needed once we have more experience...)

Starting the full reachability exploration because of incidentalpacket loss isn't such a big deal that it warrants sending three timesas many packets as necessary. (Note that this assumes the worst caseof a unidirectional transport protocol which isn't exactly the commoncase.)
I find it jarring that section 5.5 immediately jumps to examples,before the protocol is fleshed out. Although the examples are referredby using a number, they aren't actually numbered.
Why would a keepalive need an id field?
In the first long example B times out once, but then doesn't send moreprobes until it sees return traffic from A. Why does B stop sendingprobes?
I believe that since the id of the last received probe is included,the iseeyou flag is unnecessary.


you mean that if the id field is empty, this means iseeyou=no?

Although copying back the last seen id seems to do the job, I can'thelp but feel that it would be preferable to add timers to reach roundtrip times and copy back more received ids and also sent ids. Thisallows the receiver of a probe to determine which of the probes thatmade it to the other side did so faster, so it can select the addresspair with the shortest round trip time.

i would suggest to leave this for future work, since it is addedcomplexity and it is not obvious to me that selecting the fastest oneis always the best choice.... (e.g. bandwidth is not considered)

Including sent ids along with the addresses the probe with that id wassent to helps the receiver determine that some probes didn't make it(yet). If a probe didn't work in one direction of an address pair,it's reasonable to assume that it may also not work in the otherdirection and try other pairs first.


yes, this could be added to the locator selection mechanism

I believe the next example (the fifth, I think?) is flawed because Adoesn't send anything after it has sent a data packet, even though itnever receives a return packet. The same is true for the example afterthat.
When does address pair exploration conclude, both in the cases wherethere is alternative reachability, and in the case when there appearsto be no reachability at all? The exponential backoff isn't described.
The keepalive is a fairly long packet. I think just a shim header aswould be used for data packets but with no ULP following the shimheader would be sufficient.

not sure what would you omit from the current packet format... i mean,we need the context tag and the identifier and we need it to make itextensible in the header.... the only additional filed there is thechecksum... are you suggesting we remove that or is there other fieldsthat you think can be removed?

Requiring random numbers in packets that are sent rather frequently isa bad idea, because it depletes the typically limited amount ofentropy that's available for strong random number generation ratherquickly and semi-random number generation may be somewhat expensive(and not that good). And I don't see what good an id does in akeepalive anyway... Also, there may be reasons to have non-randomnumbers, such as ease of lookup.

i guess this i neeeded to indeed verify that the reply was generated asa response to the initial packet, this is needed in the case that thisis used as a security mechanism to prevent 3 party flooding attacks iguess.

The description of the different messages repeats text that is thesame for different messages, which is very tedious and makes it likelythat people will miss the things that are actually different. (That'swhy programmer's keyboards don't have ctrl-c/ctrl-v keys: reuse code,don't copy it.)
The use of options for mandatory fields is awkward.

"   The node maintains also the Send and Keepalive timers."
These timers were previously unnamed, it would be better to use theirnames as soon as they're introduced.
" Upon the reception of a payload packet in the Operational state,the
   node starts the Keepalive timer if it is not yet running, and stops
   the Send timer if it was running.  If the node is in the Exploring
   state it transitions to the ExploringOK state, sends a Probe message
   with the I See You flag set to 1 (Yes), and starts the Send timer.
   In the ExploringOK state the node stops the Send timer if it was
   running, but does not do anything else."
I don't have a good feeling about this... It's too hard to determinewhat should be happening. Maybe it would be better rather than go downthe list of packets that are sent/received and describe the behaviorin each state, to take one state at a time and describe what happenswith packets in that state.



that would be the state machine i guess, right?

   Upon a timeout on the Keepalive timer the node sends a Keepalive
   message.  This can only happen in the Operational state.

Why?
Why are there different Exploring and ExploringOK states? In bothcases, the host needs to continue trying different addresses, it'smostly/only the other side that needs to behave differently when thereis successful reachability from them to us but not (yet) from us tothem.
   Garbage collection of SHIM6 contexts terminates contexts that are
   either unused or have failed due to the inability of the exploration
   process to find a working pair.

How is the latter determined?
This is an important issue. For instance, once that a shim context isrehomed, will it ever return to using the primary locators?
   In the PDF version of this specification, an informational drawing
   illustrates the state machine.  Where the text and the drawing
   differ, the text takes precedence.

I'm not reviewing the PDF state machine here as the text is normative.

   A tabular representation of the state machine is shown below.  Like
   the drawing, this representation is only informational.

Then I'm ignoring this too.
But I would be happier if they'd be removed, because either they'resuperfluous as they're not normative, or they're actually necessary tounderstand the protocol, which is even worse because they're not partof the normative text.

i think state machines are very useful to understand how the protocolworks and to verify that it is working and i think these should beincluded in the docuemnts


Reagrds, marcelo

Follow-Ups:
- Re: review of draft-ietf-shim6-failure-detection-03.txt
  - From: Iljitsch van Beijnum <iljitsch@muada.com>

References:
- review of draft-ietf-shim6-failure-detection-03.txt
  - From: <john.loughney@nokia.com>
- Re: review of draft-ietf-shim6-failure-detection-03.txt
  - From: Jari Arkko <jari.arkko@piuha.net>
- Re: review of draft-ietf-shim6-failure-detection-03.txt
  - From: Iljitsch van Beijnum <iljitsch@muada.com>
- Re: review of draft-ietf-shim6-failure-detection-03.txt
  - From: Jari Arkko <jari.arkko@piuha.net>
- Re: review of draft-ietf-shim6-failure-detection-03.txt
  - From: Iljitsch van Beijnum <iljitsch@muada.com>

Prev by Date: Re: I-D ACTION:draft-ietf-shim6-applicability-01.txt
Next by Date: Re: I-D ACTION:draft-van-beijnum-shim6-suppress-header-00.txt
Previous by thread: Re: review of draft-ietf-shim6-failure-detection-03.txt
Next by thread: Re: review of draft-ietf-shim6-failure-detection-03.txt
Index(es):
- Date
- Thread