Re: soft state (was Re: shim6 and bit errors in data packet headers)
Iljitsch van Beijnum wrote:
On 27-mei-2005, at 23:37, Erik Nordmark wrote:
I would think that if one side triggers reachability testing, the
other side would also do it. The probes in one direction can also
function as replies in the other direction, cutting down on the
number of packets exchanged.
I'm not sure we'd end up with both ends triggering reachability
testing at about the same time, because that would seem to assume
some form of periodic reachability testing even when there is no ULP
traffic. Such background chatter seems undesirable.
I'm not sure what you mean here...
I think it was the use of "trigger" in your earlier email that led me
off on what is perhaps a tangent to your point.
What I'm getting at is that one end says "hey, can you still hear me?"
and then the other end says "sure, and can you still hear me?". The
first party then replies with "yes" and we know that there is
reachability in both directions.
Yes, one can verify that bidirectional reachability exists (when it does
exist) by using 3 packets instead of the naive approach which would
result in 4 packets.
But that doesn't necessarily help when reachability does not exist for
some subset of the address pairs.
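To make the packet counts concrete, here is a rough sketch of the two exchanges, assuming every packet is delivered. The Msg type, the probe/reply flags, and the function names are purely illustrative; none of this is proposed shim6 message format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    """Hypothetical test packet; not a proposed shim6 wire format."""
    sender: str      # "A" or "B"
    is_probe: bool   # "can you still hear me?"
    is_reply: bool   # "yes, I heard you"


def naive_exchange():
    """Each end probes separately and gets its own reply: 4 packets."""
    return [
        Msg("A", is_probe=True, is_reply=False),
        Msg("B", is_probe=False, is_reply=True),
        Msg("B", is_probe=True, is_reply=False),
        Msg("A", is_probe=False, is_reply=True),
    ]


def piggybacked_exchange():
    """B's reply doubles as B's own probe: 3 packets."""
    return [
        Msg("A", is_probe=True, is_reply=False),
        Msg("B", is_probe=True, is_reply=True),
        Msg("A", is_probe=False, is_reply=True),
    ]


def confirmed_parties(msgs):
    """Return the parties that received a reply to their own probe.

    Assumes every packet in msgs is delivered. A party that gets a
    reply to its probe knows both directions of the pair worked.
    """
    pending = set()    # parties with an unanswered probe outstanding
    confirmed = set()
    for m in msgs:
        peer = "B" if m.sender == "A" else "A"
        if m.is_reply and peer in pending:
            pending.discard(peer)
            confirmed.add(peer)
        if m.is_probe:
            pending.add(m.sender)
    return confirmed
```

Both exchanges leave A and B each holding a confirmation; the piggybacked one just gets there with one packet fewer. Neither says anything about what to do when a probe goes unanswered.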
Thus I think we'd want a packet driven trigger of reachability
testing (much like NUD). When A sends a ULP packet to B, it checks
whether it has current reachability information for B, and if not it
triggers reachability testing.
Disagree. I think we should assume reachability until we get a hint
that there is none. So if A sends a packet to B, B does nothing and
will 99% likely in due course send a packet back because transports
tend to work in both directions. However, if A doesn't get a reply and
the packet it sent earlier isn't one that is known to go replyless
(i.e., TCP ack-only or FIN packets), A triggers reachability testing.
This is the debate about positive vs. negative advice from the ULPs. You
are advocating that the ULPs provide negative advice. But that isn't
sufficient to trigger testing in all failure cases.
Take the case when the TCP on A is sending data to B, hence B is only
sending ACK packets back to A.
The TCP on A can easily generate negative advice when it has
retransmitted a few times and doesn't receive a response.
But things are problematic on B, because there isn't an (efficient)
strategy for the TCP on B to generate negative advice - it doesn't run a
retransmit timer.
Thus when something fails it will always be up to A to initiate the
exploration of alternate locator pairs. Also, the time at which the
exploration of alternates starts is a function of the retransmit behavior
of the ULP, which makes it harder to tightly control the failover time.
It is easier to have the ULPs generate positive advice ("the traffic to
this destination is making forward progress") at both ends. The fact
that an ACK for new data has been recently received, or that a data
packet which advances the sequence number has been recently received,
are both easy indications of forward progress.
With such a strategy the shim implementation can do a check after
sending a ULP packet: "how long has it been since the last positive
advice?" If this exceeds some limit, then the shim can trigger a test of
the current locator pair, and if that fails, start testing alternative
locator pairs.
The positive advice approach has the benefit that it works even if the
ULPs don't generate any advice; in this case the shim would, when ULP
packets are being sent, revert to periodically sending a test packet.
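As a rough sketch of the check described above: the class name, method names, and the 10 second limit below are all made up for illustration, and time is passed in explicitly so the logic is easy to follow. It also assumes that a successful test of the current locator pair restarts the clock, which is what yields the periodic test packets when the ULP gives no advice at all.

```python
class ShimContext:
    """Hypothetical per-peer shim state; names and defaults are assumptions."""

    def __init__(self, advice_timeout=10.0):
        # Limit on the age of positive advice, in seconds (assumed value).
        self.advice_timeout = advice_timeout
        # Time of the last evidence of forward progress, or None if none yet.
        self.last_positive = None

    def positive_advice(self, now):
        """ULP reports forward progress, e.g. an ACK for new data arrived."""
        self.last_positive = now

    def probe_succeeded(self, now):
        """A test of the current locator pair succeeded; restart the clock."""
        self.last_positive = now

    def on_ulp_send(self, now):
        """Check made after sending a ULP packet.

        Returns True when the current locator pair should be tested
        because positive advice is older than the limit.
        """
        if self.last_positive is None:
            # First packet sent: start the clock instead of probing at once.
            self.last_positive = now
            return False
        return now - self.last_positive > self.advice_timeout
```

Because the check only runs when a ULP packet is sent, an idle context generates no test traffic. What happens after a failed test (trying the alternative locator pairs) is left out of the sketch.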
As per my suggestion, there would normally not be any reachability
testing as long as packets flow in both directions. When there is a
bidirectional failure AND both ends were sending data at the same time
(= not when data is flowing in one direction and just acks in the
other), there would probably be reachability testing triggered in both
directions at the same time.
Do you consider this problematic?
Yes, because the ULPs at both ends will not always be able to detect that
there is a problem. The canonical example where this isn't easy is a TCP which
is receiving data packets and only sending ACKs; in that case there is
no retransmit timer running on which you can hang a "send negative
advice to the IP layer" event.