Re: failure detection
some comments below...
On 15/08/2005, at 13:36, <email@example.com> wrote:
Somewhat to my disappointment, there were no opinions about
which of the two failure detection mechanisms is better,
either during the sessions or afterward on the list.
At least for me, I'm still puzzling over how to capture failure ...
Between 2 hosts, TCP traffic may work but not UDP traffic (mostly due
to stupid middleboxes). Is this a failure from the shim point of view?
I guess we agree that the shim won't be able to track each different
ULP communication, so the shim won't be able to know whether each
communication is progressing adequately on its own, right?
so, the shim can only see the exchange of packets with the other end,
and determine if packets are flowing with a given frequency. If packets
stop flowing, the shim can guess a potential failure.
Now if packets are flowing (even if packets are only flowing to a given
port and not to others) the shim by itself won't be able to detect this
I guess that the only one that can identify this problem is the ULP
So I guess we have the following situation:
- The shim will only be able to detect (by itself, without additional
information) failures when packets stop flowing
So, when there is no additional information, the case that you are
considering would not be detected by the shim
- This case can be detected when there is additional information such
as ULP feedback and perhaps some ICMP message
In this case, we have the following scenarios:
a) different ULPs provide contradictory feedback
b) only the UDP app provides negative feedback but the shim can see
that packets are still flowing (and maybe reachability tests are
c) the shim receives ICMP errors but packets are still flowing.
I guess that in those scenarios the shim can detect that something
strange is going on, along the lines that you have considered, and maybe
some smart decision could be taken to deal with this... but before
that, do you agree with this description of the scenarios?
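
To make the scenarios above a bit more concrete, here is a rough Python
sketch of how the shim might label what it sees. Everything in it is
an assumption made for illustration: the function name, the labels,
and the idea that ULP feedback arrives as a per-ULP boolean are not
part of any proposal on the list.

```python
from typing import Dict


def classify(packets_flowing: bool,
             ulp_feedback: Dict[str, bool],
             icmp_errors: bool) -> str:
    """Label the shim's view of a path (hypothetical labels).

    ulp_feedback maps a ULP name to True (positive) or False (negative).
    """
    if not packets_flowing:
        # The only case the shim can detect without extra information.
        return "suspected-failure"
    negative = [name for name, ok in ulp_feedback.items() if not ok]
    positive = [name for name, ok in ulp_feedback.items() if ok]
    if negative and positive:
        return "contradictory-ulp-feedback"         # scenario a
    if negative:
        return "negative-ulp-feedback-but-flowing"  # scenario b
    if icmp_errors:
        return "icmp-errors-but-flowing"            # scenario c
    return "no-anomaly"
```

What the shim should then do for each label is exactly the open
question; the sketch only separates the cases.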
Also, different transports have different concepts of what 'failure'
is. When is a path considered failed? Classical TCP failure
of the same packet x times)? Too high bit-error rate? I think we
run into circumstances where if we let the shim layer decide
that a path has failed, it might decide that the path is bad much later
than the transport layer knows it.
Well, IMHO the shim only needs to define a default failure mode. I
mean, it is clear that different apps have different perceptions of what
a failure may be, but, since the shim is a generic layer, we need to
provide a generic definition of what a failure is. Clearly, I guess it
would be very useful to allow ULPs to provide information to the shim
about when they consider that a failure has occurred.
So, I would say that we will have:
- A generic definition of failure that is used when ULPs don't provide
- each ULP can provide feedback about when they think that a failure
For the generic failure definition, I think that the first hint that a
failure may have occurred is that packets are flowing outwards but no
incoming packets are received (for a given period T).
At this point I guess that an explicit reachability test needs to be
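
As a minimal sketch of this generic hint (outgoing packets with nothing
coming back for a period T), something like the following Python could
do. The class and method names are hypothetical, and it simplifies
"packets are flowing outwards" to "at least one packet was sent since
the last one was received".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathMonitor:
    """Hypothetical per-peer state kept by the shim."""
    period_t: float                           # the period T, in seconds
    unanswered_since: Optional[float] = None  # first send since last receive

    def on_send(self, now: float) -> None:
        # Remember only the first outgoing packet that has had no reply.
        if self.unanswered_since is None:
            self.unanswered_since = now

    def on_receive(self, now: float) -> None:
        # Any incoming packet from the peer clears the suspicion.
        self.unanswered_since = None

    def needs_reachability_test(self, now: float) -> bool:
        # True once we have been sending for T with nothing coming back.
        return (self.unanswered_since is not None
                and now - self.unanswered_since >= self.period_t)
```

For example, with T = 10 a packet sent at time 0 and no reply by
time 10 would make needs_reachability_test() return True; a packet
received in between would reset it.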
I think we should avoid any transport layer functions in the shim layer
(or at least as much as possible). I'm leaning more towards the
type of functionality, where the shim provides as much info as it can
to the transport layers, and let the transport layers make decisions.
Agreed, but do you agree that we also need to support those ULPs that
don't have this rich API? I mean, that we need to support existing ULPs
so that they benefit from the shim without requiring modifications?
I'm also wondering if the failure detection mechanism should be a
option; I've been thinking how the shim layer would work in wireless
and I don't think I'd like to have fast heartbeating, as that would
batteries unnecessarily. Could it be possible to have this
Well, I guess that many folks have considered that when positive
feedback from the ULP is provided, heartbeats would be omitted. Is
this what you had in mind?
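
Just to illustrate what omitting heartbeats on positive ULP feedback
could look like, here is a small Python sketch. The function name, the
parameters and the choice to reuse the heartbeat interval as the
"freshness" window for ULP feedback are all assumptions of mine.

```python
from typing import Optional


def should_send_heartbeat(now: float,
                          last_heartbeat: Optional[float],
                          last_positive_feedback: Optional[float],
                          interval: float) -> bool:
    """Decide whether a heartbeat is due (hypothetical helper)."""
    # Recent positive feedback from a ULP stands in for a heartbeat,
    # so an active connection causes no extra traffic.
    if (last_positive_feedback is not None
            and now - last_positive_feedback < interval):
        return False
    # Otherwise fall back to plain periodic heartbeating.
    return last_heartbeat is None or now - last_heartbeat >= interval
```

With this, a wireless host whose ULPs keep reporting progress would
send no heartbeats at all, and would only start heartbeating once the
feedback goes quiet for longer than the interval.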