RE: failure detection

To: <john.loughney@nokia.com>, <iljitsch@muada.com>, <shim6@psg.com>
Subject: RE: failure detection
From: Geoff Huston <gih@apnic.net>
Date: Fri, 19 Aug 2005 10:07:12 +1000
In-reply-to: <1AA39B75171A7144A73216AED1D7478D9EF00A@esebe100.NOE.Nokia.com>
References: <1AA39B75171A7144A73216AED1D7478D9EF00A@esebe100.NOE.Nokia.com>

At 09:36 PM 15/08/2005, john.loughney@nokia.com wrote:

Iljitsch,


>Somewhat to my disappointment, there were no opinions about
>which of the two failure detection mechanisms is better,
>either during the sessions or afterward on the list.

At least for me, I'm still puzzling over how to capture failure ...
Between 2 hosts, tcp traffic may work but not udp traffic (mostly due
to stupid middleboxes). Is this a failure from the shim point of view?


Two potential cases:

a) The shim layer conducts its own probe / heartbeat on the current locator pair, and no vertical signalling takes place. A locator change is triggered by a failure of the heartbeat. (The heartbeat may be explicit, may be derived from existing traffic, or may kick in only in the absence of existing traffic.)

b) The shim layer is responsive to ULP signals. If the UDP session signals a locator failure, the shim layer undertakes a locator change. (If you want to include dynamic shim state forking in this model, it may well be that the shim state forks at this point, and the forked state that retains its association with the UDP traffic undertakes the locator switch.)
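
As a rough illustration of case a), here is a minimal Python sketch of a shim-only detector. Nothing in it comes from a draft: the class name, the two timer parameters and the returned action strings are all hypothetical, and the caller is assumed to supply the clock. Inbound traffic on the locator pair counts as an implicit heartbeat, an explicit probe is requested only once the pair has gone quiet, and prolonged silence triggers the locator change.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical model (a) detector: no ULP input, shim state only."""

    idle_interval: float      # assumed: seconds of silence before probing
    fail_timeout: float       # assumed: seconds of silence before giving up
    last_heard: float = 0.0   # time of the last packet seen on the pair

    def on_packet_received(self, now: float) -> None:
        # Existing traffic doubles as the heartbeat.
        self.last_heard = now

    def poll(self, now: float) -> str:
        silence = now - self.last_heard
        if silence >= self.fail_timeout:
            return "switch_locator"   # heartbeat failure == path failure
        if silence >= self.idle_interval:
            return "send_probe"       # no traffic, so probe explicitly
        return "ok"


monitor = HeartbeatMonitor(idle_interval=3.0, fail_timeout=10.0)
monitor.on_packet_received(now=0.0)
print(monitor.poll(now=1.0), monitor.poll(now=4.0), monitor.poll(now=11.0))
```

The only definition of failure available here is "nothing heard for fail_timeout seconds", which is exactly the limitation of running without vertical signalling.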

Also, different transports have different concepts of what 'failure'
is. When is a path considered failed? Classical TCP failure (retransmission
of the same packet x times)?  Too high a bit-error rate?  I think we might
run into circumstances where, if we let the shim layer decide conclusively
that a path has failed, it might decide that the path is bad much later
than the transport layer knows it is.


Again this depends on the model:

In model a) (no vertical signalling) there is no ULP information to use, in which case the shim trigger needs to be based on shim state information derived from a heartbeat, i.e. failure is heartbeat failure.

In model b) the shim layer should not care what conditions exist to cause the ULP to think that a change is appropriate - the ULP makes the call and then generates the signal to the shim layer.

I think we should avoid any transport layer functions in the shim layer
(or at least as much as possible).  I'm leaning more towards the rich-api
type of functionality, where the shim provides as much info as it can
to the transport layers, and lets the transport layers make decisions.


precisely.

I'm also wondering if the failure detection mechanism should be a pluggable option; I've been thinking how the shim layer would work in wireless environments, and I don't think I'd like to have fast heartbeating, as that would drain batteries unnecessarily. Could it be possible to have this functionality as optional?

I would think so - i.e. in a rich API with forkable shim states a ULP should be able to say to the shim state: "don't heartbeat for me - I'll generate explicit triggers"
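
To make that concrete, here is a small Python sketch of what such an opt-out might look like. This is not a proposed API: the ShimState class, its fork() and ulp_signal_failure() methods and the heartbeat flag are hypothetical names, and "switching" is reduced to stepping to the next locator pair in a list.

```python
class ShimState:
    """Hypothetical per-association shim state with optional heartbeat."""

    def __init__(self, locator_pairs, heartbeat=True):
        if not locator_pairs:
            raise ValueError("need at least one locator pair")
        self.locator_pairs = list(locator_pairs)
        self.current = 0              # index of the locator pair in use
        self.heartbeat = heartbeat    # False: the ULP supplies the triggers

    def fork(self, heartbeat):
        # The forked state starts on the same pair but evolves independently.
        child = ShimState(self.locator_pairs, heartbeat=heartbeat)
        child.current = self.current
        return child

    def ulp_signal_failure(self):
        # Model (b): the ULP made the call; the shim does not ask why.
        self.current = (self.current + 1) % len(self.locator_pairs)
        return self.locator_pairs[self.current]


shared = ShimState([("A1", "B1"), ("A2", "B2")])
udp_state = shared.fork(heartbeat=False)   # "don't heartbeat for me"
print(udp_state.ulp_signal_failure())      # forked state moves to the next pair
print(shared.locator_pairs[shared.current], shared.heartbeat)
```

In this sketch the forked state switches locators on the ULP's explicit trigger, while the original state stays on its pair and keeps its heartbeat for everything else.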

   Geoff
