Comments on reliable accounting draft (was RE: Strawman RADIUSEXT WG charter - Take Two)
Bernard Aboba <mailto:firstname.lastname@example.org> writes:
>> Is there evidence that this 'lack' is a fatal flaw?
> The draft below attempts to make the case for this work item. Could
> you review it and send your thoughts to the list?
This draft raises a number of questions but unfortunately answers few
(if any) of them. That is not necessarily a criticism of the draft,
since I believe the questions it poses can only be answered in the
context of specific network topologies and deployments. For
example, section 2.1 says (in reference to retransmission of RADIUS
messages): "How long to wait, how many times to retry, how to fail over,
how and when to failback, is not covered by the RADIUS specification."
The draft claims that this is a 'shortcoming' of the specification(s) in
question; I would claim otherwise, but even if we accept that this
omission should be rectified, this draft doesn't do it. With respect to "how
long to wait", the draft gives a variable "T-retry" which is initialized
to an unspecified minimum value and doubled with each failure until it
reaches a similarly unspecified maximum. To me, it's difficult to
distinguish this from RFC 2865's prescription to wait "some period of
time". As I've mentioned before, I don't understand the use of an
exponential back-off or other TCP-like methods in RADIUS. Correct me if
I'm wrong, but I thought that the primary purpose of the TCP
retransmission algorithm was to ensure that packets get through the
network no matter what. In the specific case of server overload due to
network-wide reboot (which is the best argument I've heard for tweaking
the RADIUS retransmission method), the primary problem isn't that
packets are being dropped by intermediate nodes due to network
congestion (although in severe cases some congestion might be seen).
Instead, the problem is that too many RADIUS messages are getting
through the network simultaneously; in this case, network congestion
might be the RADIUS server's friend, giving it a chance to catch up!
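For concreteness, the T-retry scheme as I read it in the draft amounts
to something like the following sketch (the draft leaves the minimum and
maximum values unspecified, so the ones here are pure placeholders):

```python
# Placeholder bounds; the draft does not specify them.
RETRY_MIN = 2.0   # seconds (assumed)
RETRY_MAX = 16.0  # seconds (assumed)

def retry_intervals():
    """Yield successive wait times: start at the minimum, double after
    each failure, and stop doubling once the maximum is reached."""
    t = RETRY_MIN
    while True:
        yield t
        t = min(t * 2, RETRY_MAX)

gen = retry_intervals()
waits = [next(gen) for _ in range(5)]
print(waits)  # [2.0, 4.0, 8.0, 16.0, 16.0]
```

Without concrete bounds, this is no more prescriptive than RFC 2865's
"some period of time."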
Rather than an exponential back-off, the introduction of transmission
jitter might be a more effective strategy. In any case, there are lots
of ways to deal with the problem that would almost certainly be more
effective than a protocol-based approach.

For "how many times to retry", the answer is also unspecified. The rest
of the questions are also answered in simplistic and/or unhelpful ways.
For example,
"failback" occurs after the expiration of an apparently static timer.
Since the failback operation is not based upon any indication of the
failed server's health, this could very easily result in the abandonment
of a functional server in favor of a server that is down. It seems like
some type of metric of server responsiveness could be developed instead
so that the most responsive server would always be used; a simpler
method yet might be a timer-based RADIUS "ping" to discover whether a
given server is alive.
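A minimal sketch of that timer-based "ping" idea, where send_ping stands
in for a real RADIUS request/response exchange (a Status-Server round
trip, say), and where the names and the poll interval are entirely
hypothetical:

```python
POLL_INTERVAL = 30.0  # seconds (assumed)

class ServerTable:
    def __init__(self, servers, send_ping):
        self.servers = list(servers)        # in order of preference
        self.send_ping = send_ping          # callable: server -> bool
        self.alive = {s: True for s in servers}
        self.last_ping = {s: 0.0 for s in servers}

    def poll(self, now):
        """Ping any server whose timer has expired and record whether
        it answered."""
        for s in self.servers:
            if now - self.last_ping[s] >= POLL_INTERVAL:
                self.alive[s] = self.send_ping(s)
                self.last_ping[s] = now

    def pick(self):
        """Use the most-preferred server known to be alive, so failback
        happens only when the primary actually answers a ping."""
        for s in self.servers:
            if self.alive[s]:
                return s
        return None  # every server is down
```

The point is simply that the failback decision keys off evidence of the
server's health rather than a blind timer.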

The draft also makes a couple of assumptions that are novel, at least to
me. Is it true that RADIUS clients regularly choose proxies based upon
NAI or some other piece of authentication data? The last time I
checked, routing of RADIUS packets was the job of proxies, not clients,
but I won't claim to be familiar with the state of the art. In
addition, I was not aware that RADIUS proxies regularly implemented the
timeout and retry algorithm. If they do, this seems like it would
_increase_, rather than decrease, network traffic.
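To see why, consider the worst case under an assumed chain in which each
hop runs a full retry schedule for every copy it receives (the numbers
below are purely illustrative):

```python
def worst_case_packets(retries_per_hop, hops):
    # Each hop sends 1 initial copy plus its retries for every copy it
    # receives, so the packet count multiplies at each hop in the chain.
    return (retries_per_hop + 1) ** hops

# A client plus two proxies (3 retransmitting hops), each retrying twice:
print(worst_case_packets(2, 3))  # 27 packets arrive at the final server
```

One request becomes 27 packets at the home server, which is exactly the
wrong behavior when that server is already overloaded.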

Hope this helps,
"They that can give up essential liberty to obtain a little temporary
safety deserve neither..."
-- Benjamin Franklin, 1759

"It is forbidden to kill; therefore all murderers are punished unless
they kill in large numbers and to the sound of trumpets."
-- Voltaire