[zeromq-dev] zeromq-dev Digest, Vol 34, Issue 72

Marcelo Cantos marcelo.cantos at gmail.com
Thu Oct 14 15:18:37 CEST 2010

On Thu, 14 Oct 2010 9:45 AM, Pieter Hintjens <ph at imatix.com> wrote:

> On Thu, Oct 14, 2010 at 9:02 AM, Martin Sustrik <sustrik at moloch.sk> wrote:
> > That would be the best solution, as the only sane place to implement
> > keepalives is OSI level 4. However, TCP specs (RFC1122) specify that
> > keepalive timeout is at least 2 hours. It's a long-known and widely
> > discussed problem with TCP. It has been solved in SCTP, but again, not
> > everybody is willing to switch to SCTP, etc.
> It's not about TCP vs. SCTP except in one case.  The use cases are:
> 1. to work around TCP's problem of not reporting disconnected peers
> 2. to distinguish silent applications from dead ones ('heartbeating')
> 3. to force intermediaries to keep connections open ('keepalive')
> You can basically solve all three use cases in one go by sending
> heartbeat messages when a connection is otherwise idle, and treating
> the lack of incoming heartbeats as a fatal error.  We discussed
> previously how to do this optimally, by sending heartbeats
> progressively more slowly during idle periods.  The recipient only
> needs to detect "sudden silence", i.e. a sharp drop, meaning something
> died.
> The most common cause of problems is blocked (looping) applications,
> which is why doing this at level 4 is useless.  The right place seems
> to be the tcp:// transport.
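
The idle-heartbeat scheme quoted above could be sketched roughly as follows. This is only an illustration, not anything from 0MQ itself: the class names, the doubling factor, the 60-second ceiling and the slack multiplier are all assumptions made up for the example. The sender stretches its heartbeat interval while the link stays idle, and the receiver flags a peer only when the silence is much longer than the last gap it observed.

```python
class IdleHeartbeat:
    """Sender side (hypothetical): interval grows while the link stays idle."""

    def __init__(self, base=1.0, factor=2.0, ceiling=60.0):
        self.base = base          # seconds between heartbeats right after traffic
        self.factor = factor      # growth per consecutive idle heartbeat
        self.ceiling = ceiling    # never wait longer than this
        self.interval = base

    def on_traffic(self):
        # Real application data was sent: go back to the fast rate.
        self.interval = self.base

    def on_heartbeat_sent(self):
        # Another idle heartbeat went out: slow down, up to the ceiling.
        self.interval = min(self.interval * self.factor, self.ceiling)
        return self.interval


class SilenceDetector:
    """Receiver side (hypothetical): detects a sharp drop, not slow traffic."""

    def __init__(self, factor=2.0, slack=1.5, floor=1.0):
        self.factor = factor      # must match the sender's growth factor
        self.slack = slack        # tolerance for jitter and scheduling delay
        self.floor = floor        # smallest gap considered, >= sender's base
        self.last_seen = None
        self.last_gap = floor

    def on_message(self, now):
        # Any incoming message, data or heartbeat, counts as a sign of life.
        if self.last_seen is not None:
            self.last_gap = max(now - self.last_seen, self.floor)
        self.last_seen = now

    def peer_looks_dead(self, now):
        if self.last_seen is None:
            return False
        # The next message should arrive within factor * last_gap, plus slack.
        allowed = self.last_gap * self.factor * self.slack
        return (now - self.last_seen) > allowed
```

Because the allowed silence scales with the last observed gap, a peer that has been quiet for a long time is given proportionally more room, while a peer that stops abruptly after frequent traffic is flagged quickly.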

My apologies in advance if I've misunderstood what Pieter is saying, but the
above statement doesn't sit right with me. Blocked applications are either
effectively dead because they have entered an endless loop, or are simply
taking an awfully long time to compute the billionth digit of pi. The first
case should be treated as a bug, and doesn't warrant any effort by the
transport to deal with it. In the second case, I would prefer *not* to kill
the connection just because the application hasn't come up for air for some
time. As the application programmer, I consider un-talkative apps my fault
and my problem to deal with. All I want from the transport is for it to stay
healthy until the app is good and ready to talk, or, if staying healthy
becomes impossible, to report failure relatively quickly once the app tries
to talk. Wouldn't keepalives suffice?

On another note, I'm curious about the problems Martin cites regarding TCP
keepalives and the two-hour timeout. I'm guessing that perhaps this is just
too long, and that many real-world routers will forget the session before
then, but I'd be interested in hearing from someone who knows what they're
talking about. I also note that a two-hour keepalive seems to be the minimum
*default*, not the minimum value, though this is cold comfort on Windows,
which doesn't let individual applications override the default (I suppose
they could change the registry setting, if they're desperate).
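
For reference, a minimal sketch of overriding the keepalive timers on a single socket, using Python's standard socket module. The function name and the default values are assumptions chosen for illustration. TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux socket options that are not available everywhere, so each is guarded. The SIO_KEEPALIVE_VALS branch reflects my understanding that Windows does expose a per-socket override through that ioctl; treat it as unverified against the setup discussed here.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive and, where the platform allows, shorten its timers."""
    # Portable part: ask the OS to send keepalive probes on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between successive unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes tolerated before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows only: (on/off, idle time in ms, probe interval in ms).
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle * 1000, interval * 1000))


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    s.close()
```

With the values shown, a vanished peer on Linux would be reported after roughly idle + interval * count seconds of silence, i.e. on the order of two minutes rather than two hours.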