[zeromq-dev] Recovery of a Subscription after a PubSub system rebooted

Pieter Hintjens ph at imatix.com
Mon Nov 12 03:08:43 CET 2012


On Mon, Nov 12, 2012 at 10:47 AM, Stefan de Konink <stefan at konink.de> wrote:

> I think you are overestimating the 'firewall' concept I mentioned. It is
> just a system with policy drop, and allow from a few sources, on the same
> server as the pubsub is running. But I'll strip the firewall from the
> problem and test it on a LAN with just one switch.

You are probably right, and it could well be that a reboot simply
isn't reported by TCP as a network failure, except after a session
timeout (30 minutes by default).

If this is what's happening then we do need to add heartbeating
somewhere. One could argue that 0MQ is just mirroring TCP behaviour
but pragmatically I think heartbeating could be useful at a lower
level. However it's non-trivial to make this work generally, and at
least needs application support to define the intervals.

Filing a bug report isn't going to help much, then, unless you're also
willing to help make the solution, or convince someone to help make
it.

What I would do if I were you is (a) experiment with application
heartbeats, which are quite simple to add, (b) establish that indeed
it's the TCP session timeout that's causing this, and (c) see if you
can convince anyone on this list that adding (I assume) optional and
configurable heartbeats in libzmq for at least pub/sub sockets would
be a good investment.
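
For (a), here is a minimal sketch in Python of what the subscriber side
of an application heartbeat might look like. It assumes the publisher
sends something (real data or a dedicated heartbeat message) at least
once per interval. The names HeartbeatMonitor, run_subscriber, receive
and reconnect are hypothetical, made up for illustration, and are not
part of libzmq or any binding. The interval and liveness values are
exactly the kind of thing the application has to define.

```python
import time


class HeartbeatMonitor:
    """Tracks when the last message (data or heartbeat) arrived."""

    def __init__(self, interval, liveness=3, clock=time.monotonic):
        self.interval = interval    # seconds between expected heartbeats
        self.liveness = liveness    # silent intervals tolerated before giving up
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = clock()

    def touch(self):
        """Call on every received message, heartbeat or real data."""
        self.last_seen = self.clock()

    def expired(self):
        """True once nothing has arrived for more than interval * liveness seconds."""
        return self.clock() - self.last_seen > self.interval * self.liveness


def run_subscriber(receive, reconnect, monitor, rounds):
    """Poll for messages and reconnect when the publisher looks dead.

    receive(timeout) is assumed to return a message, or None on timeout.
    reconnect() is assumed to drop the old connection and open a new one.
    """
    for _ in range(rounds):
        message = receive(monitor.interval)
        if message is not None:
            monitor.touch()
        elif monitor.expired():
            reconnect()
            monitor.touch()  # give the fresh connection a full grace period
```

In a real subscriber, receive would presumably wrap a poll with a
timeout on the SUB socket, and reconnect would close that socket, open
a new one and subscribe again. The publisher only needs to send a small
heartbeat message whenever it has been idle for one interval.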

-Pieter


