[zeromq-dev] Recovery of a Subscription after a PubSub system rebooted

Stefan de Konink stefan at konink.de
Mon Nov 12 03:35:28 CET 2012


Hi Pieter,

On Mon, 12 Nov 2012, Pieter Hintjens wrote:

> You are probably right, and it could well be that a reboot simply
> isn't reported by TCP as a network failure, except after a session
> timeout (30 minutes by default).

Even after a day, some of the remote subscribers hadn't come back.
Obviously I cannot control their software stack, but it seems that "30
minutes afterwards" is no guarantee of subscription reconnection either.


> If this is what's happening then we do need to add heartbeating
> somewhere. One could argue that 0MQ is just mirroring TCP behaviour
> but pragmatically I think heartbeating could be useful at a lower
> level. However it's non-trivial to make this work generally, and at
> least needs application support to define the intervals.
>
> Filing a bug report isn't going to help much, then, unless you're also
> willing to help make the solution, or convince someone to help make
> it.

The first step to a solution is probably documenting that there is a
use case where reconnection doesn't work. For that we need better
experiments, which I am committing to.


> What I would do if I was you is (a) experiment with application
> heartbeats, which are quite simple to add,

Sadly 'simple' is not so simple here: we basically offer a remote pubsub
that anyone can connect to, so we cannot control the applications that
do so.
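
For reference, the subscriber-side half of an application heartbeat would
look roughly like the sketch below. It assumes the publisher sends something
(data or a dedicated heartbeat message) at least once per `interval`
seconds. The class name `LivenessMonitor` and its parameters are
hypothetical and not part of libzmq; the injectable `clock` is only there
to make the logic easy to test.

```python
import time


class LivenessMonitor:
    """Hypothetical helper: decides whether the publisher has gone silent."""

    def __init__(self, interval, max_missed=3, clock=time.monotonic):
        self.interval = interval        # expected seconds between messages
        self.max_missed = max_missed    # missed intervals tolerated before giving up
        self._clock = clock
        self._last_seen = clock()

    def message_received(self):
        # Call this for every message, heartbeat or real data.
        self._last_seen = self._clock()

    def is_dead(self):
        # True once more than max_missed intervals passed without any message.
        silence = self._clock() - self._last_seen
        return silence > self.interval * self.max_missed
```

A subscriber would call `message_received()` after each receive and, once
`is_dead()` returns true, close its SUB socket, open a new one and
subscribe again. That only helps for subscribers whose code we can change,
which is exactly the limitation here.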


> (b) establish that indeed
> it's the TCP session timeout that's causing this, and (c) see if you
> can convince anyone on this list that adding (I assume) optional and
> configurable heartbeats in libzmq for at least pub/sub sockets would
> be a good investment.

I'm going forward with step b.
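
One way to approach that experiment is to see whether TCP keepalive on a
plain socket notices the rebooted peer sooner. The following is a minimal
sketch, not a finished test setup: the function name `enable_keepalive` and
the default timings are assumptions, and the `TCP_KEEP*` options are only
set where the platform exposes them (they are Linux names).

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for sock (hypothetical helper)."""
    # Portable switch: ask the kernel to probe an idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Fine tuning, applied only if this platform defines the option.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # unanswered probes before the connection is dropped
    )
    for name, value in tuning:
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)
```

If a connection set up this way is torn down shortly after the publisher
host reboots while an unmodified one stays half-open, that would point at
dead-peer detection rather than at the reconnect logic. libzmq has socket
options along the same lines (`ZMQ_TCP_KEEPALIVE` and friends), but they
would have to be set by the subscriber.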


Stefan


