[zeromq-dev] Recovery of a Subscription after a PubSub system rebooted

Pieter Hintjens ph at imatix.com
Mon Nov 12 02:34:03 CET 2012


On Mon, Nov 12, 2012 at 8:59 AM, Stefan de Konink <stefan at konink.de> wrote:

> I just did a normal "reboot" - and I doubt the iptables based firewall
> is currently a blocker here.

Can you try with a subscriber inside the firewall? It does sound like
the firewall simply isn't recognizing that the server has "gone", so
it keeps the connection to the subscriber open, and the subscriber
then has no reason to reconnect.

TCP does treat a proper socket close differently from a network
disconnection (as in, unplugging a cable or perhaps rebooting a
server).
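
To illustrate the difference, here is a minimal sketch using plain TCP
sockets on the loopback interface. The helper name observe_peer and its
parameters are made up for this example. A peer that simply stays quiet
stands in for a server that vanished without sending a FIN or RST, since
the reader cannot tell those two cases apart.

```python
import socket


def observe_peer(close_cleanly: bool, wait: float = 0.2) -> str:
    """Report what a reader sees: "eof", "silence", or "data"."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    reader = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()
    try:
        if close_cleanly:
            # Orderly close: the kernel sends a FIN to the reader.
            peer.close()
        # Otherwise the peer just goes quiet, as a vanished host would.
        reader.settimeout(wait)
        try:
            data = reader.recv(1)
        except socket.timeout:
            # Nothing arrived at all: the reader learns nothing.
            return "silence"
        # An empty read is how TCP reports a clean close.
        return "eof" if data == b"" else "data"
    finally:
        reader.close()
        peer.close()
        listener.close()
```

With a clean close the reader gets an explicit end-of-stream and knows
it should reconnect. In the silent case it gets nothing, and only a
timeout chosen by the application turns that silence into a decision.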

>> As Kevin points out, you can use heartbeats to discover this at the
>> subscriber side, and force a reconnect no matter what the firewall
>> state.
>
> Shouldn't the heartbeat be integrated at the subscriber code?

You mean in libzmq? Perhaps, that's an option and could be useful. But
heartbeating often needs domain knowledge to work properly, to avoid
false positives (e.g. code that works on a LAN can break on a WiFi
network). On some networks 1 second of silence is a sign of trouble;
on other configurations 30 seconds of silence may be normal.
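
A rough sketch of what that could look like on the subscriber side,
with the silence threshold left to the application. This assumes pyzmq,
and it assumes the publisher sends something (data or a dedicated
heartbeat message) more often than timeout_s. HeartbeatMonitor,
run_subscriber and handle are names invented for this example, not
anything libzmq provides.

```python
import time


class HeartbeatMonitor:
    """Tracks how long it has been since the last sign of life."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s  # domain-specific: 1 s on a LAN, 30 s elsewhere
        self._clock = clock
        self._last_seen = clock()

    def touch(self):
        """Record that a message (data or heartbeat) just arrived."""
        self._last_seen = self._clock()

    def expired(self):
        """True once the silence has lasted longer than timeout_s."""
        return self._clock() - self._last_seen > self.timeout_s


def run_subscriber(endpoint, timeout_s, handle):
    """Receive forever; drop and recreate the socket after too much silence."""
    import zmq  # pyzmq, imported here so the monitor works without it

    ctx = zmq.Context.instance()
    while True:
        sock = ctx.socket(zmq.SUB)
        sock.setsockopt(zmq.SUBSCRIBE, b"")  # subscribe to everything
        sock.connect(endpoint)
        monitor = HeartbeatMonitor(timeout_s)
        while not monitor.expired():
            if sock.poll(timeout=100):  # wait up to 100 ms for a message
                handle(sock.recv())
                monitor.touch()
        # Too quiet: close the socket and connect afresh, whatever
        # the firewall thinks the connection state is.
        sock.close(linger=0)
```

A call such as run_subscriber("tcp://host:5556", 30.0, print) would
tolerate 30 seconds of silence before reconnecting; the endpoint here
is only a placeholder.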

>> Alternatively you may find a way to fix this in the firewall.
>
> I guess if I can reproduce this problem on a LAN, I can file a bug report?

For sure. If you can reproduce without any intervening firewall, we can fix it.

A minimal test case is pretty much a prerequisite to any investigation.

-Pieter


