[zeromq-dev] MDP protocol, detecting dead workers

Doron Somech somdoron at gmail.com
Mon Feb 13 15:32:10 CET 2017


Why not implement a heartbeat from worker to server? When a worker has not
been heard from for X seconds, consider it dead and remove it.

Also, if you later get a heartbeat from a worker you don't recognize, send
it an error message so that the worker logs in again.
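
A minimal sketch of that bookkeeping on the server side, in plain C++ with
no ZeroMQ calls. The WorkerRegistry class and its method names are
hypothetical, not part of MDP or cppzmq, and the caller is assumed to pass
in the ROUTER identity of each worker and the current time.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical broker-side bookkeeping; these names are illustrative only.
class WorkerRegistry {
public:
    using Clock = std::chrono::steady_clock;

    explicit WorkerRegistry(std::chrono::milliseconds timeout)
        : timeout_(timeout) {
        assert(timeout.count() > 0);  // a zero timeout would expire everyone
    }

    // Call when a worker registers (logs in) with the broker.
    void add(const std::string& identity, Clock::time_point now) {
        last_seen_[identity] = now;
    }

    // Call for every heartbeat received. Returns false when the identity is
    // unknown, in which case the caller should reply with an error so the
    // worker registers again.
    bool heartbeat(const std::string& identity, Clock::time_point now) {
        auto it = last_seen_.find(identity);
        if (it == last_seen_.end()) {
            return false;
        }
        it->second = now;
        return true;
    }

    // Remove and return every worker that has been silent for longer than
    // the timeout. Call this periodically from the broker loop.
    std::vector<std::string> purge(Clock::time_point now) {
        std::vector<std::string> dead;
        for (auto it = last_seen_.begin(); it != last_seen_.end();) {
            if (now - it->second > timeout_) {
                dead.push_back(it->first);
                it = last_seen_.erase(it);
            } else {
                ++it;
            }
        }
        return dead;
    }

    std::size_t size() const { return last_seen_.size(); }

private:
    std::chrono::milliseconds timeout_;
    std::unordered_map<std::string, Clock::time_point> last_seen_;
};
```

The broker would call heartbeat() whenever a heartbeat frame arrives and
purge() on each pass of its poll loop, releasing the worker objects for
every identity that purge() returns. Passing the time in explicitly keeps
the class easy to test without a real clock.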

On Wed, Feb 8, 2017 at 11:37 PM, Gyorgy Szekely <hoditohod at gmail.com> wrote:

> Hi,
> Background:
> I have a message broker written with cppzmq implementing the Majordomo
> protocol. It works really fine, except for one scenario: when a worker
> crashes during processing. The protocol handles this as no new task is
> assigned to the dead worker, but the broker never realizes that it lost a
> worker.
> In my environment workers die quite often, and this is visible to the
> broker: tcp link goes down. My problem is that the broker is not aware of
> such events and effectively leaks worker related objects and provides false
> stats on available resources (the worker reconnects as a new worker).
>
> Question:
> Is it possible to get the identity of disconnected peers on a ROUTER socket
> without actually sending a message?
>
> There's a dedicated socket for workers in the broker, and there's a
> monitor attached to it, which reports connection closed events, but I found
> no way to associate these events with router identity. Is this intentional?
> I also tried setting the ZMQ_ROUTER_MANDATORY flag, and sending a single
> frame message consisting of the identity only, but it gets discarded
> without ever throwing an EHOSTUNREACH error.
>
> The only way I could come up with is to send a real (heartbeat) message to
> a worker which will trigger EHOSTUNREACH for disconnected workers, but it
> will queue up in busy workers. I wouldn't even consider this as a
> workaround...
>
> Any ideas on how to solve this correctly?
>
> Regards,
>    Gyorgy Szekely