[zeromq-dev] MDP protocol, detecting dead workers

Gyorgy Szekely hoditohod at gmail.com
Wed Feb 8 22:37:47 CET 2017


Hi,
Background:
I have a message broker written with cppzmq implementing the Majordomo
protocol. It works really well, except for one scenario: when a worker
crashes during processing. The protocol handles this in the sense that no
new tasks are assigned to the dead worker, but the broker never realizes
that it lost a worker.
In my environment workers die quite often, and this is visible to the
broker: the TCP link goes down. My problem is that the broker is not aware
of such events, so it effectively leaks worker-related objects and reports
false stats on available resources (the worker reconnects as a new worker).

Question:
Is it possible to get the identity of disconnected peers on a ROUTER socket
without actually sending a message?

There's a dedicated socket for workers in the broker, and there's a monitor
attached to it, which reports connection closed events, but I found no way
to associate these events with router identity. Is this intentional?
I also tried setting the ZMQ_ROUTER_MANDATORY flag and sending a
single-frame message consisting of the identity only, but it gets discarded
without ever throwing an EHOSTUNREACH error.

The only way I could come up with is to send a real (heartbeat) message to
a worker which will trigger EHOSTUNREACH for disconnected workers, but it
will queue up in busy workers. I wouldn't even consider this as a
workaround...
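
To make the bookkeeping problem concrete, here is a minimal sketch of what
the broker would need in order to drop dead workers. It assumes the
heartbeats flow from worker to broker instead (so nothing queues up in busy
workers) and that the broker purges identities that have been silent for
too long. WorkerRegistry, touch() and purge() are hypothetical names, not
part of cppzmq or libzmq, and this is an illustration under those
assumptions rather than a tested broker component.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: tracks when each ROUTER identity was last heard from.
// Assumes workers send something (reply or heartbeat) at regular intervals.
class WorkerRegistry {
public:
    using Clock = std::chrono::steady_clock;

    explicit WorkerRegistry(std::chrono::milliseconds timeout)
        : timeout_(timeout) {}

    // Call for every message received from a worker, keyed by its identity.
    void touch(const std::string& identity, Clock::time_point now) {
        last_seen_[identity] = now;
    }

    // Remove and return the identities that have been silent for longer
    // than the timeout, so the broker can free their per-worker objects.
    std::vector<std::string> purge(Clock::time_point now) {
        std::vector<std::string> dead;
        for (auto it = last_seen_.begin(); it != last_seen_.end();) {
            if (now - it->second > timeout_) {
                dead.push_back(it->first);
                it = last_seen_.erase(it);
            } else {
                ++it;
            }
        }
        return dead;
    }

    // Number of workers currently considered alive.
    std::size_t size() const { return last_seen_.size(); }

private:
    std::chrono::milliseconds timeout_;
    std::unordered_map<std::string, Clock::time_point> last_seen_;
};
```

The broker loop would call touch() with the identity frame of each incoming
worker message and call purge() once per poll cycle. This only detects a
dead worker after the timeout; it does not map a monitor event to an
identity, which is what I am really asking about.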

Any ideas on how to solve this correctly?

Regards,
   Gyorgy Szekely