On Aug 24, 2018, at 5:02 AM, Luca Boccassi <luca.boccassi@gmail.com> wrote:

This is a very old issue that has its roots in some of the commands
needing to be processed in the application thread. The solution is to
refactor and move them to the I/O thread instead.

But it's of course much easier said than done. It's quite complex, with many unknown ramifications.

So if anybody wants to help with that, the very first thing would be to
add a ton of unit tests around that area. We can now do internal per-
class unit tests (libzmq/unittests), so that is at least possible.
This will never be doable safely without tests.

By “very old” I assume you’re referring to this: https://zeromq.jira.com/browse/LIBZMQ-270.

If so, it seems unlikely that this will be addressed any time soon.

So I’m trying to explore alternatives that might alleviate some of the problems caused by this issue, short of the massive re-engineering (and associated risk) that it would apparently take to refactor the command processing.

One alternative that seems promising would be to allow connection events to “percolate” from the I/O thread back to the application thread, causing e.g. zmq_poll to return with an indication that the socket’s state has changed. That indication could then be used to trigger the appropriate action on the application thread (which could be as simple as re-entering the zmq_poll call, thereby triggering process_commands).

The advantages of this approach are that it would be much safer than a full refactoring, would presumably require much less work, and would not change the behavior of ZeroMQ for existing applications (i.e., connection events would have to be specifically requested).

What I don’t know is whether this is even possible — unfortunately I just don’t understand the code well enough to make a guess at this point. If anyone can suggest how this might work in practice, I’d be very grateful.