[zeromq-dev] behavior of zmq_term in 2.1.0
Chuck Remes
cremes.devlist at mac.com
Sun Nov 28 20:37:01 CET 2010
On Nov 28, 2010, at 10:27 AM, Pieter Hintjens wrote:
> On Sat, Nov 27, 2010 at 10:53 PM, Chuck Remes <cremes.devlist at mac.com> wrote:
>
>> Wait a second...
>
> Chuck, there is a long thread (search for 'issue 85') which tries to
> dissect this problem. Martin likes to answer with puzzles like "it
> requires that ZMQ become part of the kernel" that take time to figure
> out. :-)
Wow, that thread is from almost two months ago. It may as well be from 100 years ago! :)
After reading through Martin's responses, I think I see his dilemma. While zmq_term() might atomically set a flag indicating the socket is closed and begin deallocating its resources, a socket may already be in the middle of some work that is past that atomic "gatekeeper", and so it blows up (two threads touching the same structs).
He gives a short set of steps for properly terminating a program like so:
1. Thread A creates a context.
2. Threads B - Z each create a socket using that context.
3. Threads B - Z block on some 0mq operation.
4. Thread A calls zmq_term on the context.
5. Threads B - Z all wake up to process ETERM; this signals them to call zmq_close() on their socket.
6. zmq_term() unblocks after all sockets are closed.
7. Application terminates.
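To make the ordering in those steps concrete, here is a minimal sketch using plain pthreads. It is a toy model, not libzmq: toy_ctx, toy_socket, toy_recv, toy_close, toy_term and run_shutdown_demo are hypothetical names invented for illustration, and TOY_ETERM merely stands in for ETERM.

```c
/* Toy model only: plain pthreads, NOT libzmq. All toy_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define TOY_ETERM   (-1)   /* stand-in for the ETERM error */
#define MAX_WORKERS 32

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;   /* broadcast on every state change */
    int open_sockets;          /* sockets created but not yet closed */
    int terminating;           /* set once by toy_term() */
    int eterm_seen;            /* workers that woke up with TOY_ETERM */
} toy_ctx;

/* Step 2: a worker creates its socket on the shared context. */
static void toy_socket(toy_ctx *c) {
    pthread_mutex_lock(&c->lock);
    c->open_sockets++;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
}

/* Steps 3 and 5 (first half): block until termination starts, then report TOY_ETERM. */
static int toy_recv(toy_ctx *c) {
    pthread_mutex_lock(&c->lock);
    while (!c->terminating)
        pthread_cond_wait(&c->changed, &c->lock);
    c->eterm_seen++;
    pthread_mutex_unlock(&c->lock);
    return TOY_ETERM;
}

/* Step 5 (second half): the worker closes its own socket. */
static void toy_close(toy_ctx *c) {
    pthread_mutex_lock(&c->lock);
    assert(c->open_sockets > 0);
    c->open_sockets--;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
}

/* Steps 4 and 6: flag termination, then block until every socket is closed. */
static void toy_term(toy_ctx *c) {
    pthread_mutex_lock(&c->lock);
    c->terminating = 1;
    pthread_cond_broadcast(&c->changed);
    while (c->open_sockets > 0)
        pthread_cond_wait(&c->changed, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

static void *worker(void *arg) {
    toy_ctx *c = arg;
    toy_socket(c);
    if (toy_recv(c) == TOY_ETERM)
        toy_close(c);
    return NULL;
}

/* Plays the role of thread A. Returns how many workers observed TOY_ETERM. */
int run_shutdown_demo(int n) {
    toy_ctx c;
    pthread_t tid[MAX_WORKERS];
    int i;

    if (n < 0 || n > MAX_WORKERS)
        return -1;
    pthread_mutex_init(&c.lock, NULL);           /* step 1: create the context */
    pthread_cond_init(&c.changed, NULL);
    c.open_sockets = c.terminating = c.eterm_seen = 0;

    for (i = 0; i < n; i++)
        if (pthread_create(&tid[i], NULL, worker, &c) != 0)
            abort();

    pthread_mutex_lock(&c.lock);                 /* wait until every socket exists */
    while (c.open_sockets < n)
        pthread_cond_wait(&c.changed, &c.lock);
    pthread_mutex_unlock(&c.lock);

    toy_term(&c);                                /* steps 4 to 6 */

    for (i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    pthread_cond_destroy(&c.changed);
    pthread_mutex_destroy(&c.lock);
    return c.eterm_seen;                         /* step 7: safe to exit */
}
```

Calling run_shutdown_demo(4) from main should return 4. The sketch only terminates because every worker is parked inside toy_recv() when toy_term() runs; nothing in it says how the real library wakes a blocked socket.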
Unfortunately, this doesn't work for all cases. I added a short C program to my ticket 127 (https://github.com/zeromq/zeromq2/issues/#issue/127) that illustrates an indefinite block. There is no chance for the socket to receive ETERM and call zmq_close(). All that has been done is the allocation of a context, a socket and a call to zmq_term().
Even in the step-by-step situation given above, some of those threads may be blocked on non-0mq operations and may not get around to calling any 0mq operation for quite some time. Until they do, zmq_term() will block.
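The delay can be illustrated with another small pthreads sketch. Again this is a toy model and not libzmq: toy_ctx, busy_worker and term_blocked_ms are hypothetical names, and the sleep is an assumed stand-in for whatever non-0mq work keeps the thread away from its socket.

```c
/* Toy model only: plain pthreads, NOT libzmq. All names here are hypothetical. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int  open_sockets;   /* sockets not yet closed */
    long busy_ms;        /* time the worker spends on non-0mq work */
} toy_ctx;

static long now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long)ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* The worker owns one open socket but is busy elsewhere before closing it. */
static void *busy_worker(void *arg) {
    toy_ctx *c = arg;
    struct timespec left;

    left.tv_sec  = c->busy_ms / 1000;
    left.tv_nsec = (c->busy_ms % 1000) * 1000000L;
    while (nanosleep(&left, &left) == -1 && errno == EINTR) {
        /* interrupted by a signal: keep sleeping for the remaining time */
    }

    pthread_mutex_lock(&c->lock);                /* only now is the socket closed */
    assert(c->open_sockets > 0);
    c->open_sockets--;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Returns roughly how long the terminate step was stuck, in milliseconds. */
long term_blocked_ms(long busy_ms) {
    toy_ctx c;
    pthread_t tid;
    long t0, elapsed;

    if (busy_ms < 0)
        return -1;
    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.changed, NULL);
    c.open_sockets = 1;                          /* the worker's socket already exists */
    c.busy_ms = busy_ms;

    t0 = now_ms();
    if (pthread_create(&tid, NULL, busy_worker, &c) != 0)
        abort();

    pthread_mutex_lock(&c.lock);                 /* stand-in for zmq_term() blocking */
    while (c.open_sockets > 0)
        pthread_cond_wait(&c.changed, &c.lock);
    pthread_mutex_unlock(&c.lock);
    elapsed = now_ms() - t0;

    pthread_join(tid, NULL);
    pthread_cond_destroy(&c.changed);
    pthread_mutex_destroy(&c.lock);
    return elapsed;
}
```

In this model the terminating thread is held for about as long as the worker stays busy, so term_blocked_ms(200) returns roughly 200 or more.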
Lastly, this behavior is problematic for asynchronous systems (e.g. reactor pattern). Any blocking behavior blocks *all* other operations. :(
I'm at a loss to suggest a fix.
Why does zmq_term() need to block now? Is it to support socket migration between threads, or to support ZMQ_LINGER?
cr