[zeromq-dev] behavior of zmq_term in 2.1.0

Martin Sustrik sustrik at 250bpm.com
Thu Dec 2 10:55:21 CET 2010

Hi Chuck,

> After reading through Martin's responses, I think I see his dilemma.
> It appears the dilemma is that while zmq_term() might atomically set
> a flag indicating the socket is closed and begin deallocating its
> resources, a socket may be in the midst of doing some work that is
> already beyond that atomic "gatekeeper" and so it blows up (2 threads
> touching same structs).
> He gives a short set of steps for properly terminating a program like
> so:
> 1. Thread A creates a context.
> 2. Threads B - Z each create a socket using that context.
> 3. Threads B - Z block on some 0mq operation.
> 4. Thread A calls zmq_term on the context.
> 5. Threads B - Z all wake up to process ETERM; this signals them to
> call zmq_close() on their socket.
> 6. zmq_term() unblocks after all sockets are closed.
> 7. Application terminates.

Let me try to explain the problem once more:

1. To be able to send all the pending messages before the application 
terminates, something has to block the main thread. Otherwise it exits 
and tears the whole thing down along with all the unsent messages.

2. We don't want zmq_close to block as that could slow the actual 
processing down.

3. Thus the only function that can block and wait until all messages 
are sent is zmq_term.

4. When zmq_term is called, some sockets may still be in use by other 
threads. Thus, zmq_term has to "get" all the messages already written 
to those sockets and "stop" them in some way so that no more messages 
can be sent afterwards.

5. The above can be done either by accessing the socket data directly 
from the zmq_term thread or by doing a handshake between the zmq_term 
thread and the thread owning the socket.

6. In the former case, the socket data have to be placed in a critical 
section. Thus, we would have to introduce a lock into the socket object, 
covering the zmq_send and zmq_recv calls -- which are on the critical 
path. The consequence is that performance decreases by an order of 
magnitude.

7. The latter case is what's implemented today. zmq_term sends a 'term' 
command to the socket. The socket flushes all its messages to the 
appropriate I/O thread(s), remembers that it's already terminated -- so 
that it can return ETERM afterwards -- and sends 'term_ack' back to the 
zmq_term thread.

8. The obvious problem is that to do its part of the handshake, the 
thread owning the socket has to pass control to libzmq. The only way to 
do so is by calling some libzmq function. We could conceivably add a 
'zmq_finalise_termination_handshake()' function, but 'zmq_close()' does 
the job just as well.

> Unfortunately, this doesn't work for all cases. I added a short C
> program to my ticket 127
> (https://github.com/zeromq/zeromq2/issues/#issue/127) that
> illustrates an indefinite block. There is no chance for the socket to
> receive ETERM and call zmq_close(). All that has been done is the
> allocation of a context, a socket and a call to zmq_term().

Just close the socket before uninitialising the library. That's a pretty 
reasonable requirement IMO.

> Even in the step-by-step situation given above, some of those threads
> may be blocked on non-0mq operations and may not get around to
> calling any 0mq operations for quite some time. Until they do, the
> zmq_term() will block.
> Lastly, this behavior is problematic for asynchronous systems (e.g.
> reactor pattern). Any blocking behavior blocks *all* other
> operations. :(
> I'm at a loss to suggest a fix.
> Why does zmq_term() need to block now? Is it to support socket
> migration between threads, or to support ZMQ_LINGER?

It is to support ZMQ_LINGER. Additionally, the migration functionality 
means that 0MQ can't automatically determine whether a particular socket 
is owned by the thread that's calling zmq_term (a socket can be migrated 
at any time without libzmq being notified of the fact) and thus can't 
use ownership as a heuristic for doing the 'handshake' in a synchronous 
manner.
