[zeromq-dev] behavior of zmq_term in 2.1.0

Pieter Hintjens ph at imatix.com
Thu Dec 2 18:02:21 CET 2010


Martin,

If sockets are not in a critical section, can zmq_term not perform
single-bit changes safely anyhow?

-Pieter
On 2 Dec 2010 10:55, "Martin Sustrik" <sustrik at 250bpm.com> wrote:
> Hi Chuck,
>
>> After reading through Martin's responses, I think I see his dilemma.
>> It appears the dilemma is that while zmq_term() might atomically set
>> a flag indicating the socket is closed and begin deallocating its
>> resources, a socket may be in the midst of doing some work that is
>> already beyond that atomic "gatekeeper", and so it blows up (two
>> threads touching the same structs).
>>
>> He gives a short set of steps for properly terminating a program like
>> so:
>>
>> 1. Thread A creates a context.
>> 2. Threads B - Z each create a socket using that context.
>> 3. Threads B - Z block on some 0mq operation.
>> 4. Thread A calls zmq_term on the context.
>> 5. Threads B - Z all wake up to process ETERM; this signals them to
>>    call zmq_close() on their socket.
>> 6. zmq_term() unblocks after all sockets are closed.
>> 7. Application terminates.
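The seven steps above can be modelled in a self-contained C toy using POSIX threads. Everything here (toy_ctx, toy_socket, toy_recv, toy_close, toy_term, run_shutdown, TOY_ETERM) is hypothetical and is not the libzmq API; a mutex and condition variable stand in for whatever signalling libzmq actually uses, purely for brevity. The point it illustrates is that step 6 can only complete once every worker has reached step 5.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the seven-step shutdown; NOT the libzmq API. */
#define TOY_ETERM 1     /* stands in for 0MQ's ETERM error code */
#define MAX_WORKERS 25  /* room for threads B - Z */

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool terminating;   /* set by toy_term (step 4) */
    int open_sockets;   /* sockets not yet closed */
} toy_ctx;

/* Step 1: thread A creates a context. */
static void toy_ctx_init(toy_ctx *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->terminating = false;
    c->open_sockets = 0;
}

/* Step 2: register one socket with the context. */
static void toy_socket(toy_ctx *c)
{
    pthread_mutex_lock(&c->lock);
    c->open_sockets++;
    pthread_mutex_unlock(&c->lock);
}

/* Step 3: block until termination starts, then report TOY_ETERM. */
static int toy_recv(toy_ctx *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->terminating)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
    return TOY_ETERM;
}

/* Step 5: closing a socket deregisters it and wakes toy_term. */
static void toy_close(toy_ctx *c)
{
    pthread_mutex_lock(&c->lock);
    assert(c->open_sockets > 0);
    c->open_sockets--;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Steps 4 and 6: wake all workers, then block until every socket is closed. */
static void toy_term(toy_ctx *c)
{
    pthread_mutex_lock(&c->lock);
    c->terminating = true;
    pthread_cond_broadcast(&c->cond);
    while (c->open_sockets > 0)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Threads B - Z: block, receive TOY_ETERM, close their socket. */
static void *worker(void *arg)
{
    toy_ctx *c = arg;
    if (toy_recv(c) == TOY_ETERM)
        toy_close(c);
    return NULL;
}

/* Runs steps 1-7 with n workers; returns the number of sockets left open. */
static int run_shutdown(int n)
{
    toy_ctx c;
    pthread_t threads[MAX_WORKERS];
    int started = 0;

    if (n < 0 || n > MAX_WORKERS)
        return -1;
    toy_ctx_init(&c);
    /* Simplification: sockets are registered before the workers start,
       so toy_term can never run before a socket exists. */
    for (int i = 0; i < n; i++)
        toy_socket(&c);
    for (int i = 0; i < n; i++) {
        if (pthread_create(&threads[started], NULL, worker, &c) == 0)
            started++;
        else
            toy_close(&c);  /* no worker for this socket: close it here */
    }
    toy_term(&c);           /* returns only after every toy_close */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    int left = c.open_sockets;
    pthread_cond_destroy(&c.cond);
    pthread_mutex_destroy(&c.lock);
    return left;            /* step 7: the application may now exit */
}
```

In this model, run_shutdown(4) returns 0 once all four workers have closed their sockets. If a worker never called toy_close, toy_term would wait forever, which is the indefinite block being discussed in this thread.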
>
> Let me try to explain the problem once more:
>
> 1. To be able to send all the messages before the application terminates,
> something has to block the main thread. Otherwise it exits and tears the
> whole thing down along with all the unsent messages.
>
> 2. We don't want zmq_close to block as that could slow the actual
> processing down.
>
> 3. Thus the only function to block and wait till all messages are sent
> is zmq_term.
>
> 4. When zmq_term is called, there still may be some sockets being used
> from other threads. Thus, zmq_term has to "get" all the messages already
> written to those sockets and "stop" them in some way so that no more
> messages can be sent afterwards.
>
> 5. The above can be done either by accessing socket data directly from
> the zmq_term thread or by doing a handshake between the zmq_term thread
> and the thread owning the socket.
>
> 6. In the former case, the socket data have to be placed in a critical
> section. Thus, we have to introduce a lock to the socket object, covering
> the zmq_send and zmq_recv calls, which are on the critical path. The
> consequence is that performance decreases by an order of magnitude.
>
> 7. The latter case is what's implemented today. zmq_term sends a 'term'
> command to the socket. The socket flushes all the messages to the
> appropriate I/O thread(s), remembers that it's already terminated (so
> that it can return ETERM afterwards) and sends 'term_ack' back to the
> zmq_term thread.
>
> 8. The obvious problem is that to do its part of the handshake, the
> thread owning the socket has to pass control to libzmq. The only way to
> do so is by calling some libzmq function. We could possibly add a
> 'zmq_finalise_termination_handshake()' function; however, 'zmq_close()'
> does fine as well.
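Points 7 and 8 describe a message-passing handshake rather than shared access to socket data. The single-threaded C toy below shows the key property: the 'term' command waits in the socket's mailbox, and 'term_ack' is only produced once the owning thread calls into the library (here toy_send or toy_close). All names (post_term, process_commands, toy_send, toy_close, the command enum) are hypothetical; the real library passes these commands between threads, which this sketch collapses into plain struct fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the 'term' / 'term_ack' handshake from points 7
   and 8; these names are illustrative, NOT libzmq internals. */
typedef enum { CMD_NONE, CMD_TERM, CMD_TERM_ACK } command;

typedef struct {
    command inbox;    /* mailbox the zmq_term thread writes into */
    bool terminated;  /* remembered so later calls fail with ETERM */
    bool closed;
    int pending;      /* messages written but not yet flushed */
    int flushed;      /* messages handed to the I/O thread(s) */
} toy_socket;

typedef struct {
    command inbox;    /* mailbox where the socket posts 'term_ack' */
} toy_term_thread;

/* zmq_term side: post 'term' and return; it never touches socket data. */
static void post_term(toy_socket *s)
{
    s->inbox = CMD_TERM;
}

/* Runs only when the owning thread calls into the library (point 8). */
static void process_commands(toy_socket *s, toy_term_thread *t)
{
    if (s->inbox != CMD_TERM)
        return;
    s->inbox = CMD_NONE;
    s->flushed += s->pending;   /* flush to the I/O thread(s) */
    s->pending = 0;
    s->terminated = true;       /* later calls report ETERM */
    t->inbox = CMD_TERM_ACK;    /* zmq_term may stop waiting for this socket */
}

/* Toy send: 0 on success, -1 (standing in for ETERM) after 'term'. */
static int toy_send(toy_socket *s, toy_term_thread *t)
{
    assert(!s->closed);
    process_commands(s, t);
    if (s->terminated)
        return -1;
    s->pending++;
    return 0;
}

/* Toy close: also drains the mailbox, so it completes the handshake. */
static void toy_close(toy_socket *s, toy_term_thread *t)
{
    process_commands(s, t);
    s->closed = true;
}
```

After post_term, the term side sees no acknowledgement until the owner's next toy_send or toy_close; that call flushes the pending messages and posts CMD_TERM_ACK. This is why a socket whose owner never calls into the library keeps zmq_term waiting.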
>
>> Unfortunately, this doesn't work for all cases. I added a short C
>> program to my ticket 127
>> (https://github.com/zeromq/zeromq2/issues/#issue/127) that
>> illustrates an indefinite block. There is no chance for the socket to
>> receive ETERM and call zmq_close(). All that has been done is the
>> allocation of a context, a socket and a call to zmq_term().
>
> Just close the socket before uninitialising the library. That's a pretty
> reasonable requirement IMO.
>
>> Even in the step-by-step situation given above, some of those threads
>> may be blocked on non-0mq operations and may not get around to
>> calling any 0mq operations for quite some time. Until they do, the
>> zmq_term() will block.
>>
>> Lastly, this behavior is problematic for asynchronous systems (e.g.
>> reactor pattern). Any blocking behavior blocks *all* other
>> operations. :(
>>
>> I'm at a loss to suggest a fix.
>>
>> Why does zmq_term() need to block now? Is it to support socket
>> migration between threads, or to support ZMQ_LINGER?
>
> It is to support LINGER. Additionally, the migration functionality means
> that 0MQ can't automatically detect that a particular socket is owned by
> the thread that's calling zmq_term (a socket can be migrated at any time
> without libzmq being notified) and thus cannot use that as a heuristic
> for doing the 'handshake' synchronously.
>
> Martin
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>