[zeromq-dev] zmq_term() blocks in 2.1

Martin Sustrik sustrik at 250bpm.com
Tue Jan 11 21:43:23 CET 2011


Hi David,

> Thanks for the reply.  My difficulty is specifically these lines in
> ctx::terminate():
>
>     if (no_sockets_notify)
>         no_sockets_sync.wait ();

Yes. zmq_term() should be able to return EINTR and allow restart. To be 
done.

> I'm using 0MQ with Matlab and the Matlab interactive loop is basically
> single threaded, so any function call that blocks will freeze the entire
> environment.  For example, with the way 2.1 works now, if I type:
>
>  >> ctx = zmq.init;
>  >> s = zmq.socket(ctx, zmq.PUB);
>  >> zmq.term(ctx);
> [[ matlab prompt is hung ]]
>
> Now obviously I should have done a better job of cleaning up and closed
> the open socket.  However things happen, especially when messing
> around interactively with matlab.  It is not unusual for people to have
> a Matlab shell open for days at a time, so getting hung and having to
> terminate the Matlab process is very surprising and frustrating.

The idea is that the socket returns ETERM once zmq_term() has been called.
The thread should then close the socket, which allows zmq_term() to exit.
If it were not so, zmq_term() would deallocate the socket and the next
attempt to use the socket would crash the application.
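
Roughly, the handshake looks like the toy model below. This is only a
sketch of the idea using plain pthreads: none of the toy_* names or the
TOY_ETERM value are libzmq APIs, and the real implementation differs.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ETERM; the value is arbitrary for this model. */
enum { TOY_ETERM = 10001 };

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  no_sockets;   /* signalled when open_sockets reaches 0 */
    int             open_sockets;
    bool            terminating;
} toy_ctx;

static void toy_ctx_init(toy_ctx *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->no_sockets, NULL);
    c->open_sockets = 0;
    c->terminating = false;
}

static void toy_socket_open(toy_ctx *c)
{
    pthread_mutex_lock(&c->lock);
    c->open_sockets++;
    pthread_mutex_unlock(&c->lock);
}

/* Stand-in for any socket call: fails with TOY_ETERM once term has begun. */
static int toy_socket_recv(toy_ctx *c)
{
    pthread_mutex_lock(&c->lock);
    bool terminating = c->terminating;
    pthread_mutex_unlock(&c->lock);
    if (terminating) {
        errno = TOY_ETERM;
        return -1;
    }
    return 0;
}

/* Closing the last socket wakes up the thread blocked in toy_ctx_term(). */
static void toy_socket_close(toy_ctx *c)
{
    pthread_mutex_lock(&c->lock);
    assert(c->open_sockets > 0);
    if (--c->open_sockets == 0)
        pthread_cond_signal(&c->no_sockets);
    pthread_mutex_unlock(&c->lock);
}

/* Blocks until every socket has been closed by its owner. */
static void toy_ctx_term(toy_ctx *c)
{
    pthread_mutex_lock(&c->lock);
    c->terminating = true;
    while (c->open_sockets > 0)
        pthread_cond_wait(&c->no_sockets, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Worker thread: use the socket until TOY_ETERM shows up, then close it. */
static void *toy_worker(void *arg)
{
    toy_ctx *c = arg;
    while (toy_socket_recv(c) == 0)
        sched_yield();
    if (errno == TOY_ETERM)
        toy_socket_close(c);
    return NULL;
}
```

In this model the context never frees a socket itself. It only raises a
flag and waits, so a thread that still holds a socket always gets an
error back rather than touching freed memory. The price is that a socket
nobody closes (as in the Matlab session above) keeps toy_ctx_term()
waiting forever.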

Another option would be to exit from zmq_term() immediately and let the
shutdown proceed in the background. In that case, it's not obvious how
to block the process from terminating before all the pending messages
are sent.

> The change I made to ctx::terminate() seemed simple.  Instead of calling
> no_sockets_sync.wait() I set errno to EAGAIN and return -1.  (Actually
> it was more complicated than that because I need to keep the logging
> socket around until after I know all the other sockets are closed; and
> zmq_term() needed a minor change too).
>
> It seems like returning EAGAIN instead of calling wait() is no worse
> than just calling wait() inside ctx::terminate().  The context still
> hasn't been destroyed yet and all the blocked sockets have been
> interrupted.  I admit I don't fully understand your previous arguments
> about why terminate needs to block, so this non-blocking approach seems
> like a reasonable option for zmq_term().
>
> Beyond that, an issue worth considering is adding a call to the ZMQ API
> for context level options (e.g. get/setctxopt).   That way, even if you
> never agree that zmq_term() should have the option to behave the way I
> describe, I can fairly easily add the behavior to my ZMQ build without
> breaking the binary API at a function call level. If someone uses an
> official build of ZMQ with my application I'll get an error back that
> the option I've attempted to set on the context ("non-blocking term()")
> isn't valid and I can then issue a helpful warning to the user.

If your problem is only with Ctrl+C, the EINTR+restart solution should work.

We've also discussed allowing for non-blocking zmq_term(), returning 
EAGAIN and allowing for restart.
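
From the caller's side, a restartable non-blocking term could look
something like the sketch below. It is a single-threaded toy model, not
an existing API: the nb_* names are made up for illustration, and it
assumes the caller can still find and close its leftover sockets.

```c
#include <assert.h>
#include <errno.h>

typedef struct {
    int open_sockets;
    int terminating;
} nb_ctx;

/* Returns 0 once all sockets are closed; otherwise -1 with errno == EAGAIN.
   Never blocks, and it is safe to call again later. */
static int nb_ctx_term(nb_ctx *c)
{
    c->terminating = 1;          /* sockets would start failing from here on */
    if (c->open_sockets > 0) {
        errno = EAGAIN;
        return -1;
    }
    return 0;
}

static void nb_socket_close(nb_ctx *c)
{
    assert(c->open_sockets > 0);
    c->open_sockets--;
}

/* Caller-side restart loop: on EAGAIN, close one leftover socket and retry,
   giving up after max_tries attempts. */
static int nb_term_with_cleanup(nb_ctx *c, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (nb_ctx_term(c) == 0)
            return 0;
        if (errno != EAGAIN)
            return -1;
        nb_socket_close(c);      /* e.g. the forgotten PUB socket */
    }
    errno = EAGAIN;
    return -1;
}
```

The point of the restart is that an interactive caller gets its prompt
back after the first EAGAIN and can decide what to do next, while the
context itself is still intact until a later call finally returns 0.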

It needs some implementation work though :)

Martin


