[zeromq-dev] zmq_term() blocks in 2.1
Martin Sustrik
sustrik at 250bpm.com
Tue Jan 11 21:43:23 CET 2011
Hi David,
> Thanks for the reply. My difficulty is specifically these lines in
> ctx::terminate():
>     if (no_sockets_notify)
>         no_sockets_sync.wait ();
Yes. zmq_term() should be able to return EINTR and allow restart. To be
done.
> I'm using 0MQ with Matlab and the Matlab interactive loop is basically
> single threaded, so any function call that blocks will freeze the entire
> environment. For example, with the way 2.1 works now, if I type:
>
> >> ctx = zmq.init;
> >> s = zmq.socket(ctx, zmq.PUB);
> >> zmq.term(ctx);
> [[ matlab prompt is hung ]]
>
> Now obviously I should have done a better job of cleaning up and closed
> the open socket. However things happen, especially when messing
> around interactively with matlab. It is not unusual for people to have
> a Matlab shell open for days at a time, so getting hung and having to
> terminate the Matlab process is very surprising and frustrating.
The idea is that the socket returns ETERM once zmq_term() has been called.
The thread that owns the socket should then close it. That allows
zmq_term() to exit. If it were not so, zmq_term() would deallocate the
socket and the next attempt to use the socket would crash the application.
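The handshake described above can be sketched with a small single-threaded
toy model. This is only an illustration under stated assumptions: ToyContext,
ToySocket, TOY_ETERM and worker_loop are hypothetical names, not part of the
0MQ API, and the real zmq_term() blocks where this model merely reports
whether termination could complete.

```cpp
// Toy model only: these types are hypothetical and are not libzmq's API.
#include <cassert>
#include <cerrno>

const int TOY_ETERM = 10001;  // stand-in for 0MQ's ETERM error code

struct ToyContext {
    bool terminating = false;  // set once termination has been requested
    int open_sockets = 0;      // sockets not yet closed by their owners

    // Termination can only complete once every socket has been closed.
    bool can_finish_term() const { return terminating && open_sockets == 0; }
};

struct ToySocket {
    ToyContext &ctx;
    bool open = true;

    explicit ToySocket(ToyContext &c) : ctx(c) { ++ctx.open_sockets; }

    // Fails with TOY_ETERM once termination has been requested.
    int recv() {
        if (ctx.terminating) { errno = TOY_ETERM; return -1; }
        return 0;  // pretend a message arrived
    }

    void close() {
        if (open) { open = false; --ctx.open_sockets; }
    }
};

// What the owning thread is expected to do: stop and close on ETERM.
// Returns the number of messages handled before the loop ended.
int worker_loop(ToySocket &s, int max_msgs) {
    int handled = 0;
    while (handled < max_msgs) {
        if (s.recv() == -1) {
            if (errno == TOY_ETERM) s.close();  // releases the context
            break;
        }
        ++handled;
    }
    return handled;
}
```

In this model the context never touches a socket that its owner may still
use. It only becomes eligible for destruction after worker_loop() has seen
the error and closed the socket itself.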
Another option would be to exit from zmq_term() immediately and let the
shutdown proceed in the background. In that case, it's not obvious how
to keep the process from terminating before all the pending messages
are sent.
> The change I made to ctx::terminate() seemed simple. Instead of calling
> no_sockets_sync.wait() I set errno to EAGAIN and return -1. (Actually
> it was more complicated than that because I need to keep the logging
> socket around until after I know all the other sockets are closed; and
> zmq_term() needed a minor change too).
>
> It seems like returning EAGAIN instead of calling wait() is no worse
> than just calling wait() inside ctx::terminate(). The context still
> hasn't been destroyed yet and all the blocked sockets have been
> interrupted. I admit I don't fully understand your previous arguments
> about why terminate needs to block, so this non-blocking approach seems
> like a reasonable option for zmq_term().
>
> Beyond that, an issue worth considering is adding a call to the ZMQ API
> for context level options (ex. get/setctxopt). That way, even if you
> never agree that zmq_term() should have the option to behave the way I
> describe, I can fairly easily add the behavior to my ZMQ build without
> breaking the binary API at a function call level. If someone uses an
> official build of ZMQ with my application I'll get an error back that
> the option I've attempted to set on the context ("non-blocking term()")
> isn't valid and I can then issue a helpful warning to the user.
If your problem is only with Ctrl+C, the EINTR+restart solution should work.
We've also discussed allowing for non-blocking zmq_term(), returning
EAGAIN and allowing for restart.
It needs some implementation work though :)
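As a rough sketch of what a non-blocking, restartable terminate could look
like from the caller's side, here is a toy model. Everything in it is an
assumption made for illustration: ToyContext, try_term() and
term_with_restart() are hypothetical names, and nothing here reflects how
0MQ implements or will implement this.

```cpp
// Toy model only: hypothetical names, not libzmq's API or implementation.
#include <cassert>
#include <cerrno>

struct ToyContext {
    int open_sockets = 0;
    bool terminating = false;
    bool destroyed = false;

    void open_socket() { ++open_sockets; }
    void close_socket() { if (open_sockets > 0) --open_sockets; }

    // Non-blocking terminate: never waits. While sockets remain open it
    // reports EAGAIN so that the caller can retry later.
    int try_term() {
        terminating = true;  // blocked sockets would be interrupted here
        if (open_sockets > 0) {
            errno = EAGAIN;
            return -1;
        }
        destroyed = true;
        return 0;
    }
};

// Caller-side restart: retry until termination succeeds, closing one
// leftover socket between attempts. Returns the number of attempts made.
int term_with_restart(ToyContext &ctx) {
    int attempts = 1;
    while (ctx.try_term() == -1 && errno == EAGAIN) {
        ctx.close_socket();  // stand-in for the application cleaning up
        ++attempts;
    }
    return attempts;
}
```

With two sockets left open, term_with_restart() makes three attempts: two
that fail with EAGAIN and a third that succeeds once both sockets are
closed. The context is not destroyed until that last call.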
Martin