[zeromq-dev] race condition in libzmq on zmq_errno()
Chuck Remes
cremes.devlist at mac.com
Fri Oct 14 22:40:21 CEST 2011
I think a race condition inherent in having one errno per context makes sharing a single context among several threads, each with its own socket, very problematic.
I was testing a new release of my reactor library and noticed that a call to zmq_getsockopt() with ZMQ_RCVMORE sometimes sets errno to 2 (ENOENT, "No such file or directory") even though the call returns 0 (success). Meanwhile, other threads that had created their own sockets were calling zmq_recv() with ZMQ_NOBLOCK; when no more messages are available, that call returns -1 and is supposed to set errno to EAGAIN.
There seems to be a race in which the EAGAIN in errno is sometimes overwritten with 2 by zmq_getsockopt(), so the error check that follows a call to zmq_recv() was reporting spurious errors. For all I know, zmq_getsockopt() isn't really writing a 2 at all, and I'm only seeing that value because some other thread is writing to errno at the same time.
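Whatever the root cause turns out to be, the receive loop described above should only consult errno after the call itself has reported failure (return value -1), and should copy it into a local right away. In C a successful call is not required to clear errno, so a leftover value is not by itself a sign of failure. Below is a minimal sketch of that check-the-return-code-first pattern. It assumes a non-blocking POSIX pipe as a stand-in for zmq_recv() with ZMQ_NOBLOCK so it builds without libzmq; make_nonblocking_pipe() and try_recv() are hypothetical helpers, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

enum recv_status { RECV_OK, RECV_AGAIN, RECV_ERROR };

/* Hypothetical helper: a pipe whose read end is non-blocking, standing in
   for a socket read with ZMQ_NOBLOCK. Returns 0 on success, -1 on error. */
static int make_nonblocking_pipe(int fds[2])
{
    if (pipe(fds) != 0)
        return -1;
    int flags = fcntl(fds[0], F_GETFL, 0);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}

/* Hypothetical non-blocking receive. errno is read only when read() itself
   returns -1, and it is copied into *err before any other call can change it. */
static enum recv_status try_recv(int fd, char *buf, size_t len,
                                 ssize_t *nread, int *err)
{
    assert(nread != NULL && err != NULL);
    ssize_t rc = read(fd, buf, len);
    if (rc >= 0) {
        /* Success: whatever errno holds now is irrelevant and is ignored. */
        *nread = rc;
        *err = 0;
        return RECV_OK;
    }
    *err = errno;            /* capture immediately */
    *nread = 0;
    if (*err == EAGAIN || *err == EWOULDBLOCK)
        return RECV_AGAIN;   /* nothing queued right now; not a real failure */
    return RECV_ERROR;
}
```

Because the status is decided from read()'s return value and errno is copied at once, a stale value such as ENOENT left in errno beforehand does not change the result.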
The only way I could get everything to run cleanly was to make sure each thread had its own context.
Since zmq_errno() really only makes sense on a per-socket basis, shouldn't the library maintain a separate copy of errno for each socket? Doing so would avoid races between threads in which different sockets write and overwrite errno.
I can imagine an API change where zmq_errno() takes a socket argument, e.g.:

    int zmq_errno(void *socket);
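To make the proposal concrete, here is a minimal sketch of what per-socket error state could look like. Everything in it is hypothetical: struct sketch_socket, sketch_recv() and sketch_errno() are illustrative names, not libzmq internals or API, and the "socket" is a toy message counter rather than a real 0MQ socket. sketch_errno() is named differently so it cannot be confused with the real zmq_errno(void).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical socket object for illustration only; NOT libzmq's layout.
   Each socket carries its own error slot. */
struct sketch_socket {
    int last_errno;   /* error from the most recent failed call on this socket */
    int queued;       /* toy model: number of messages waiting to be received */
};

static void *sketch_socket_new(void)
{
    return calloc(1, sizeof(struct sketch_socket));   /* last_errno starts at 0 */
}

static void sketch_socket_close(void *socket)
{
    free(socket);
}

/* Toy stand-in for a peer delivering one message to this socket. */
static void sketch_enqueue(void *socket)
{
    ((struct sketch_socket *)socket)->queued++;
}

/* Toy non-blocking receive: on failure the error is stored on the socket
   itself instead of in a location shared with other sockets. */
static int sketch_recv(void *socket)
{
    struct sketch_socket *s = socket;
    if (s->queued == 0) {
        s->last_errno = EAGAIN;
        return -1;
    }
    s->queued--;
    return 0;
}

/* Shape of the proposed int zmq_errno(void *socket). */
static int sketch_errno(void *socket)
{
    assert(socket != NULL);
    return ((struct sketch_socket *)socket)->last_errno;
}
```

Leaving last_errno untouched on success mirrors how errno itself behaves; clearing it on every successful call would be an equally reasonable choice. Since each thread owns its own socket in the setup described here, a plain int per socket would need no locking.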
Am I crazy, did I find a bug, or what?
cr