[zeromq-dev] PUB/SUB Socket Deletion & Recreation Race Condition

Tom Cocagne tom.cocagne at gmail.com
Tue Jun 5 23:37:46 CEST 2012


While unit testing some code built on PUB/SUB sockets
(pyzmq 2.1.10 & zmq 2.1.11), I ran into an unfortunate race condition.
The message exchanges usually work fine, but occasionally a test
fails because some or all of the nodes completely fail to send
messages.

The unit tests use 3 nodes, each consisting of a PUB/SUB socket pair.
Each SUB socket is connected to all PUB sockets (including the node's
own PUB socket). The individual unit tests involve various message
exchanges between the nodes. In typical unit test fashion (Python
unit testing anyway), the nodes are created immediately before each
test begins and are torn down immediately after each test completes.
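
For reference, the topology can be sketched roughly as follows. This
is a minimal reconstruction, not the actual test code: the Node class,
the make_nodes helper, and the endpoint file names are hypothetical
and exist only for illustration. It assumes pyzmq is installed and
that 'ipc://' is available (Linux).

```python
import os
import tempfile

import zmq


class Node:
    """One test node: a PUB socket plus a SUB socket (hypothetical helper)."""

    def __init__(self, ctx, own_endpoint):
        self.pub = ctx.socket(zmq.PUB)
        self.pub.bind(own_endpoint)
        self.sub = ctx.socket(zmq.SUB)
        self.sub.setsockopt(zmq.SUBSCRIBE, b"")  # subscribe to every message

    def connect(self, endpoints):
        # Connect the SUB socket to every PUB endpoint, including our own.
        for endpoint in endpoints:
            self.sub.connect(endpoint)

    def close(self):
        for sock in (self.pub, self.sub):
            sock.setsockopt(zmq.LINGER, 0)  # do not wait for unsent messages
            sock.close()


def make_nodes(ctx, directory, count=3):
    """Create `count` nodes whose SUB sockets are connected to all PUB sockets."""
    endpoints = [
        "ipc://" + os.path.join(directory, "node%d.ipc" % i)
        for i in range(count)
    ]
    nodes = [Node(ctx, endpoint) for endpoint in endpoints]  # bind all PUBs
    for node in nodes:
        node.connect(endpoints)
    return nodes
```

Calling make_nodes(ctx, directory) and then close() on each node, over
and over with no pause in between and with the same directory,
approximates what the per-test setup and teardown do.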

It seems that, when using 'ipc:///' on Linux at least, closing the zmq
sockets and then immediately recreating and reconnecting them
occasionally renders the new sockets unusable. No messages are
received from the PUB sockets and the connection never recovers.
Inserting a delay of as little as 50ms between tests completely
prevents the problem, though.
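
The delay workaround amounts to something like the sketch below. The
NodeTestCase base class and its create_nodes / destroy_nodes hooks are
hypothetical placeholders (no-ops here, so that the example stands on
its own); only the 50ms figure comes from the observation above.

```python
import time
import unittest

SETTLE_DELAY = 0.05  # seconds; the 50ms delay that hid the problem


class NodeTestCase(unittest.TestCase):
    """Base class that pauses after each teardown (hypothetical sketch)."""

    def create_nodes(self):
        # Placeholder: a real test would build the PUB/SUB nodes here.
        return []

    def destroy_nodes(self, nodes):
        # Placeholder: a real test would close every socket here.
        pass

    def setUp(self):
        self.nodes = self.create_nodes()

    def tearDown(self):
        self.destroy_nodes(self.nodes)
        # Give the background I/O time to finish closing the old sockets
        # before the next test recreates them.
        time.sleep(SETTLE_DELAY)
```

This is exactly the kind of fixed delay I would rather avoid: it hides
the symptom without showing that the old sockets have actually gone
away.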

Is this a bug or expected behavior? Due to the multi-threaded nature
of the zmq backend, I can see the response to this issue legitimately
being "don't do that", but I'm not a fan of littering my code with
magic "this ought-ta be long enough" delays.

Tom


