[zeromq-dev] Monitor sockets crossing threads?

David Turner dct25-561bs at mythic-beasts.com
Thu May 28 17:56:27 CEST 2015


Hi,

I'm quite new to ZeroMQ and I seem to be doing something daft but I
can't work out what. I've got a Haskell program that uses the
zeromq4-haskell library (which binds to libzmq) and it's behaving
nondeterministically - sometimes working, sometimes hanging and
sometimes dumping core with an assertion failure within libzmq. I
strongly suspect a threading problem!

I've opened an issue against the haskell library as I suspected it was
that (or at least, my misusing that)
[https://github.com/twittner/zeromq-haskell/issues/67]. That issue
includes an example program and some execution traces. Haskell's
threads don't always map directly onto OS threads, but I believe I've
done the necessary incantations to make them do so in this case.

However, I've been doing more digging and come across something
unexpected that seems to be to do with how monitor sockets work.
Ignoring the Haskell stuff, I'm calling zmq_socket to create a socket,
then zmq_socket_monitor to set up monitoring, then finally calling
zmq_connect. They all happen on the same thread, but the connect
doesn't seem to go through immediately (the monitor receives
ZMQ_EVENT_CONNECT_DELAYED before ZMQ_EVENT_CONNECTED). However, and
here's the unexpected bit, the calls to zmq_sendmsg sending those
monitor events happen on a different thread from the one on which
zmq_socket and zmq_bind were called within zmq_socket_monitor. They
seem to be called on a thread internal to libzmq (it's a thread that
was started by zmq::thread_t::start), even though the calls to
zmq_socket/zmq_socket_monitor/zmq_connect all happened on a thread
owned by Haskell's runtime.
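
For anyone reading the traces: as far as I understand the zmq_socket_monitor manual for libzmq 4.x, each monitor event is a two-frame message, where the first frame holds a 16-bit event number followed by a 32-bit event value in host byte order, and the second frame holds the endpoint address. Below is a minimal sketch in C of decoding that first frame. The names decode_monitor_event, monitor_event_name and monitor_event_t are hypothetical helpers of mine, not part of libzmq, and the two constants are copied from zmq.h so that the snippet builds without the library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same values as ZMQ_EVENT_CONNECTED and ZMQ_EVENT_CONNECT_DELAYED in zmq.h,
   repeated here so this sketch compiles without libzmq. */
enum { EV_CONNECTED = 0x0001, EV_CONNECT_DELAYED = 0x0002 };

/* Hypothetical holder for the decoded first frame of a monitor message. */
typedef struct {
    uint16_t event;  /* event number, e.g. EV_CONNECT_DELAYED */
    uint32_t value;  /* event value; its meaning depends on the event */
} monitor_event_t;

/* Decode the first frame of a monitor message (assumed layout: 16-bit
   event number, then 32-bit value, host byte order).
   Returns 0 on success, -1 if the arguments or the frame size are wrong. */
int decode_monitor_event(const void *data, size_t size, monitor_event_t *out)
{
    const size_t needed = sizeof(uint16_t) + sizeof(uint32_t);
    if (data == NULL || out == NULL || size < needed)
        return -1;

    const uint8_t *bytes = data;
    /* memcpy avoids unaligned reads from the frame buffer. */
    memcpy(&out->event, bytes, sizeof out->event);
    memcpy(&out->value, bytes + sizeof out->event, sizeof out->value);
    return 0;
}

/* Name the two events discussed in this mail; anything else is "other". */
const char *monitor_event_name(uint16_t event)
{
    switch (event) {
    case EV_CONNECTED:       return "ZMQ_EVENT_CONNECTED";
    case EV_CONNECT_DELAYED: return "ZMQ_EVENT_CONNECT_DELAYED";
    default:                 return "other";
    }
}
```

In a real program the bytes would come from zmq_msg_data and zmq_msg_size on the first frame read from the receiving end of the monitor pair. The decoding itself says nothing about which thread sent the event.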

There are a few other bits in the trace where there are calls to
zmq_sendmsg to send data to a monitor from more than one thread. I've
even got one trace where there's a call to zmq_close concurrently with
another to zmq_sendmsg on the same monitor socket. This seems bad.

Is this kind of cross-thread access OK despite the manual's warning
that sockets must not be shared between threads, perhaps because the
sending end of a monitor pair is special?
I imagine so, as otherwise I wouldn't be the first to notice this. Any
idea what else the program might be doing that makes it
nondeterministic? If I
remove the monitoring code, it seems to work reliably, although
'seeming to work' is not very satisfactory for concurrent programs!

Can anyone help?

Many thanks,

David


