[zeromq-dev] Assertion failure in mailbox.cpp

Jon Gjengset jon at thesquareplanet.com
Mon May 19 15:22:28 CEST 2014


Hi all!

I'm fairly new to ØMQ, but have read most of The Guide and think I
understand how my particular problem should be built on top of ØMQ.
I'm building a multithreaded application with staged threads, similar to
the "Signaling Between Threads (PAIR Sockets)" chapter in The Guide.

My application flow is something like this:

  - Main thread creates context, stores in global variable
  - Main thread spawns one thread for each "stage"
  - Each stage is defined in its own thread, with a static void * for
    its "incoming" socket, and another for its "outgoing" socket.
  - Each thread first *connects* to its incoming socket, then *binds* to
    its outgoing socket
  - Each thread then enters a while(1) loop calling zmq_recv(incoming,
    ...), doing some work on the received data, and then calling
    zmq_send(outgoing, ...).

When the application terminates, the main thread sets a boolean to
false, causing the first stage to terminate, send a TERMINATE message on
its outgoing queue and then close its outgoing socket. Upon receiving a
TERMINATE, each stage will break from its infinite loop, close its
incoming socket, send a TERMINATE to its outgoing socket, and finally
close its outgoing socket.

As far as I can tell, none of this violates The Rule when it comes to
multithreading, as each socket is only ever touched by the thread that
created it. Nevertheless, my application sometimes (although not
always) crashes with an assertion failure at mailbox.cpp:82:

	#0  0x00007ffff6727d67 in raise () from /usr/lib/libc.so.6
	#1  0x00007ffff6729118 in abort () from /usr/lib/libc.so.6
	#2  0x00007ffff7913409 in zmq::zmq_abort (errmsg_=errmsg_@entry=0x7ffff79741ce "ok") at err.cpp:74
	#3  0x00007ffff7919c52 in zmq::mailbox_t::recv (this=this@entry=0x7fffe4007768, cmd_=cmd_@entry=0x7ffff0c10790, timeout_=timeout_@entry=0) at mailbox.cpp:82
	#4  0x00007ffff7933c79 in zmq::socket_base_t::process_commands (this=this@entry=0x7fffe40073b0, timeout_=timeout_@entry=0, throttle_=throttle_@entry=false) at socket_base.cpp:982
	#5  0x00007ffff7937b5d in zmq::socket_base_t::in_event (this=0x7fffe40073b0) at socket_base.cpp:1098
	#6  0x00007ffff791305e in zmq::epoll_t::loop (this=0x7fffec002b70) at epoll.cpp:165
	#7  0x00007ffff79436da in thread_routine (arg_=0x7fffec002be0) at thread.cpp:81
	#8  0x00007ffff6aa9124 in start_thread () from /usr/lib/libpthread.so.0
	#9  0x00007ffff67dd4bd in clone () from /usr/lib/libc.so.6

and sometimes at signaler.cpp:232:

	#0  0x00007ffff6727d67 in raise () from /usr/lib/libc.so.6
	#1  0x00007ffff6729118 in abort () from /usr/lib/libc.so.6
	#2  0x00007ffff7913409 in zmq::zmq_abort (errmsg_=errmsg_@entry=0x7ffff7976231 "dummy == 1") at err.cpp:74
	#3  0x00007ffff7932dfd in zmq::signaler_t::recv (this=this@entry=0x7fffe0000cd8) at signaler.cpp:232
	#4  0x00007ffff7919bbc in zmq::mailbox_t::recv (this=this@entry=0x7fffe0000c78, cmd_=cmd_@entry=0x7fffeb7fd2f0, timeout_=timeout_@entry=0) at mailbox.cpp:68
	#5  0x00007ffff7933c79 in zmq::socket_base_t::process_commands (this=this@entry=0x7fffe00008c0, timeout_=timeout_@entry=0, throttle_=throttle_@entry=false) at socket_base.cpp:982
	#6  0x00007ffff793431c in zmq::socket_base_t::recv (this=this@entry=0x7fffe00008c0, msg_=msg_@entry=0x7fffeb7fd390, flags_=flags_@entry=0) at socket_base.cpp:842
	#7  0x00007ffff794bd49 in s_recvmsg (s_=s_@entry=0x7fffe00008c0, msg_=msg_@entry=0x7fffeb7fd390, flags_=flags_@entry=0) at zmq.cpp:446
	#8  0x00007ffff794bdf1 in zmq_recv (s_=0x7fffe00008c0, buf_=0x447a60 <_ZL5group>, len_=8, flags_=0) at zmq.cpp:470
	#9  0x000000000042997d in stage4_main (arg=<optimized out>) at stage4.cpp:60
	#10 0x00007ffff6aa9124 in start_thread () from /usr/lib/libpthread.so.0
	#11 0x00007ffff67dd4bd in clone () from /usr/lib/libc.so.6

I have created a minimal test case with the same application flow here:
http://pastebin.com/mNFNHdyh, but I cannot make it fail in the same way.

The problem happens with both 4.0.4 and master on the development branch.
I've also tried replacing ZMQ_PAIR with ZMQ_PULL/ZMQ_PUSH, but to no avail.
I'm running Arch Linux x64 and compiling using clang++ v3.4.1.
For what it's worth, Valgrind reports no memory errors in the application.

I realize it might be hard to determine what the problem is without
seeing the full code, but I am hoping I might at least get some pointers
about what kind of situation might cause the two assertions above
in particular to fail.

Cheers,
Jon


