[zeromq-dev] zmq_abort called from clean_pipes - is this a bug?

Stephen Lord Steve.Lord at quantum.com
Mon Oct 29 19:07:34 CET 2012



I managed to make the zeromq library (3.2.1-rc2) crash because of a programming
error in my own code, outside the library.

I know someone is going to ask for a small test case, which may take me a little
while to come up with, so for now here is a problem description.

The library crashed like this:

#0  0x000000304b6328a5 in raise () from /lib64/libc.so.6
#1  0x000000304b634085 in abort () from /lib64/libc.so.6
#2  0x00007f19ce0e8dea in zmq::zmq_abort (
    errmsg_=0x7f19ce114f4c "!incomplete_in") at err.cpp:76
#3  0x00007f19ce0fc802 in zmq::session_base_t::clean_pipes (
    this=0x7f1958000b30) at session_base.cpp:227
#4  0x00007f19ce0fcf25 in zmq::session_base_t::detach (this=0x7f1958000b30)
    at session_base.cpp:335
#5  0x00007f19ce1049b7 in zmq::stream_engine_t::error (this=0x7f19580008e0)
    at stream_engine.cpp:451
#6  0x00007f19ce1039a9 in zmq::stream_engine_t::in_event (this=0x7f19580008e0)
    at stream_engine.cpp:232
#7  0x00007f19ce0e7f19 in zmq::epoll_t::loop (this=0x7f1980003430)
    at epoll.cpp:153
#8  0x00007f19ce0e809c in zmq::epoll_t::worker_routine (arg_=0x7f1980003430)
    at epoll.cpp:174


The basic architecture is a set of PUSH sockets connected via inproc to a single
PULL socket. The multipart messages arriving on the PULL socket are forwarded to
a PUB socket connected via ipc. I was incorrectly passing ZMQ_SNDMORE on a
message segment on the PUSH socket. As a result, a process reading from the SUB
socket interpreted the subscription key of a new message as a further segment of
the previous message, and crashed.
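
To make the failure mode concrete, here is a minimal sketch in plain Python
that models multipart framing without using the zeromq library at all. The
Frame type, the group_frames helper and the sample payloads are hypothetical
names invented for illustration. The only thing taken from the description
above is that a stray more-flag (ZMQ_SNDMORE in the C API) on a segment glues
the next message's subscription key onto the previous message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Hypothetical stand-in for one message segment on the wire."""
    data: bytes
    more: bool  # models the more-flag: True means another segment follows


def group_frames(frames: List[Frame]) -> List[List[bytes]]:
    """Group frames into messages as a receiver would: a message ends at
    the first frame whose more-flag is False. Trailing frames that never
    reach such a frame are dropped in this model."""
    messages, current = [], []
    for frame in frames:
        current.append(frame.data)
        if not frame.more:
            messages.append(current)
            current = []
    return messages


# Intended stream: two independent two-segment messages, key then body.
correct = [
    Frame(b"key.a", more=True), Frame(b"body-1", more=False),
    Frame(b"key.b", more=True), Frame(b"body-2", more=False),
]

# Buggy stream: the body of the first message wrongly carries the more-flag.
buggy = [
    Frame(b"key.a", more=True), Frame(b"body-1", more=True),
    Frame(b"key.b", more=True), Frame(b"body-2", more=False),
]

correct_messages = group_frames(correct)
buggy_messages = group_frames(buggy)
print(correct_messages)
print(buggy_messages)
```

In the buggy stream the receiver sees one four-segment message, with key.b
sitting where it expects payload, which is the misinterpretation described
above.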

My use case was simple to fix once I found this. However, while cleaning up the
socket after the SUB process crashed, the sending process also died, because the
clean_pipes code could not cope with the incomplete_in variable being true.
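
For readers who have not looked at that code, the following toy model in plain
Python illustrates one way such an assertion can fire. It is not the library's
implementation. Apart from the names incomplete_in and clean_pipes, which come
from the backtrace, everything here (the ToySession class, its pull method, the
deque used as a pipe) is hypothetical, and it assumes the flag simply records
whether the last frame read carried the more-flag.

```python
from collections import deque


class ToySession:
    """Hypothetical model of a session reading frames from an inbound pipe."""

    def __init__(self, frames):
        # Each frame is a (payload, more) tuple; 'more' models the more-flag.
        self.pipe = deque(frames)
        self.incomplete_in = False  # True while a multipart message is half-read

    def pull(self):
        """Read one frame; return None if the pipe is empty."""
        if not self.pipe:
            return None
        payload, more = self.pipe.popleft()
        self.incomplete_in = more
        return payload

    def clean_pipes(self):
        """Discard the rest of a half-read message during teardown.

        Assumption: the model treats 'pipe empty while a message is still
        half-read' as an invariant violation, which is what an assertion
        on !incomplete_in would express.
        """
        while self.incomplete_in:
            if self.pull() is None:
                raise AssertionError("!incomplete_in")


# Well-formed input: the message is closed by a frame without the more-flag.
ok = ToySession([(b"key", True), (b"body", False)])
ok.pull()          # reads b"key", leaving the message half-read
ok.clean_pipes()   # drains b"body" and returns normally

# Malformed input: the last frame promises more data that never arrives.
bad = ToySession([(b"key", True), (b"body", True)])
bad.pull()
try:
    bad.clean_pipes()
    teardown_failed = False
except AssertionError:
    teardown_failed = True
```

In the backtrace above the corresponding condition ends in abort(), taking the
whole process down, whereas the model merely raises an exception.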

The second process to crash was the one with the main coding bug. My question,
however, is this: should it be possible to crash the library like this? For the
SUB socket to have received any content, it must have been sent a whole message
(even if that was intended to be two independent messages). The fact that the
disconnect, which could have happened for legitimate reasons, knocked over the
library is concerning.

Steve


