[zeromq-dev] libzmq crash closing socket with pending messages

AJ Lewis aj.lewis at quantum.com
Thu Nov 14 01:55:25 CET 2013


Check out https://zeromq.jira.com/browse/LIBZMQ-576 for more info.  It looks like an
earlier fix, which tried to ensure that messages still in the encoder were sent before
the socket closed, is causing the problem.  Reverting that fix (commit f27eb67e in
libzmq) seems to clear this up, but we probably still want something that addresses what
that commit was attempting to fix (see https://zeromq.jira.com/browse/LIBZMQ-497 for
details).
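
For anyone trying to picture the failure, here is a toy C++ model of the lifetime
problem Andy describes below.  It is not libzmq code: msg_source_t, toy_session_t,
toy_engine_t, detach() and flush_after_close() are names made up for illustration, and
clearing the source pointer when the session goes away is only an assumption about one
direction a fix could take, not what f27eb67e or any other commit actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Toy model only: these types imitate the roles described in the report
// and are NOT the real libzmq classes.

// Role of the encoder's msg_source: something messages are pulled from.
struct msg_source_t {
    virtual ~msg_source_t() = default;
    virtual bool pull_msg(std::string &msg_) = 0;
};

// Role of session_base_t: owns the messages that are not yet encoded.
struct toy_session_t : msg_source_t {
    int queued = 1;
    bool pull_msg(std::string &msg_) override {
        if (queued == 0)
            return false;
        --queued;
        msg_ = "next";
        return true;
    }
};

// Role of stream_engine_t plus its encoder: may outlive the session
// because it still holds bytes that have to go out on the wire.
class toy_engine_t {
public:
    explicit toy_engine_t(msg_source_t *source_) : source(source_) {}

    void buffer(std::string bytes_) { pending = std::move(bytes_); }

    // Called when the session goes away. Assumption: dropping the
    // pointer here is one way to make the deferred flush safe.
    void detach() { source = nullptr; }

    // Role of out_event(): send what is buffered, then try to refill.
    std::size_t out_event() {
        const std::size_t sent = pending.size();
        pending.clear();
        std::string next;
        if (source != nullptr && source->pull_msg(next))
            pending = std::move(next);
        return sent;
    }

    bool idle() const { return pending.empty(); }

private:
    msg_source_t *source;  // non-owning; dangles if never detached
    std::string pending;
};

// Scenario from the report: close the socket while bytes are pending.
std::size_t flush_after_close() {
    std::unique_ptr<toy_session_t> session(new toy_session_t());
    toy_engine_t engine(session.get());
    engine.buffer("abc");

    engine.detach();   // without this line the next call reads freed memory
    session.reset();   // models zmq_close() freeing the session

    const std::size_t sent = engine.out_event();
    assert(engine.idle());
    return sent;
}
```

flush_after_close() returns the number of buffered bytes that still went out after the
session was freed.  Deleting the detach() call turns out_event() into the same kind of
use-after-free that shows up in the backtrace.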

AJ

On Wed, Nov 13, 2013 at 11:06:35PM +0000, Bill M wrote:
> AJ Lewis <aj.lewis <at> quantum.com> writes:
> 
> > 
> > I've recently seen the same thing in 3.2.3, but hadn't been able to pinpoint
> > whether the problem was in zmq proper, or in the application using it.  I
> > look forward to the results of this question.
> > 
> > On Wed, Nov 06, 2013 at 09:47:55AM -0800, Andy Tucker wrote:
> > > > Hi, I have a program that sends messages on a ZMQ_DEALER socket with
> > > ZMQ_DONTWAIT. If it gets back EAGAIN (perhaps because the other end is
> > > responding slowly or has gone away) it calls zmq_close to close the socket
> > > and then re-establish the connection (possibly to a new endpoint) with a
> > > new socket. ZMQ_LINGER is set to 0 (this doesn't appear to happen if
> > > ZMQ_LINGER isn't set, but that can cause other issues).
> > > 
> > > I'm occasionally seeing crashes in the libzmq epoll_t thread with either
> > > "pure virtual method called" or a segmentation fault. The stack looks like
> > > (this is with libzmq 3.2.4 but others are similar):
> > > 
> > > > #4  0x00007f8928939ca3 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > > > #5  0x00007f892893a77f in __cxa_pure_virtual () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > > > #6  0x00007f8929649db1 in zmq::v1_encoder_t::message_ready (this=0x7f8918000b90) at v1_encoder.cpp:66
> > > > #7  0x00007f892964a2a4 in zmq::encoder_base_t<zmq::v1_encoder_t>::get_data (this=0x7f8918000b90, data_=0x7f8918000928, size_=0x7f8918000930, offset_=0x0) at encoder.hpp:93
> > > > #8  0x00007f892963fb42 in zmq::stream_engine_t::out_event (this=0x7f89180008e0) at stream_engine.cpp:261
> > > > #9  0x00007f8929627d1a in zmq::epoll_t::loop (this=0x8eace0) at epoll.cpp:158
> > > > #10 0x00007f8929644996 in thread_routine (arg_=0x8ead50) at thread.cpp:83
> > > > #11 0x00007f8928be6e9a in start_thread (arg=0x7f89271b9700) at pthread_create.c:308
> > > > #12 0x00007f89293453fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> > > 
> > > Looking at the core, it appears that the memory pointed to by the
> > > msg_source field in the encoder has been freed (the "pure virtual method
> > > called" is because the vtbl pointer has been munged by something that
> > > re-allocated the buffer). The msg_source field points to the
> > > session_base_t, but that was freed by the zmq_close. The session_base_t
> > > destructor calls engine->terminate(), which would normally free the engine
> > > state but doesn't do anything if the encoder still has data left to be sent.
> > > 
> > > I've reproduced this with 3.2.4, 4.0.1, and master (as of a few days ago).
> > > I filed LIBZMQ-576 and attached a small test program to the issue.
> > > 
> > > This looks like a libzmq bug to me, though if I'm misusing the API in some
> > > way (or if there's a reasonable workaround) please let me know.
> > > 
> > > Andy
> > 
> > > _______________________________________________
> > > zeromq-dev mailing list
> > > zeromq-dev <at> lists.zeromq.org
> > > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
> > 
> 
> 
> I'm seeing something similar too, using zmq 3.2.3 through PHP.
> The segfault is killing the apache process with the following stack trace:
> 
> #0  0x00007f4ae573ab65 in raise () from /lib/libc.so.6
> #1  0x00007f4ae573e6b0 in abort () from /lib/libc.so.6
> #2  0x00007f4adbaaa8c5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
> #3  0x00007f4adbaa8cf6 in ?? () from /usr/lib/libstdc++.so.6
> #4  0x00007f4adbaa8d23 in std::terminate() () from /usr/lib/libstdc++.so.6
> #5  0x00007f4adbaa95ff in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
> #6  0x00007f4ad92267d7 in ?? () from /usr/local/lib/libzmq.so.3
> #7  0x00007f4ad92271af in ?? () from /usr/local/lib/libzmq.so.3
> #8  0x00007f4ad921a0f5 in ?? () from /usr/local/lib/libzmq.so.3
> #9  0x00007f4ad9202702 in ?? () from /usr/local/lib/libzmq.so.3
> #10 0x00007f4ad92207fb in ?? () from /usr/local/lib/libzmq.so.3
> #11 0x00007f4ae5a949ca in start_thread () from /lib/libpthread.so.0
> #12 0x00007f4ae57f11cd in clone () from /lib/libc.so.6
> #13 0x0000000000000000 in ?? ()
> 
> Any word on this?
> 
> Thanks.
> 
> 

-- 
AJ Lewis
Software Engineer
Quantum Corporation

Work:    651 688-4346
email:   aj.lewis at quantum.com
