[zeromq-dev] libzmq crash closing socket with pending messages

Bill M billm at fractalmonkey.com
Thu Nov 14 00:06:35 CET 2013


AJ Lewis <aj.lewis <at> quantum.com> writes:

> 
> I've recently seen the same thing in 3.2.3, but haven't been able to pinpoint
> whether the problem is in zmq proper or in the application using it. I
> look forward to hearing what turns up.
> 
> On Wed, Nov 06, 2013 at 09:47:55AM -0800, Andy Tucker wrote:
> > Hi, I have a program that sends messages on a ZMQ_DEALER socket with
> > ZMQ_DONTWAIT. If it gets back EAGAIN (perhaps because the other end is
> > responding slowly or has gone away) it calls zmq_close to close the socket
> > and then re-establish the connection (possibly to a new endpoint) with a
> > new socket. ZMQ_LINGER is set to 0 (this doesn't appear to happen if
> > ZMQ_LINGER isn't set, but that can cause other issues).
> > 
> > I'm occasionally seeing crashes in the libzmq epoll_t thread with either
> > "pure virtual method called" or a segmentation fault. The stack looks like
> > this (with libzmq 3.2.4; other versions are similar):
> > 
> > #4  0x00007f8928939ca3 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > #5  0x00007f892893a77f in __cxa_pure_virtual () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > #6  0x00007f8929649db1 in zmq::v1_encoder_t::message_ready (this=0x7f8918000b90) at v1_encoder.cpp:66
> > #7  0x00007f892964a2a4 in zmq::encoder_base_t<zmq::v1_encoder_t>::get_data (this=0x7f8918000b90, data_=0x7f8918000928, size_=0x7f8918000930, offset_=0x0) at encoder.hpp:93
> > #8  0x00007f892963fb42 in zmq::stream_engine_t::out_event (this=0x7f89180008e0) at stream_engine.cpp:261
> > #9  0x00007f8929627d1a in zmq::epoll_t::loop (this=0x8eace0) at epoll.cpp:158
> > #10 0x00007f8929644996 in thread_routine (arg_=0x8ead50) at thread.cpp:83
> > #11 0x00007f8928be6e9a in start_thread (arg=0x7f89271b9700) at pthread_create.c:308
> > #12 0x00007f89293453fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> > 
> > Looking at the core, it appears that the memory pointed to by the
> > msg_source field in the encoder has been freed (the "pure virtual method
> > called" is because the vtbl pointer has been munged by something that
> > re-allocated the buffer). The msg_source field points to the
> > session_base_t, which zmq_close had already freed. The session_base_t
> > destructor calls engine->terminate(), which would normally free the engine
> > state but does nothing if the encoder still has data left to send.
> > 
> > I've reproduced this with 3.2.4, 4.0.1, and master (as of a few days ago).
> > I filed LIBZMQ-576 and attached a small test program to the issue.
> > 
> > This looks like a libzmq bug to me, though if I'm misusing the API in some
> > way (or if there's a reasonable workaround) please let me know.
> > 
> > Andy
> 
> 
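To make the lifetime problem in the quoted report concrete, here is a toy C model of it. Everything in it is hypothetical: session_t, engine_t, engine_terminate(), session_destroy() and engine_out_event() are simplified stand-ins for libzmq's session_base_t, stream_engine_t and encoder, not the library's actual code. "Freeing" is simulated with a flag so the dangling reference can be detected rather than dereferenced. The detach parameter models one possible way to break the dependency (clearing the engine's back-pointer before teardown); it is an assumption for illustration, not necessarily how libzmq resolves LIBZMQ-576.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins for libzmq internals. */
typedef struct session session_t;

typedef struct engine {
    session_t *msg_source;   /* back-pointer, like the encoder's msg_source */
    size_t pending_bytes;    /* data the encoder still has to write */
    bool freed;              /* simulates the engine being released */
} engine_t;

struct session {
    engine_t *engine;
    bool destroyed;          /* simulates the session's memory being freed */
};

/* Mirrors the reported behaviour: terminate() only releases the engine
 * when the encoder has nothing left to send. */
static void engine_terminate(engine_t *e)
{
    if (e->pending_bytes == 0)
        e->freed = true;
    /* otherwise the engine stays alive so it can flush its remaining data */
}

/* Session teardown, as triggered by zmq_close(). When detach is true the
 * engine's back-pointer is cleared first (hypothetical mitigation). */
static void session_destroy(session_t *s, bool detach)
{
    assert(s->engine != NULL);
    if (detach)
        s->engine->msg_source = NULL;
    engine_terminate(s->engine);
    s->destroyed = true;
}

typedef enum { OUT_IDLE, OUT_FLUSHED, OUT_DANGLING } out_result_t;

/* Runs later on the I/O thread, like stream_engine_t::out_event. */
static out_result_t engine_out_event(engine_t *e)
{
    if (e->freed)
        return OUT_IDLE;
    e->pending_bytes = 0;        /* write out whatever was still buffered */
    if (e->msg_source == NULL)
        return OUT_FLUSHED;      /* detached: no session to pull from */
    if (e->msg_source->destroyed)
        return OUT_DANGLING;     /* real code: virtual call on a freed session */
    return OUT_FLUSHED;          /* a live session would supply the next message */
}
```

With bytes still pending and no detach, session_destroy() leaves the engine alive while its msg_source points at a destroyed session, so the next engine_out_event() returns OUT_DANGLING, the model's equivalent of the __cxa_pure_virtual crash in the trace. With nothing pending, the engine is released immediately and no later event touches the session.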


I'm seeing something similar, using libzmq 3.2.3 through PHP.
The crash kills the Apache process with the following stack trace:

#0  0x00007f4ae573ab65 in raise () from /lib/libc.so.6
#1  0x00007f4ae573e6b0 in abort () from /lib/libc.so.6
#2  0x00007f4adbaaa8c5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3  0x00007f4adbaa8cf6 in ?? () from /usr/lib/libstdc++.so.6
#4  0x00007f4adbaa8d23 in std::terminate() () from /usr/lib/libstdc++.so.6
#5  0x00007f4adbaa95ff in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#6  0x00007f4ad92267d7 in ?? () from /usr/local/lib/libzmq.so.3
#7  0x00007f4ad92271af in ?? () from /usr/local/lib/libzmq.so.3
#8  0x00007f4ad921a0f5 in ?? () from /usr/local/lib/libzmq.so.3
#9  0x00007f4ad9202702 in ?? () from /usr/local/lib/libzmq.so.3
#10 0x00007f4ad92207fb in ?? () from /usr/local/lib/libzmq.so.3
#11 0x00007f4ae5a949ca in start_thread () from /lib/libpthread.so.0
#12 0x00007f4ae57f11cd in clone () from /lib/libc.so.6
#13 0x0000000000000000 in ?? ()

Any word on this?

Thanks.




