[zeromq-dev] libzmq crash closing socket with pending messages
Pieter Hintjens
ph at imatix.com
Thu Jan 2 23:26:36 CET 2014
For the backports you can update the NEWS now. For master I'll do
it all at once for the next release.
On Thu, Jan 2, 2014 at 11:16 PM, AJ Lewis <aj.lewis at quantum.com> wrote:
> On Thu, Jan 02, 2014 at 10:16:40PM +0100, Pieter Hintjens wrote:
>> Thanks for those pull requests. I merged them. You can update the NEWS
>> as you go along (didn't check if you did that), particularly for
>> backports.
>
> Cool - thanks. I didn't adjust the NEWS. Should I make another pull
> request for that, or will it get adjusted by someone else later?
>
> AJ
>
>> On Thu, Jan 2, 2014 at 5:34 PM, AJ Lewis <aj.lewis at quantum.com> wrote:
>> > Just a heads up that I'm going to submit pull requests to libzmq, zeromq3-x,
>> > and zeromq4-x to revert the fix for LIBZMQ-497 in order to fix LIBZMQ-576.
>> > This means some other solution needs to be found for that problem though - I
>> > don't have a clear idea of how to do that, but I do know that crashing on
>> > socket close isn't acceptable behavior.
>> >
>> > Thanks,
>> > AJ
>> >
>> > On Wed, Nov 13, 2013 at 06:55:25PM -0600, AJ Lewis wrote:
>> >> Check out https://zeromq.jira.com/browse/LIBZMQ-576 for more info. It
>> >> looks like a previous fix for trying to ensure messages in the encoder
>> >> were sent out before socket close is causing issues. Reverting that fix
>> >> (for libzmq, it's commit f27eb67e) seems to clear this up. But we still
>> >> probably want something to fix what that commit was attempting to fix (for
>> >> details on that, see https://zeromq.jira.com/browse/LIBZMQ-497).
>> >>
>> >> AJ
>> >>
>> >> On Wed, Nov 13, 2013 at 11:06:35PM +0000, Bill M wrote:
>> >> > AJ Lewis <aj.lewis <at> quantum.com> writes:
>> >> >
>> >> > >
>> >> > > I've recently seen the same thing in 3.2.3, but hadn't been able to pinpoint
>> >> > > whether the problem was in zmq proper, or in the application using it. I
>> >> > > look forward to the results of this question.
>> >> > >
>> >> > > On Wed, Nov 06, 2013 at 09:47:55AM -0800, Andy Tucker wrote:
>> >> > > > Hi, I have a program that sends messages on a ZMQ_DEALER socket with
>> >> > > > ZMQ_DONTWAIT. If it gets back EAGAIN (perhaps because the other end is
>> >> > > > responding slowly or has gone away) it calls zmq_close to close the socket
>> >> > > > and then re-establish the connection (possibly to a new endpoint) with a
>> >> > > > new socket. ZMQ_LINGER is set to 0 (this doesn't appear to happen if
>> >> > > > ZMQ_LINGER isn't set, but that can cause other issues).
>> >> > > >
>> >> > > > I'm occasionally seeing crashes in the libzmq epoll_t thread with either
>> >> > > > "pure virtual method called" or a segmentation fault. The stack looks like
>> >> > > > (this is with libzmq 3.2.4 but others are similar):
>> >> > > >
>> >> > > > #4 0x00007f8928939ca3 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> >> > > > #5 0x00007f892893a77f in __cxa_pure_virtual () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> >> > > > #6 0x00007f8929649db1 in zmq::v1_encoder_t::message_ready (this=0x7f8918000b90) at v1_encoder.cpp:66
>> >> > > > #7 0x00007f892964a2a4 in zmq::encoder_base_t<zmq::v1_encoder_t>::get_data (this=0x7f8918000b90, data_=0x7f8918000928, size_=0x7f8918000930, offset_=0x0) at encoder.hpp:93
>> >> > > > #8 0x00007f892963fb42 in zmq::stream_engine_t::out_event (this=0x7f89180008e0) at stream_engine.cpp:261
>> >> > > > #9 0x00007f8929627d1a in zmq::epoll_t::loop (this=0x8eace0) at epoll.cpp:158
>> >> > > > #10 0x00007f8929644996 in thread_routine (arg_=0x8ead50) at thread.cpp:83
>> >> > > > #11 0x00007f8928be6e9a in start_thread (arg=0x7f89271b9700) at pthread_create.c:308
>> >> > > > #12 0x00007f89293453fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> >> > > >
>> >> > > > Looking at the core, it appears that the memory pointed to by the
>> >> > > > msg_source field in the encoder has been freed (the "pure virtual method
>> >> > > > called" is because the vtbl pointer has been munged by something that
>> >> > > > re-allocated the buffer). The msg_source field points to the
>> >> > > > session_base_t, but that was freed by the zmq_close. The session_base_t
>> >> > > > destructor calls engine->terminate(), which would normally free the engine
>> >> > > > state but doesn't do anything if the encoder still has data left to be sent.
>> >> > > >
>> >> > > > I've reproduced this with 3.2.4, 4.0.1, and master (as of a few days ago).
>> >> > > > I filed LIBZMQ-576 and attached a small test program to the issue.
>> >> > > >
>> >> > > > This looks like a libzmq bug to me, though if I'm misusing the API in some
>> >> > > > way (or if there's a reasonable workaround) please let me know.
>> >> > > >
>> >> > > > Andy
>> >> > >
>> >> > > > _______________________________________________
>> >> > > > zeromq-dev mailing list
>> >> > > > zeromq-dev <at> lists.zeromq.org
>> >> > > > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>> >> > >
>> >> >
>> >> >
>> >> > I'm seeing something similar too, using zmq 3.2.3 through PHP.
>> >> > The segfault is killing the apache process with the following stack trace:
>> >> >
>> >> > #0 0x00007f4ae573ab65 in raise () from /lib/libc.so.6
>> >> > #1 0x00007f4ae573e6b0 in abort () from /lib/libc.so.6
>> >> > #2 0x00007f4adbaaa8c5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
>> >> > #3 0x00007f4adbaa8cf6 in ?? () from /usr/lib/libstdc++.so.6
>> >> > #4 0x00007f4adbaa8d23 in std::terminate() () from /usr/lib/libstdc++.so.6
>> >> > #5 0x00007f4adbaa95ff in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
>> >> > #6 0x00007f4ad92267d7 in ?? () from /usr/local/lib/libzmq.so.3
>> >> > #7 0x00007f4ad92271af in ?? () from /usr/local/lib/libzmq.so.3
>> >> > #8 0x00007f4ad921a0f5 in ?? () from /usr/local/lib/libzmq.so.3
>> >> > #9 0x00007f4ad9202702 in ?? () from /usr/local/lib/libzmq.so.3
>> >> > #10 0x00007f4ad92207fb in ?? () from /usr/local/lib/libzmq.so.3
>> >> > #11 0x00007f4ae5a949ca in start_thread () from /lib/libpthread.so.0
>> >> > #12 0x00007f4ae57f11cd in clone () from /lib/libc.so.6
>> >> > #13 0x0000000000000000 in ?? ()
>> >> >
>> >> > Any word on this?
>> >> >
>> >> > Thanks.
>> >> >
>> >> >
>> >>
>> >> --
>> >> AJ Lewis
>> >> Software Engineer
>> >> Quantum Corporation
>> >>
>> >> Work: 651 688-4346
>> >> email: aj.lewis at quantum.com
>> >>
>> >
>