[zeromq-dev] zctx_destroy is hanging
Stephen Hemminger
stephen at networkplumber.org
Wed May 22 20:12:59 CEST 2013
On Wed, 22 May 2013 09:44:52 +0200
Pieter Hintjens <ph at imatix.com> wrote:
> Can you provide a minimal reproducible case?
>
> -Pieter
>
>
> On Wed, May 22, 2013 at 12:32 AM, Stephen Hemminger <
> stephen at networkplumber.org> wrote:
>
> > We have a ZMQ based application (in C) using CZMQ and ZMQ 2.2.0
> > When daemon is due to be restarted or shutdown
> > 1. it receives a SIGTERM
> > 2. The signal is caught, and flag is set
> > 3. all the worker threads exit
> > 4. main thread waits for workers and does some other cleanup
> > 5. calls zctx_destroy()
> > and hangs there; any clues? maybe the zctx_destroy() is redundant anyway.
> >
> >
> > int
> > main(int argc, char **argv)
> > {
> > ...
> >
> > zctx_destroy(&zmq_ctx); << hang here
> >
> > return 0;
> > }
> >
> > There were several ZMQ sockets created, instrumenting CZMQ, it looks
> > like ZMQ is hanging in zctx__socket_destroy() of the ZMQ_REQ socket
> > which was bound twice, once to an ipc: endpoint and again to a
> > tcp://lo:5910 endpoint.
> >
> > Internally it looks like ZMQ reaper isn't working.
> >
> > The back trace of main thread is:
> > [Switching to thread 1 (Thread 0x7f1267625c80 (LWP 2065))]#0
> > 0x00007f126626ec13 in poll () from /lib/libc.so.6
> > (gdb) where
> > #0 0x00007f126626ec13 in poll () from /lib/libc.so.6
> > #1 0x00007f1266bd5df0 in zmq::signaler_t::wait (this=<value optimized out>, timeout_=-1) at signaler.cpp:145
> > #2 0x00007f1266bc6aae in zmq::mailbox_t::recv (this=0x1b4c808, cmd_=0x7fff010baee0, timeout_=-1) at mailbox.cpp:74
> > #3 0x00007f1266bc059d in zmq::ctx_t::terminate (this=0x1b4c770) at ctx.cpp:146
> > #4 0x00007f1266be100c in zmq_term (ctx_=0x1b4c770) at zmq.cpp:292
> > #5 0x00007f1266df8efe in zctx_destroy (self_p=0x7107a0) at zctx.c:122
> > #6 0x000000000040ae53 in main (argc=<value optimized out>,
> >
> > Some other threads:
> > (gdb) thread 4
> > [Switching to thread 4 (Thread 0x7f1241bf9700 (LWP 2149))]#0
> > 0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > (gdb) where
> > #0 0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > #1 0x00007f1266bc3a90 in zmq::epoll_t::loop (this=0x1b4e680) at epoll.cpp:142
> > #2 0x00007f1266bdbdeb in thread_routine (arg_=0x1b4e6f0) at thread.cpp:75
> > #3 0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #4 0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #5 0x0000000000000000 in ?? ()
> > (gdb) thread 5
> > [Switching to thread 5 (Thread 0x7f12423fa700 (LWP 2148))]#0
> > 0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > (gdb) where
> > #0 0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > #1 0x00007f1266bc3a90 in zmq::epoll_t::loop (this=0x1b4d050) at epoll.cpp:142
> > #2 0x00007f1266bdbdeb in thread_routine (arg_=0x1b4d0c0) at thread.cpp:75
> > #3 0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #4 0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #5 0x0000000000000000 in ?? ()
> > (gdb) thread 6
> > [Switching to thread 6 (Thread 0x7f1242bfb700 (LWP 2102))]#0
> > 0x00007f126651a14d in read () from /lib/libpthread.so.0
> > (gdb) where
> > #0 0x00007f126651a14d in read () from /lib/libpthread.so.0
> > #1 0x00000000004c6938 in eal_thread_loop ()
> > #2 0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #3 0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #4 0x0000000000000000 in ?? ()
> > (gdb) thread 7
> > [Switching to thread 7 (Thread 0x7f12433fc700 (LWP 2101))]#0
> > 0x00007f126651a14d in read () from /lib/libpthread.so.0
> > (gdb) where
> > #0 0x00007f126651a14d in read () from /lib/libpthread.so.0
> > #1 0x00000000004c6938 in eal_thread_loop ()
> > #2 0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #3 0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #4 0x0000000000000000 in ?? ()
> > (gdb) thread 8
> > [Switching to thread 8 (Thread 0x7f1243bfd700 (LWP 2100))]#0
> > 0x00007f126651a14d in read () from /lib/libpthread.so.0
> > (gdb) where
> > #0 0x00007f126651a14d in read () from /lib/libpthread.so.0
> > #1 0x00000000004c6938 in eal_thread_loop ()
> > #2 0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #3 0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #4 0x0000000000000000 in ?? ()
Found it; not a zmq problem per se.
Like any other application, ours has grown, and off in a new feature
there is another zthread that was started as a detached thread, using
the same ctx and never exiting. Having it watch the same exit flag and giving it
its own context solved the issue.
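
For anyone hitting the same hang, here is a rough sketch of the shutdown pattern in plain pthreads. It is not our actual code and makes no ZMQ calls; exit_flag, on_sigterm, worker and run_shutdown_demo are hypothetical names made up for illustration. The point is that every thread, including one added later for a new feature, polls the same flag and is joined before the context is torn down. As far as I understand, zmq_term() blocks until every socket in the context has been closed, which is why one forgotten thread is enough to hang zctx_destroy().

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

#define MAX_WORKERS 8

/* Set by the SIGTERM handler, polled by every thread (hypothetical name). */
static atomic_int exit_flag;

static void on_sigterm(int signo)
{
    (void)signo;
    atomic_store(&exit_flag, 1);
}

/* Hypothetical worker body: loop until the shared flag is set. */
static void *worker(void *arg)
{
    int *exited = arg;
    struct timespec pause = { 0, 1000000 };   /* 1 ms */

    while (!atomic_load(&exit_flag))
        nanosleep(&pause, NULL);

    /* A real worker would close its own ZMQ sockets here. */
    *exited = 1;
    return NULL;
}

/* Start nworkers threads, deliver SIGTERM to ourselves, join them all.
 * Returns how many workers exited cleanly, or -1 if the handler could
 * not be installed. */
int run_shutdown_demo(int nworkers)
{
    pthread_t tids[MAX_WORKERS];
    int exited[MAX_WORKERS] = { 0 };
    struct sigaction sa;
    int i, started = 0, clean = 0;

    if (nworkers > MAX_WORKERS)
        nworkers = MAX_WORKERS;

    atomic_store(&exit_flag, 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;

    for (i = 0; i < nworkers; i++) {
        if (pthread_create(&tids[i], NULL, worker, &exited[i]) != 0)
            break;
        started++;
    }

    raise(SIGTERM);   /* stands in for the external SIGTERM */

    /* Joinable threads, not detached, so none can be forgotten. */
    for (i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        clean += exited[i];
    }

    /* Only at this point would the real main() call zctx_destroy(). */
    return clean;
}
```

Calling run_shutdown_demo(3) from main() should return 3 once all three workers have seen the flag. A detached thread that ignores the flag has no equivalent join point, which is how ours slipped through.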