[zeromq-dev] Process hung in recv followup..

Marc Rossi mrossi19 at gmail.com
Fri Nov 5 17:55:16 CET 2010


Quick recap, I have taken a look at what Martin suggested and have more info
below.  Stack trace at end of message.

arch: x86_64
socket type: tcp PUB/SUB
version: 2.0.9

The main thread of the subscriber process is blocked in socket_t::recv() while
the io thread continues reading data from the socket and pushing it onto the
internal queue.  This occurs once or twice a day, *usually* during traffic
spikes, and results in large memory usage by the process as the queue grows --
although I haven't been able to reproduce it on demand.  Interestingly,
terminating the publisher process seems to unblock the main thread, which then
processes all the data in the internal queue.  Start the publisher back up and
things proceed as expected until the next time the problem crops up.

Martin's suggestion is below.

> What you should have a look at is that zmq_recv is stuck in waiting for
> command from an I/O thread, notifying it that a message
> have arrived. (the commands are passed via signaler_t object). The
> specific command the writer should send to wake up the
> reader is "revive" (see send_revive call in pipe_t::writer_t and
> process_revive in pipe_t::reader_t).

pipe_t::writer_t::send_revive() is never called.  The io_thread does call
zmq::writer_t::flush() -- shown below -- but the zmq::ypipe_t::flush() call
returns true every time, so the condition !pipe->flush () is false and
send_revive() is never reached.

void zmq::writer_t::flush ()
{
    if (!pipe->flush ())
        send_revive (peer);
}

Looking at zmq::ypipe_t::flush(), it returns true from the branch next to
the comment that says 'Reader is alive.  Nothing special to do now....'

I realize I need to keep digging but any hints on internals would be
helpful.   From what I can tell reading through some of the comments and
code in ypipe.hpp, there is some scenario where the reader can be asleep
with items in the pipe, so no revive is ever sent.
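
To make that scenario concrete, here is a toy model of a writer/reader wake-up
handshake built on a single compare-and-swap. It is a sketch for reasoning
only and is not the 0MQ source: ToyPipe, ASLEEP and the integer counters are
hypothetical names, and the real ypipe_t works on queue pointers rather than
counters.

```cpp
#include <atomic>
#include <cassert>

// Toy model only (hypothetical, not 0MQ code). Positions are counters.
class ToyPipe {
public:
    static constexpr long ASLEEP = -1;

    // Stage one item; it is not visible to the reader until flush().
    void write() { ++f_; }

    // Publish staged items. Returns false only if the reader had marked
    // itself asleep, i.e. the caller must wake it (send a "revive").
    bool flush() {
        if (w_ == f_) return true;              // nothing new to publish
        long expected = w_;
        if (!c_.compare_exchange_strong(expected, f_)) {
            // c_ was not the last published position: reader is asleep.
            c_.store(f_);
            w_ = f_;
            return false;
        }
        w_ = f_;                                // reader is alive
        return true;
    }

    // Returns true if an item was consumed. A false return means the
    // reader has just marked itself asleep (or already was asleep).
    bool read() {
        if (r_pos_ == r_limit_) {
            long expected = r_pos_;
            // Nothing published beyond r_pos_: atomically go to sleep.
            if (c_.compare_exchange_strong(expected, ASLEEP)) return false;
            if (expected == ASLEEP) return false;   // still asleep
            r_limit_ = expected;                    // new items published
        }
        ++r_pos_;
        return true;
    }

private:
    std::atomic<long> c_{0};   // shared: last published position or ASLEEP
    long w_ = 0, f_ = 0;       // writer side: published / staged positions
    long r_pos_ = 0, r_limit_ = 0;  // reader side: consumed / known limit
};

// Single-threaded walk through the handshake.
inline void demo() {
    ToyPipe p;
    assert(!p.read());   // empty pipe: reader marks itself asleep
    p.write();
    assert(!p.flush());  // reader asleep: a revive must be sent
    assert(p.read());    // reader wakes up and consumes the item
    p.write();
    assert(p.flush());   // reader alive: no revive is sent
}
```

In this model flush() returns false only when the reader's last read() returned
false. If a reader ever blocks waiting for a command without having gone
through that failed read(), the writer keeps seeing "reader is alive" and never
sends a revive. That would match the symptom here, but whether 2.0.9 can
actually reach such a state is exactly what I have not confirmed.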


(gdb) info threads
* 2 Thread 0x7ffff584d710 (LWP 3461)  0x00007ffff711822e in zmq::epoll_t::loop (this=0x6d7b70) at epoll.cpp:161
  1 Thread 0x7ffff5a5b820 (LWP 3453)  0x0000003a5a80e53c in recv () from /lib64/libpthread.so.0
(gdb) where
#0  0x0000003a59cdec23 in epoll_wait () from /lib64/libc.so.6
#1  0x00007ffff711822e in zmq::epoll_t::loop (this=0x6d7b70) at epoll.cpp:161
#2  0x00007ffff7127db7 in zmq::thread_t::thread_routine (arg_=0x6d7bb0) at thread.cpp:99
#3  0x0000003a5a806a3a in start_thread () from /lib64/libpthread.so.0
#4  0x0000003a59cde62d in clone () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff5a5b820 (LWP 3453))]#0  0x0000003a5a80e53c in recv () from /lib64/libpthread.so.0
(gdb) where
#0  0x0000003a5a80e53c in recv () from /lib64/libpthread.so.0
#1  0x00007ffff7123af6 in zmq::signaler_t::recv (this=0x74a990, cmd_=0x7fffffffd050, block_=true) at signaler.cpp:274
#2  0x00007ffff7115724 in zmq::app_thread_t::process_commands (this=0x74a960, block_=<value optimized out>, throttle_=<value optimized out>) at app_thread.cpp:88
#3  0x00007ffff712403c in zmq::socket_base_t::recv (this=0x6f8330, msg_=0x7fffffffd7a0, flags_=0) at socket_base.cpp:443
#4  0x0000000000419e9f in zmq::socket_t::recv (this=0x7fffffffdc40, msg_=0x7fffffffd7a0, flags_=0) at /usr/local/include/zmq.hpp:256
#5  0x0000000000416268 in main (argc=2, argv=0x7fffffffe388) at test.cpp:201