[zeromq-dev] [PATCH] socket_base_t::recv() hangs intermittently when in blocking mode under certain conditions
Marc Rossi
mrossi19 at gmail.com
Tue Nov 9 21:15:02 CET 2010
Main thread calls recv() and hangs forever (after working fine for a period
of time), memory usage grows continuously while io thread pulls data from
socket and pushes on the internal queue. netstat -a shows no data in recv-q
because io thread continues to work properly and pull data from the socket.
This occurs under the following scenario:
User code calls socket_base_t::recv() indirectly through a higher-level
zeromq API call when there are no messages waiting. The previous 99
(inbound_poll_rate - 1) calls to the recv() function each returned an
already waiting message fetched by the xrecv() call at the start of the
function.
This 100th call to recv(), as stated above, has no messages waiting to be
read so the xrecv() call fails and rc = -1. Immediately after this call to
xrecv() but BEFORE the conditional statement "if (++ticks ==
inbound_poll_rate)" a message arrives and is processed by the io thread,
resulting in the generation of a revive signal as the new message is pushed
onto the queue. Since ++ticks is now 100 (inbound_poll_rate) the above
conditional is true and app_thread_t::process_commands() is called,
processing the revive signal.
Since this is a BLOCKING socket and rc != 0 we fall down to the loop at the
end of the recv() function that unfortunately for us calls the
app_thread_t::process_commands() method with block_ = true before calling
xrecv(). Since the revive signal was already consumed above, we are now
hung: the message is still sitting in the queue, and because the queue is
no longer empty the io thread will not generate another revive signal.
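The sequence above can be replayed with a small single-threaded model. This is only a sketch under stated assumptions: model_t, io_push(), hundredth_recv() and outcome_t are hypothetical stand-ins rather than the real 0MQ classes, the io thread is assumed to signal only when the reader has already seen an empty queue, and process_commands() here returns false at the point where the real blocking call would wait forever.

```cpp
#include <cassert>
#include <deque>

// Simplified, single-threaded model of the race described above.
// All names are illustrative stand-ins, not the real 0MQ classes.
struct model_t {
    std::deque<int> queue;       // messages pushed by the "io thread"
    int signals = 0;             // revive signals not yet consumed
    bool reader_asleep = false;  // set when xrecv() finds the queue empty

    // io thread side: a revive signal is generated only if the reader
    // has already observed an empty queue.
    void io_push (int msg_) {
        queue.push_back (msg_);
        if (reader_asleep) {
            ++signals;
            reader_asleep = false;
        }
    }

    // Returns true and fills msg_ when a message is available.
    bool xrecv (int &msg_) {
        if (queue.empty ()) {
            reader_asleep = true;
            return false;
        }
        msg_ = queue.front ();
        queue.pop_front ();
        return true;
    }

    // Returns false where the real blocking call would wait forever
    // (block_ == true and no signal pending); true otherwise.
    bool process_commands (bool block_) {
        if (signals == 0)
            return !block_;
        signals = 0;
        return true;
    }
};

enum class outcome_t { received, hung };

// Replays the 100th recv() call, with or without the patched logic.
outcome_t hundredth_recv (bool patched_) {
    model_t m;
    int msg = 0;
    bool got = m.xrecv (msg);     // queue is empty, so the read fails
    assert (!got);
    m.io_push (42);               // message arrives right after the failed read
    m.process_commands (false);   // tick-triggered call eats the revive signal
    int ticks = 0;                // reset by the tick branch
    bool block = patched_ ? (ticks != 0) : true;
    while (!got) {
        if (!m.process_commands (block))
            return outcome_t::hung;
        got = m.xrecv (msg);
        block = true;
    }
    return outcome_t::received;
}
```

In this model hundredth_recv (false) ends in outcome_t::hung and hundredth_recv (true) ends in outcome_t::received, matching the behaviour described above.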
To test that this is indeed what is happening I did the following. Added an
integer reference as a third parameter to the
app_thread_t::process_commands() method that is set to the number of
commands received and processed. Immediately before AND after calling
process_commands() method in the final loop of socket_base_t::recv() I added
a debug print statement that is executed ONLY if the prior call to
process_commands() returned a value > 0 for the third param. After running
the test code for about an hour the scenario described above occurred with
the debug print prior to the process_commands() call being displayed and
then the process was hung.
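The instrumentation can be pictured roughly as follows. This is a sketch, not the code that was actually run: fake_app_thread_t and trace_final_loop() are hypothetical, the int out-parameter mirrors the third parameter described above, and a return value of false from the stand-in marks the point where the real blocking call would never return.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for app_thread_t. The third parameter reports how
// many commands the call processed, as in the experiment described above.
struct fake_app_thread_t {
    int pending = 0;  // commands (e.g. revive signals) waiting to be read

    // Returns false where a real blocking call would wait forever.
    bool process_commands (bool block_, bool throttle_, int &processed_) {
        (void) throttle_;
        processed_ = pending;
        pending = 0;
        return !(block_ && processed_ == 0);
    }
};

// Replays the tick-triggered call followed by the first blocking call in
// the final loop, collecting the debug lines that would be printed.
std::vector<std::string> trace_final_loop (int pending_at_tick_) {
    assert (pending_at_tick_ >= 0);
    std::vector<std::string> log;
    fake_app_thread_t thread;
    thread.pending = pending_at_tick_;

    int processed = 0;
    thread.process_commands (false, false, processed);  // tick-triggered call
    const bool eaten = processed > 0;  // did that call consume a command?

    if (eaten)
        log.push_back ("before process_commands");
    const bool returned = thread.process_commands (true, false, processed);
    if (returned && eaten)
        log.push_back ("after process_commands");
    return log;
}
```

With one command pending at the tick, the trace contains only the "before" line, which is the pattern observed just before the hang.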
Below is the simple patch that seems to fix the problem for me. This will
incur a small penalty when ticks == 0 and there are no messages waiting to
be read as the initial call to process_commands will return immediately due
to block being set to false. This could be made more efficient if the
process_commands() method took a 3rd param, a bool set to true when
commands were actually processed. We would then set block = false ONLY when
the previous call to process_commands() actually did something, rather than
relying on the ticks = 0 line in the if/then block.
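A rough sketch of that more efficient variant, under stated assumptions: fake_app_thread_t and first_call_may_block() are hypothetical names, and the bool out-parameter is the proposed third parameter, which does not exist in the current process_commands() signature.

```cpp
#include <cassert>

// Hypothetical stand-in for app_thread_t with the proposed third parameter:
// processed_ is set to true when at least one command was handled.
struct fake_app_thread_t {
    int pending = 0;  // commands waiting to be read

    bool process_commands (bool block_, bool throttle_, bool &processed_) {
        (void) block_;
        (void) throttle_;
        processed_ = pending > 0;
        pending = 0;
        return true;
    }
};

// Runs the tick-triggered call and reports whether the first
// process_commands() call in the final loop may safely block.
bool first_call_may_block (fake_app_thread_t &thread_, int &ticks_,
                           int inbound_poll_rate_)
{
    assert (inbound_poll_rate_ > 0);
    bool processed = false;
    if (++ticks_ == inbound_poll_rate_) {
        thread_.process_commands (false, false, processed);
        ticks_ = 0;
    }
    // Only skip blocking when a command (possibly the revive signal)
    // was actually consumed by the call above.
    return !processed;
}
```

The final loop would then start with block set to the returned value, so the extra non-blocking pass happens only when a signal may really have been eaten.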
From 8d45a82d9cf7b788a3bed5014420962ea4ca5969 Mon Sep 17 00:00:00 2001
From: Marc Rossi <mrossi19 at gmail.com>
Date: Tue, 9 Nov 2010 13:46:06 -0600
Subject: [PATCH] Fix socket_t::recv() hang scenario where initial call to
process_commands() eats signal
Added block boolean var to second process_commands() invocation for blocking
sockets instead of always using true. This prevents the process_commands()
call from hanging when a message is received with an empty queue after the
call to xrecv() but prior to the initial call to process_commands() invoked
when ++ticks == inbound_poll_rate.
Signed-off-by: Marc Rossi <mrossi19 at gmail.com>
---
src/socket_base.cpp | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/src/socket_base.cpp b/src/socket_base.cpp
index c933954..344b552 100644
--- a/src/socket_base.cpp
+++ b/src/socket_base.cpp
@@ -437,15 +437,17 @@ int zmq::socket_base_t::recv (::zmq_msg_t *msg_, int flags_)
     //  In blocking scenario, commands are processed over and over again until
     //  we are able to fetch a message.
+    bool block = (ticks != 0);
     while (rc != 0) {
         if (errno != EAGAIN)
             return -1;
-        if (unlikely (!app_thread->process_commands (true, false))) {
+        if (unlikely (!app_thread->process_commands (block, false))) {
             errno = ETERM;
             return -1;
         }
         rc = xrecv (msg_, flags_);
         ticks = 0;
+        block = true;
     }
     rcvmore = msg_->flags & ZMQ_MSG_MORE;
--
1.7.2.3