[zeromq-dev] [PATCH] socket_base_t::recv() hangs intermittently when in blocking mode under certain conditions
mrossi19 at gmail.com
Tue Nov 9 21:15:02 CET 2010
Main thread calls recv() and hangs forever (after working fine for a period
of time), memory usage grows continuously while io thread pulls data from
socket and pushes on the internal queue. netstat -a shows no data in recv-q
because io thread continues to work properly and pull data from the socket.
This occurs under the following scenario:
User code calls socket_base_t::recv() indirectly through higher level zeromq
API call when there are no messages waiting. Previous 99 (inbound_poll_rate
- 1) calls to the recv() function returned an already waiting message
fetched by the xrecv() call at the start of the function.
This 100th call to recv(), as stated above, has no messages waiting to be
read, so the xrecv() call fails and rc = -1. Immediately after this call to
xrecv() but BEFORE the conditional statement "if (++ticks ==
inbound_poll_rate)" a message arrives and is processed by the io thread,
resulting in the generation of a revive signal as the new message is pushed
onto the queue. Since ++ticks is now 100 (inbound_poll_rate) the above
conditional is true and app_thread_t::process_commands() is called,
processing the revive signal.
Since this is a BLOCKING socket and rc != 0 we fall down to the loop at the
end of the recv() function that unfortunately for us calls the
app_thread_t::process_commands() method with block_ = true before calling
xrecv(). Since we already read the revive signal above we are now officially
hung as there is still a message in the queue and there will be no more
revive signals generated by the io thread because of that.
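
The sequence above can be replayed in a toy, single-threaded model. This is only a sketch under stated assumptions: Model, io_push, recv_model, writer_will_signal and the other names are hypothetical stand-ins, not the zeromq source, and the "io thread" is reduced to a function call injected at the point where the race occurs. It assumes one revive signal is sent per idle period of the reader, as described above.

```cpp
#include <cassert>

// Toy single-threaded model of the sequence described above.
// All names are hypothetical; this is not the zeromq source.
const int kInboundPollRate = 100;

enum class Outcome { Received, Hung };

struct Model {
    int queued = 0;                   // messages in the internal queue
    int signals = 0;                  // revive signals in the command pipe
    bool pipe_active = true;          // receiver thinks the pipe may hold data
    bool writer_will_signal = false;  // next push will send a revive signal
    int ticks = 0;

    // io thread: queue a message; signal only if the reader had gone idle.
    void io_push() {
        ++queued;
        if (writer_will_signal) {
            writer_will_signal = false;
            ++signals;
        }
    }

    // Non-blocking fetch: 0 on success, -1 if nothing could be read.
    int xrecv() {
        if (!pipe_active) return -1;
        if (queued == 0) {
            pipe_active = false;
            writer_will_signal = true;
            return -1;
        }
        --queued;
        return 0;
    }

    // Returns false when a blocking wait could never be woken up.
    bool process_commands(bool block) {
        if (signals == 0) {
            if (!block) return true;
            if (!writer_will_signal) return false;  // no signal will ever come
            io_push();  // model "wait until the io thread pushes a message"
        }
        --signals;
        pipe_active = true;
        return true;
    }
};

// Unpatched control flow as described in the post. When race is true, a
// message arrives between xrecv() and the ticks check.
Outcome recv_model(Model &m, bool race) {
    int rc = m.xrecv();
    if (race) m.io_push();
    if (++m.ticks == kInboundPollRate) {
        m.process_commands(false);  // may consume the only revive signal
        m.ticks = 0;
    }
    if (rc == 0) return Outcome::Received;
    while (rc != 0) {
        if (!m.process_commands(true)) return Outcome::Hung;
        m.ticks = 0;
        rc = m.xrecv();
    }
    return Outcome::Received;
}

// Drain 99 queued messages, then make the 100th call with the race.
Outcome hundredth_call_with_race() {
    Model m;
    m.queued = kInboundPollRate - 1;
    for (int i = 0; i < kInboundPollRate - 1; ++i)
        recv_model(m, false);
    return recv_model(m, true);
}
```

In this model hundredth_call_with_race() returns Outcome::Hung with one message still queued, while the same race on a call that does not hit the ticks check (for example the first call on a fresh Model) returns Outcome::Received, because the revive signal is still in the command pipe when the blocking loop looks for it.
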

To test that this is indeed what is happening, I did the following. I added
an integer reference as a third parameter to the
app_thread_t::process_commands() method; it is set to the number of
commands received and processed. Immediately before AND after the call to
process_commands() in the final loop of socket_base_t::recv(), I added a
debug print statement that is executed ONLY if the prior call to
process_commands() returned a value > 0 for the third param. After running
the test code for about an hour, the scenario described above occurred: the
debug print before the process_commands() call was displayed, the one after
it never appeared, and the process was hung.

Below is the simple patch that seems to fix the problem for me. It incurs a
small penalty when ticks == 0 and there are no messages waiting to be read,
because the initial call to process_commands() returns immediately due to
block being set to false. This could be made more efficient if the
process_commands() method took a bool as a 3rd param that was set to true
when commands were actually processed. Then we would set block = false ONLY
when the previous call to process_commands() actually did something, rather
than relying on the ticks = 0 line in the if/then block.
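
The two variants discussed above can be compared in a toy, single-threaded model. This is a sketch based on my reading of the description, not the actual patch: Model, Fix, recv_model, polls and the other names are hypothetical, Fix::TicksZero stands for "do not block on the first pass when ticks == 0", and Fix::ProcessedFlag stands for the suggested bool third param. The polls counter is only there to make the extra non-blocking call visible.

```cpp
#include <cassert>

// Toy single-threaded model; every name is hypothetical, not zeromq source.
const int kInboundPollRate = 100;

enum class Outcome { Received, Hung };
enum class Fix { None, TicksZero, ProcessedFlag };

struct Model {
    int queued = 0;                   // messages in the internal queue
    int signals = 0;                  // revive signals in the command pipe
    bool pipe_active = true;          // receiver thinks the pipe may hold data
    bool writer_will_signal = false;  // next push will send a revive signal
    int ticks = 0;
    int polls = 0;                    // calls made to process_commands()

    void io_push() {
        ++queued;
        if (writer_will_signal) { writer_will_signal = false; ++signals; }
    }

    int xrecv() {
        if (!pipe_active) return -1;
        if (queued == 0) {
            pipe_active = false;
            writer_will_signal = true;
            return -1;
        }
        --queued;
        return 0;
    }

    // processed is set to true only if a revive signal was consumed.
    // Returns false when a blocking wait could never be woken up.
    bool process_commands(bool block, bool &processed) {
        ++polls;
        processed = false;
        if (signals == 0) {
            if (!block) return true;
            if (!writer_will_signal) return false;
            io_push();  // model "wait until the io thread pushes a message"
        }
        --signals;
        pipe_active = true;
        processed = true;
        return true;
    }
};

Outcome recv_model(Model &m, Fix fix, bool race) {
    bool did_work = false;  // did the ticks-check call consume a signal?
    bool ignored = false;
    int rc = m.xrecv();
    if (race) m.io_push();  // message arrives inside the race window
    if (++m.ticks == kInboundPollRate) {
        m.process_commands(false, did_work);
        m.ticks = 0;
    }
    if (rc == 0) return Outcome::Received;

    // Decide whether the first pass through the loop may block.
    bool block = true;
    if (fix == Fix::TicksZero) block = (m.ticks != 0);
    if (fix == Fix::ProcessedFlag) block = !did_work;

    while (rc != 0) {
        if (!m.process_commands(block, ignored)) return Outcome::Hung;
        block = true;  // later passes block as before
        m.ticks = 0;
        rc = m.xrecv();
    }
    return Outcome::Received;
}

struct Run { Outcome outcome; int polls; };

// Drain 99 queued messages, then make the 100th call.
Run hundredth_call(Fix fix, bool race) {
    Model m;
    m.queued = kInboundPollRate - 1;
    for (int i = 0; i < kInboundPollRate - 1; ++i)
        recv_model(m, fix, false);
    Outcome o = recv_model(m, fix, race);
    return Run{o, m.polls};
}
```

With the race on the 100th call, Fix::None hangs in this model and both Fix::TicksZero and Fix::ProcessedFlag receive the message. Without the race, Fix::TicksZero makes three process_commands() calls on the 100th recv (one of them the wasted non-blocking pass), while Fix::ProcessedFlag makes two, which is the small penalty and the possible saving described above.
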