[zeromq-dev] ZeroMQ how to troubleshoot the REQ-REP hang
Michael Cuggy
mcuggy at gmail.com
Mon Mar 9 08:03:32 CET 2015
lsof lists the files a process has open, including network sockets.
Every open file consumes some memory and processor time, so I would
check whether the processor or memory is constrained. If either one is
at 100% utilization, new network connections may fail to be
established, which would be consistent with the "hanging" behavior you
observe.
In theory, open files could accumulate to a critical amount after 15-20
minutes of your stress test. It seems like you have almost solved the
problem.
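To confirm whether open files really are piling up, you could log the
descriptor count of the server process at intervals during the stress
test and compare it with the per-process limit. Below is a minimal
sketch in Python, assuming Linux (it reads /proc). The function names
open_fd_count and fd_usage are my own and are not part of ZeroMQ.

```python
import os
import resource


def open_fd_count(pid="self"):
    """Count the open file descriptors of a process (Linux only, reads /proc).

    For pid="self" the count includes the descriptor that is briefly
    opened to list the directory itself.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage():
    """Return (open_count, soft_limit) for the current process."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fd_count(), soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()
    print(f"open descriptors: {used} (soft limit {limit})")
```

If the count climbs steadily toward the limit over the 15-20 minutes
before the hang, that would point at a descriptor leak rather than at
processor or memory pressure. To watch another process, pass its pid to
open_fd_count, which needs permission to read /proc/<pid>/fd.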
On 3/8/15, Jithendra Reddy <jithendra.reddy at gmail.com> wrote:
> Hi,
>
> We have implemented a REQ-REP socket communication. In brief the
> application does the following:
> 1. Client asks for a free tcp port through REQ socket
> 2. Server listens at REP socket, looks for a free tcp port. Forks a child
> process and runs ZeroMQ REP socket listening at the free port. Parent
> process sends back this free port detail to client
> 3. Client then starts communicating to child process at the received port
> using REQ-REP ZeroMQ socket
>
> The above application has issues when we run a stress test. We start
> nearly 30 clients per minute. The stress test works fine for a while (15-20
> minutes) and then hangs.
>
> We see that the message is sent from the REQ socket, but not received at
> the REP socket.
> How can we troubleshoot this issue?
>
> We see that the number of open files reported by lsof increases as the
> stress test progresses. We do close sockets in the application and also
> set linger to 0. Could the growing number of open files cause the hang?
>
> Your inputs to resolve the hang and to troubleshoot the issue will be
> helpful.
>
> Regards
>
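One way to narrow down "sent from REQ but not received at REP" is to
check, at the moment of the hang, whether anything is still accepting
TCP connections on the port the parent handed out. The sketch below
uses only the Python standard library. The helper name
port_is_listening is hypothetical, and I am assuming that a plain TCP
connect which is closed right away does not disturb the REP socket.

```python
import socket


def port_is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

If this returns False for the port a client was given, the child
process either never bound that port or has already exited, so the
problem is on the server side rather than in the client's REQ socket.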