[zeromq-dev] 0MQ/2.0 Intermittent missing replies in REQ/REP
Ben Dyer
Ben.Dyer at taguchimail.com
Tue Nov 10 05:51:34 CET 2009
Hi,
In testing a REQ/REP setup with multiple requesters connected to one
server, I've noticed that occasionally the final requester never
receives a reply, even though the server application is sending them
immediately. Aside from that single requester not receiving its reply,
everything else continues to function normally and the replier stays
active and handles other requests correctly.
This only seems to happen under heavy load (involving many concurrent
requests from multiple sources), and using tcpdump I've determined
that the reply isn't actually being sent by the replier (at least not
to the correct requester).
I haven't been able to create a setup which reproduces the issue
consistently outside of our application -- the problem is also
dependent on system load and possibly other factors.
However, reviewing src/rep.cpp I noticed the following code in
rep_t::xrecv:
    // Round-robin over the pipes to get next message.
    for (int count = active; count != 0; count--) {
        bool fetched = in_pipes [current]->read (msg_);
        current++;
        if (current >= active)
            current = 0;
        if (fetched) {
            reply_pipe = out_pipes [current];
            waiting_for_reply = true;
            return 0;
        }
    }
This appears to set reply_pipe incorrectly: current has already been
incremented (and reset to 0 once current >= active) by the time
reply_pipe is assigned. So if there are multiple active pipes and a
request is received from the last, the reply to that request will be
delivered to the out_pipe for the first in_pipe, *not* the out_pipe
matching the in_pipe from which the request was read. I believe this is
causing the issue I'm seeing, but have not yet been able to prove it
conclusively.
In any event, changing that code to:
    // Round-robin over the pipes to get next message.
    for (int count = active; count != 0; count--) {
        bool fetched = in_pipes [current]->read (msg_);
        if (fetched) {
            reply_pipe = out_pipes [current];
            waiting_for_reply = true;
        }
        current++;
        if (current >= active)
            current = 0;
        if (fetched)
            return 0;
    }
fixes the problem while preserving the round-robin ordering.
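For anyone who wants to check the index arithmetic without a running
0MQ setup, here is a minimal standalone sketch of the two orderings.
RepModel, recv_original and recv_patched are hypothetical names made up
for this illustration, and the deques stand in for the real pipes; it
models only the snippets quoted above, not the rest of rep_t.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for the pipe bookkeeping in rep_t::xrecv.
// These are NOT the 0MQ classes; each deque models one in_pipe.
struct RepModel {
    std::vector<std::deque<int>> in_pipes;  // queued requests per peer
    std::size_t active;                     // number of active pipes
    std::size_t current = 0;                // round-robin cursor

    explicit RepModel(std::size_t peers) : in_pipes(peers), active(peers) {
        assert(peers > 0);
    }

    // Ordering from the original snippet: the cursor is advanced
    // before the reply pipe index is taken.
    // Returns the chosen reply pipe index, or -1 if nothing was read.
    int recv_original() {
        for (std::size_t count = active; count != 0; count--) {
            bool fetched = read(current);
            current++;
            if (current >= active)
                current = 0;
            if (fetched)
                return static_cast<int>(current);
        }
        return -1;
    }

    // Ordering from the proposed change: the reply pipe index is
    // taken first, then the cursor is advanced.
    int recv_patched() {
        for (std::size_t count = active; count != 0; count--) {
            bool fetched = read(current);
            int reply = -1;
            if (fetched)
                reply = static_cast<int>(current);
            current++;
            if (current >= active)
                current = 0;
            if (fetched)
                return reply;
        }
        return -1;
    }

private:
    // Pop one queued request from pipe i, if there is one.
    bool read(std::size_t i) {
        if (in_pipes[i].empty())
            return false;
        in_pipes[i].pop_front();
        return true;
    }
};
```

With three pipes and a single request queued on the last one
(in_pipes [2]), recv_original selects pipe 0 for the reply while
recv_patched selects pipe 2; in both cases the cursor ends up back at
0, so the round-robin position is the same.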
Regards,
Ben