[zeromq-dev] 0MQ/2.0 Intermittent missing replies in REQ/REP

Martin Sustrik sustrik at 250bpm.com
Tue Nov 10 23:53:38 CET 2009


Hi Ben,

You are definitely right! What a stupid bug :(

Are you OK to submit the fix under the MIT license?

Thanks.
Martin

On 10/11/2009, "Ben Dyer" <Ben.Dyer at taguchimail.com> wrote:

>Hi,
>
>In testing a REQ/REP setup with multiple requesters connected to one
>server, I've noticed that occasionally the final requester never
>receives a reply, even though the server application is sending them
>immediately. Aside from that single requester not receiving its reply,
>everything else continues to function normally and the replier stays
>active and handles other requests correctly.
>
>This only seems to happen under heavy load (involving many concurrent
>requests from multiple sources), and using tcpdump I've determined
>that the reply isn't actually being sent by the replier (at least not
>to the correct requester).
>
>I haven't been able to create a setup which reproduces the issue
>consistently outside of our application -- the problem is also
>dependent on system load and possibly other factors.
>
>However, reviewing src/rep.cpp I noticed the following code in
>rep_t::xrecv:
>
>//  Round-robin over the pipes to get next message.
>for (int count = active; count != 0; count--) {
>     bool fetched = in_pipes [current]->read (msg_);
>     current++;
>     if (current >= active)
>         current = 0;
>     if (fetched) {
>         reply_pipe = out_pipes [current];
>         waiting_for_reply = true;
>         return 0;
>     }
>}
>
>This appears to set reply_pipe incorrectly in the event that
>current >= active, so if there are multiple active pipes and a request
>is received from the last, the reply to that request will be delivered
>to the out_pipe for the first in_pipe, *not* the out_pipe matching the
>in_pipe from which the request was read. I believe this is causing the
>issue I'm seeing, but have not yet been able to prove it conclusively.
>
>In any event, changing that code to:
>
>//  Round-robin over the pipes to get next message.
>for (int count = active; count != 0; count--) {
>     bool fetched = in_pipes [current]->read (msg_);
>     if (fetched) {
>         reply_pipe = out_pipes [current];
>         waiting_for_reply = true;
>     }
>     current++;
>     if (current >= active)
>         current = 0;
>     if (fetched)
>         return 0;
>}
>
>fixes the problem while preserving the round-robin ordering.
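
The two orderings quoted above can be compared in isolation with a small
stand-alone model. This is only a sketch, not the 0MQ source: rep_model_t,
recv_original and recv_patched are hypothetical names, each inbound pipe is
modelled as a std::deque<int>, and the outbound pipe for a peer is assumed
to share its index, so reply_pipe is stored as an index rather than a
pointer.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical model: one inbound queue per peer; the outbound pipe for a
// peer is assumed to share its index. These are not the real 0MQ classes.
struct rep_model_t {
    std::vector<std::deque<int> > in_pipes;
    int active;      // number of active pipes
    int current;     // next pipe to poll
    int reply_pipe;  // index of the outbound pipe chosen for the reply

    explicit rep_model_t (int peers) :
        in_pipes (peers), active (peers), current (0), reply_pipe (-1) {}

    // Non-blocking read from one inbound queue.
    bool read (int pipe, int &msg) {
        assert (pipe >= 0 && pipe < active);
        if (in_pipes [pipe].empty ())
            return false;
        msg = in_pipes [pipe].front ();
        in_pipes [pipe].pop_front ();
        return true;
    }

    // Original ordering: current is advanced before reply_pipe is recorded.
    bool recv_original (int &msg) {
        for (int count = active; count != 0; count--) {
            bool fetched = read (current, msg);
            current++;
            if (current >= active)
                current = 0;
            if (fetched) {
                reply_pipe = current;
                return true;
            }
        }
        return false;
    }

    // Proposed ordering: reply_pipe is recorded before current is advanced.
    bool recv_patched (int &msg) {
        for (int count = active; count != 0; count--) {
            bool fetched = read (current, msg);
            if (fetched)
                reply_pipe = current;
            current++;
            if (current >= active)
                current = 0;
            if (fetched)
                return true;
        }
        return false;
    }
};
```

In this model, with three peers and a single request queued on the last
one (index 2), recv_original leaves reply_pipe at 0, while recv_patched
leaves it at 2 and still advances current to 0 for the next call.
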
>
>Regards,
>Ben


