[zeromq-dev] 0MQ/2.0 Intermittent missing replies in REQ/REP
Martin Sustrik
sustrik at 250bpm.com
Sat Nov 14 18:57:52 CET 2009
Hi Ben,
Sorry for the delay (I've been traveling). Your fix is now merged into
the trunk.
Martin
Ben Dyer wrote:
> Martin,
>
> Absolutely -- I've sent you a GitHub pull request for the change.
>
> Regards,
> Ben
>
> On 11/11/2009, at 09:53 , Martin Sustrik wrote:
>
>> Hi Ben,
>>
>> You are definitely right! What a stupid bug :(
>>
>> Are you OK to submit the fix under MIT license?
>>
>> Thanks.
>> Martin
>>
>> On 10/11/2009, "Ben Dyer" <Ben.Dyer at taguchimail.com> wrote:
>>
>>> Hi,
>>>
>>> In testing a REQ/REP setup with multiple requesters connected to one
>>> server, I've noticed that occasionally the final requester never
>>> receives a reply, even though the server application is sending them
>>> immediately. Aside from that single requester not receiving its reply,
>>> everything else continues to function normally and the replier stays
>>> active and handles other requests correctly.
>>>
>>> This only seems to happen under heavy load (involving many concurrent
>>> requests from multiple sources), and using tcpdump I've determined
>>> that the reply isn't actually being sent by the replier (at least not
>>> to the correct requester).
>>>
>>> I haven't been able to create a setup which reproduces the issue
>>> consistently outside of our application -- the problem is also
>>> dependent on system load and possibly other factors.
>>>
>>> However, reviewing src/rep.cpp I noticed the following code in
>>> rep_t::xrecv:
>>>
>>>     // Round-robin over the pipes to get next message.
>>>     for (int count = active; count != 0; count--) {
>>>         bool fetched = in_pipes [current]->read (msg_);
>>>         current++;
>>>         if (current >= active)
>>>             current = 0;
>>>         if (fetched) {
>>>             reply_pipe = out_pipes [current];
>>>             waiting_for_reply = true;
>>>             return 0;
>>>         }
>>>     }
>>>
>>> This appears to set reply_pipe incorrectly in the event that
>>> current >= active, so if there are multiple active pipes and a request
>>> is received from the last, the reply to that request will be delivered
>>> to the out_pipe for the first in_pipe, *not* the out_pipe matching the
>>> in_pipe from which the request was read. I believe this is causing the
>>> issue I'm seeing, but have not yet been able to prove it conclusively.
>>>
>>> In any event, changing that code to:
>>>
>>>     // Round-robin over the pipes to get next message.
>>>     for (int count = active; count != 0; count--) {
>>>         bool fetched = in_pipes [current]->read (msg_);
>>>         if (fetched) {
>>>             reply_pipe = out_pipes [current];
>>>             waiting_for_reply = true;
>>>         }
>>>         current++;
>>>         if (current >= active)
>>>             current = 0;
>>>         if (fetched)
>>>             return 0;
>>>     }
>>>
>>> fixes the problem while preserving the round-robin ordering.
>>>
>>> Regards,
>>> Ben
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev at lists.zeromq.org
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev