[zeromq-dev] 0MQ/2.0 Intermittent missing replies in REQ/REP

Martin Sustrik sustrik at 250bpm.com
Sat Nov 14 18:57:52 CET 2009


Hi Ben,

Sorry for the delay (I've been traveling). Your fix is merged into the
trunk now.

Martin

Ben Dyer wrote:
> Martin,
> 
> Absolutely -- I've sent you a GitHub pull request for the change.
> 
> Regards,
> Ben
> 
> On 11/11/2009, at 09:53 , Martin Sustrik wrote:
> 
>> Hi Ben,
>>
>> You are definitely right! What a stupid bug :(
>>
>> Are you OK to submit the fix under MIT license?
>>
>> Thanks.
>> Martin
>>
>> On 10/11/2009, "Ben Dyer" <Ben.Dyer at taguchimail.com> wrote:
>>
>>> Hi,
>>>
>>> In testing a REQ/REP setup with multiple requesters connected to one
>>> server, I've noticed that occasionally the final requester never
>>> receives a reply, even though the server application is sending them
>>> immediately. Aside from that single requester not receiving its reply,
>>> everything else continues to function normally and the replier stays
>>> active and handles other requests correctly.
>>>
>>> This only seems to happen under heavy load (involving many concurrent
>>> requests from multiple sources), and using tcpdump I've determined
>>> that the reply isn't actually being sent by the replier (at least not
>>> to the correct requester).
>>>
>>> I haven't been able to create a setup which reproduces the issue
>>> consistently outside of our application -- the problem is also
>>> dependent on system load and possibly other factors.
>>>
>>> However, reviewing src/rep.cpp I noticed the following code in
>>> rep_t::xrecv:
>>>
>>> //  Round-robin over the pipes to get next message.
>>> for (int count = active; count != 0; count--) {
>>>    bool fetched = in_pipes [current]->read (msg_);
>>>    current++;
>>>    if (current >= active)
>>>        current = 0;
>>>    if (fetched) {
>>>        reply_pipe = out_pipes [current];
>>>        waiting_for_reply = true;
>>>        return 0;
>>>    }
>>> }
>>>
>>> This appears to set reply_pipe incorrectly in the event that current
>>> >= active, so if there are multiple active pipes and a request is
>>> received from the last, the reply to that request will be delivered to
>>> the out_pipe for the first in_pipe, *not* the out_pipe matching the
>>> in_pipe from which the request was read. I believe this is causing the
>>> issue I'm seeing, but have not yet been able to prove it conclusively.
>>>
>>> In any event, changing that code to:
>>>
>>> //  Round-robin over the pipes to get next message.
>>> for (int count = active; count != 0; count--) {
>>>    bool fetched = in_pipes [current]->read (msg_);
>>>    if (fetched) {
>>>        reply_pipe = out_pipes [current];
>>>        waiting_for_reply = true;
>>>    }
>>>    current++;
>>>    if (current >= active)
>>>        current = 0;
>>>    if (fetched)
>>>        return 0;
>>> }
>>>
>>> fixes the problem while preserving the round-robin ordering.
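
The difference between the two orderings quoted above can be checked in
isolation. The following is only a rough sketch, not the code from
src/rep.cpp: rep_model, read, recv_original, recv_fixed and
reply_index_for_last_pipe are hypothetical names, pipes are modelled as
plain queues of ints, and reply_pipe is kept as an index instead of a pipe
pointer. It assumes three active pipes with a single request waiting on
the last one.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

//  Hypothetical stand-in for the REP socket state (not the real rep_t).
struct rep_model {
    std::vector<std::deque<int> > in_pipes;  //  Pending requests per peer.
    std::size_t active;      //  Number of active pipes.
    std::size_t current;     //  Round-robin cursor.
    std::size_t reply_pipe;  //  Index of the pipe the reply would go to.

    explicit rep_model (std::size_t n) :
        in_pipes (n), active (n), current (0), reply_pipe (0) {}

    //  Pop one message from pipe i, if there is any.
    bool read (std::size_t i, int &msg_) {
        if (in_pipes [i].empty ())
            return false;
        msg_ = in_pipes [i].front ();
        in_pipes [i].pop_front ();
        return true;
    }

    //  Ordering from the first snippet: the cursor is advanced (and
    //  wrapped) before the reply pipe is recorded.
    bool recv_original (int &msg_) {
        for (std::size_t count = active; count != 0; count--) {
            bool fetched = read (current, msg_);
            current++;
            if (current >= active)
                current = 0;
            if (fetched) {
                reply_pipe = current;
                return true;
            }
        }
        return false;
    }

    //  Ordering from the second snippet: the reply pipe is recorded
    //  before the cursor moves on.
    bool recv_fixed (int &msg_) {
        for (std::size_t count = active; count != 0; count--) {
            bool fetched = read (current, msg_);
            if (fetched)
                reply_pipe = current;
            current++;
            if (current >= active)
                current = 0;
            if (fetched)
                return true;
        }
        return false;
    }
};

//  Three pipes, one request queued on the last one (index 2). Returns
//  the index that would be used for the reply.
std::size_t reply_index_for_last_pipe (bool use_fix) {
    rep_model s (3);
    s.in_pipes [2].push_back (42);
    int msg = 0;
    bool ok = use_fix ? s.recv_fixed (msg) : s.recv_original (msg);
    assert (ok && msg == 42);
    return s.reply_pipe;
}
```

In this model reply_index_for_last_pipe (false) yields 0, i.e. the reply
is routed to the first pipe, while reply_index_for_last_pipe (true) yields
2, the pipe the request was actually read from. In both cases the cursor
ends up at 0, so the round-robin order is the same.
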
>>>
>>> Regards,
>>> Ben
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev at lists.zeromq.org
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev



