[zeromq-dev] 0MQ/2.0 Intermittent missing replies in REQ/REP

Ben Dyer Ben.Dyer at taguchimail.com
Tue Nov 10 05:51:34 CET 2009


Hi,

In testing a REQ/REP setup with multiple requesters connected to one
server, I've noticed that occasionally the final requester never
receives a reply, even though the server application sends each reply
immediately. Apart from that one requester, everything else continues
to function normally: the replier stays active and handles other
requests correctly.

This only seems to happen under heavy load (involving many concurrent  
requests from multiple sources), and using tcpdump I've determined  
that the reply isn't actually being sent by the replier (at least not  
to the correct requester).

I haven't been able to create a setup which reproduces the issue  
consistently outside of our application -- the problem is also  
dependent on system load and possibly other factors.

However, reviewing src/rep.cpp I noticed the following code in  
rep_t::xrecv:

//  Round-robin over the pipes to get next message.
for (int count = active; count != 0; count--) {
     bool fetched = in_pipes [current]->read (msg_);
     current++;
     if (current >= active)
         current = 0;
     if (fetched) {
         reply_pipe = out_pipes [current];
         waiting_for_reply = true;
         return 0;
     }
}

This appears to set reply_pipe incorrectly: current has already been
incremented (and wrapped to 0 once it reaches active) by the time it
is used to index out_pipes. So if there are multiple active pipes and a
request is received from the last, the reply to that request will be
delivered to the out_pipe for the first in_pipe, *not* the out_pipe
matching the in_pipe from which the request was read. I believe this is
causing the issue I'm seeing, but have not yet been able to prove it
conclusively.
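
To make the suspected mismatch concrete, here is a minimal standalone model of the loop quoted above. It is only a sketch, not 0MQ code: rep_model_t, recv_original and the integer "requests" are hypothetical names for illustration, and the model assumes that in_pipes and out_pipes share the same index per peer, as the quoted code implies.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for the REP socket's pipe bookkeeping (not 0MQ code).
struct rep_model_t {
    std::vector<std::deque<int>> in_pipes;  // pending requests, one queue per peer
    std::size_t current = 0;                // round-robin cursor
    std::size_t reply_pipe = 0;             // index of the out pipe chosen for the reply

    explicit rep_model_t(std::size_t peers) : in_pipes(peers) {}

    // Mirrors the quoted loop: the cursor is advanced before reply_pipe is set.
    // Returns the index of the pipe the request was read from, or -1 if none.
    int recv_original() {
        assert(!in_pipes.empty());
        const std::size_t active = in_pipes.size();
        for (std::size_t count = active; count != 0; count--) {
            const std::size_t read_from = current;
            const bool fetched = !in_pipes[current].empty();
            if (fetched)
                in_pipes[current].pop_front();
            current++;
            if (current >= active)
                current = 0;
            if (fetched) {
                reply_pipe = current;  // cursor has already moved to the next pipe
                return static_cast<int>(read_from);
            }
        }
        return -1;
    }
};
```

With three peers and a single request queued on the last one (index 2), recv_original() in this model reports that it read from pipe 2 but leaves reply_pipe at 0. The model also suggests the mismatch is not limited to the wrap-around case: with more than one active pipe, reply_pipe always ends up one position past the pipe that was read.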

In any event, changing that code to:

//  Round-robin over the pipes to get next message.
for (int count = active; count != 0; count--) {
     bool fetched = in_pipes [current]->read (msg_);
     if (fetched) {
         reply_pipe = out_pipes [current];
         waiting_for_reply = true;
     }
     current++;
     if (current >= active)
         current = 0;
     if (fetched)
         return 0;
}

fixes the problem while preserving the round-robin ordering.
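
As a sanity check of the reordered loop, the same kind of standalone model can be used. Again this is a sketch under assumptions, not 0MQ code: fixed_rep_model_t and recv_fixed are hypothetical names, requests are plain integers, and in_pipes and out_pipes are assumed to share one index per peer.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for the REP socket's pipe bookkeeping (not 0MQ code).
struct fixed_rep_model_t {
    std::vector<std::deque<int>> in_pipes;  // pending requests, one queue per peer
    std::size_t current = 0;                // round-robin cursor
    std::size_t reply_pipe = 0;             // index of the out pipe chosen for the reply

    explicit fixed_rep_model_t(std::size_t peers) : in_pipes(peers) {}

    // Mirrors the reordered loop: reply_pipe is recorded before the cursor moves.
    // Returns the index of the pipe the request was read from, or -1 if none.
    int recv_fixed() {
        assert(!in_pipes.empty());
        const std::size_t active = in_pipes.size();
        for (std::size_t count = active; count != 0; count--) {
            const std::size_t read_from = current;
            const bool fetched = !in_pipes[current].empty();
            if (fetched) {
                in_pipes[current].pop_front();
                reply_pipe = current;  // same index the request came from
            }
            current++;
            if (current >= active)
                current = 0;
            if (fetched)
                return static_cast<int>(read_from);
        }
        return -1;
    }
};
```

In this model, queueing two requests on pipe 0 and one on pipe 2 yields reads from pipes 0, 2, 0 in that order, with reply_pipe equal to the source index each time. The cursor still advances past the pipe just served, so the round-robin order is unchanged.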

Regards,
Ben


