[zeromq-dev] Feedback on new PATCH socket

Martin Sustrik sustrik at 250bpm.com
Sun May 8 07:47:38 CEST 2011

Hi Fabien,

> So, maybe my specific need is not the best example to generalize
> from, but I still think a dispatch-and-collect pattern can be
> quite useful in a grid-oriented network.

Definitely. I am actually pretty enthusiastic about it, as it seems to be
the third fully scalable pattern (besides pub/sub and req/rep) I've
ever seen.

> I understand that.  My usage of XREP sockets is more a practical
> issue.  Semantically, it is closer to the REQ socket.  I just
> cannot remove the strict send/recv policy of REQ to allow it to recv
> multiple replies, and that's why I use an XREP socket to implement it,
> but I would clearly prefer a dedicated socket with clear endpoint
> semantics for both.


>> The obvious problem with any one-request-many-replies model is that
>> the requester has no idea whether it has got all the answers
>> yet. Specifically, think of large distributed topologies where
>> at least a part of the topology is likely to be offline at any given
>> moment.
>> The only solution seems to be to set a deadline for the replies.
>> The user code could then look something like this:
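>>
>> s = socket (SURVEYOR);
>> int deadline = 10;
>> zmq_setsockopt (s, ZMQ_DEADLINE, &deadline, sizeof deadline);
>> // create a request...
>> zmq_send (s, request, 0);
>> while (true) {
>>     zmq_msg_t reply;
>>     zmq_msg_init (&reply);
>>     int rc = zmq_recv (s, &reply, 0);
>>     if (rc < 0 && errno == EDEADLINE) {
>>         zmq_msg_close (&reply);
>>         break;
>>     }
>>     // process reply here...
>>     zmq_msg_close (&reply);
>> }
>>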
> Personally, and it is really only a matter of taste, I don't like the
> idea of the socket handling the deadline itself.  If I wanted a
> lock-step approach to the problem, I would say that a PATCH socket
> requires all connected sockets to send a reply before it processes a
> new request.  Since it doesn't know how many sub-connections are
> below each socket, the protocol would require each peer to send back
> a signal saying it has finished.
> So, the pseudo-code for the patch socket would be something like:
>
> on_send(request):
>     for each socket in out_:
>         send(socket, request);
>         push(wait_queue, socket);
>     end.
>     while not empty?(wait_queue):
>         socket := poll(wait_queue, POLLIN);
>         reply := recv(socket);
>         if getsockopt(socket, RCVEND):
>             pop(wait_queue, socket);
>         end.
>         flags := 0;
>         if getsockopt(socket, RCVMORE):
>             flags := SNDMORE;
>         end.
>         if empty?(wait_queue):
>             flags := flags | SNDEND;
>         end.
>         send(_in, reply, flags);
>     end.
> end.
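Fabien's wait-queue loop can be sketched in C with plain pipes standing in for connections and POSIX poll() standing in for zmq_poll(). This is a minimal sketch under stated assumptions, not libzmq code: the reply_t record, its `last` field (playing the role of the proposed RCVEND flag) and collect_lockstep() are all hypothetical names. The timeout_ms argument is the per-poll timeout from point 2 of Fabien's mail.

```c
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

#define MAX_PEERS 64

/* Hypothetical record format: `last` stands in for the proposed RCVEND
 * flag and marks the final reply a peer will send for this request. */
typedef struct {
    int32_t value;
    int32_t last;
} reply_t;

/* Lock-step collection (the wait_queue loop): keep polling until every
 * peer has sent a reply with `last` set. Returns the number of replies
 * read, or -1 on error, on hang-up, or when a single poll() waits longer
 * than timeout_ms (-1 means wait forever). */
int collect_lockstep (const int *fds, int npeers, int32_t *out, int cap,
                      int timeout_ms)
{
    struct pollfd wait_queue[MAX_PEERS];
    int pending = npeers, count = 0;

    if (npeers > MAX_PEERS)
        return -1;
    for (int i = 0; i < npeers; i++) {
        wait_queue[i].fd = fds[i];
        wait_queue[i].events = POLLIN;
    }
    while (pending > 0) {
        int rc = poll (wait_queue, (nfds_t) npeers, timeout_ms);
        if (rc <= 0)
            return -1;                  /* error, or this wait timed out */
        for (int i = 0; i < npeers; i++) {
            short ev = wait_queue[i].revents;
            if (ev == 0)
                continue;               /* nothing from this peer yet */
            if (!(ev & POLLIN))
                return -1;              /* peer hung up without a final reply */
            reply_t r;
            if (read (wait_queue[i].fd, &r, sizeof r) != (ssize_t) sizeof r)
                return -1;
            if (count < cap)
                out[count] = r.value;
            count++;
            if (r.last) {               /* pop(wait_queue, socket) */
                wait_queue[i].fd = -1;  /* poll() ignores negative fds */
                pending--;
            }
        }
    }
    return count;
}
```

Passing -1 as timeout_ms gives the pure lock-step behaviour, where one silent peer blocks the call indefinitely. Note also that the timeout restarts after every reply, so a peer that trickles replies slowly can still stretch the total wait well beyond timeout_ms.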

This won't work. The peer can be a device rather than an endpoint. In
such a case you should expect to get an arbitrary number of responses
from a single connection.

> 2- It locks the patch socket until all replies come back.  Maybe it's
> better this way, given that a single request can generate 1000
> replies, but it can also completely lock down the whole tree if one
> socket downstream fails to answer.  In this case, setting a maximum
> timeout (in the poll call above) is the only viable solution.

Bingo! That's the main problem. A pattern that allows a single failed or
misbehaving node to block the whole topology cannot really be called
scalable. That's why the timeout/deadline needs to be an inherent part
of the pattern.
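The deadline argued for here differs from a per-poll timeout: it is fixed once, when the request goes out, and every subsequent wait uses only the time left. A minimal sketch in C, again with pipes and POSIX poll() in place of real 0MQ sockets; collect_until_deadline(), now_ms() and the bare int32_t reply format are hypothetical choices for illustration, not the proposed ZMQ_DEADLINE implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define MAX_PEERS 64

/* Milliseconds on a monotonic clock, so wall-clock jumps don't move the deadline. */
static long long now_ms (void)
{
    struct timespec ts;
    clock_gettime (CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Deadline-bounded collection: read int32_t replies from any of `fds`
 * until deadline_ms have elapsed since the call, then stop regardless of
 * which peers have answered. Returns the number of replies read, or -1
 * on a read/poll error. */
int collect_until_deadline (const int *fds, int npeers, int32_t *out,
                            int cap, int deadline_ms)
{
    struct pollfd items[MAX_PEERS];
    int count = 0;

    if (npeers > MAX_PEERS)
        return -1;
    for (int i = 0; i < npeers; i++) {
        items[i].fd = fds[i];
        items[i].events = POLLIN;
    }
    long long deadline = now_ms () + deadline_ms;   /* fixed once, up front */
    for (;;) {
        long long remaining = deadline - now_ms ();
        if (remaining <= 0)
            return count;                 /* deadline reached: survey is over */
        int rc = poll (items, (nfds_t) npeers, (int) remaining);
        if (rc < 0) {
            if (errno == EINTR)
                continue;                 /* interrupted: retry with time left */
            return -1;
        }
        if (rc == 0)
            return count;                 /* waited out the remaining time */
        for (int i = 0; i < npeers; i++) {
            short ev = items[i].revents;
            if (ev == 0)
                continue;
            if (!(ev & POLLIN)) {
                items[i].fd = -1;         /* peer gone: stop watching it */
                continue;
            }
            int32_t v;
            if (read (items[i].fd, &v, sizeof v) != (ssize_t) sizeof v)
                return -1;
            if (count < cap)
                out[count] = v;
            count++;
        }
    }
}
```

The caller never learns whether it got *all* the answers; it only learns that the deadline passed. That is the trade-off described above for large topologies where part of the tree is offline: a silent peer costs at most deadline_ms instead of blocking the whole request.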

