[zeromq-dev] Feedback on new PATCH socket

Fabien Ninoles fabien.ninoles at ubisoft.com
Sun May 8 04:54:09 CEST 2011

----- Original Message -----
> On 05/06/2011 10:15 PM, Fabien Ninoles wrote:
> > Census is just one example. It's more a "chain of command" pattern
> > where one or more GOVERNORs can send a command to one or more
> > WORKERs that obey and send the results back. In fact, my primary
> > example was a parallel pipeline where everybody works on the same
> > task.
> Is that meant to implement the hot/hot failover?

Sorry, I think I mixed things up a little. We are currently in the process of replacing a cluster P2P network stack over UDP with zmq. The old stack has peer discovery, data replication, broadcasting and direct communication between peers, as well as fault detection and master migration. It has done a good job so far but fails on two counts, scalability and network splits, and it requires a separate protocol for inter-cluster communication.

I'm currently redesigning the whole thing, but I have had to make some concessions in my design just to allow our deployment tools to work in both cases. So, for example, all nodes need to be able to send a command to, and receive a reply from, each other, even if a node doesn't know at startup how many other nodes are up and running.

So maybe my specific need is not the best example to generalize from, but I still think that a dispatch-and-collect pattern can be quite useful in a grid-oriented network.

> This is part of the XREP vs. ROUTER confusion. XREP cannot add a
> delimiter because it's meant to reside in the middle of the topology,
> forwarding requests and replies to the next hop.

I understand that. My use of XREP sockets is more of a practical matter. Semantically, it is closer to a REQ socket. I just cannot lift REQ's strict send/recv alternation to let it receive multiple replies, which is why I use an XREP socket to implement it, but I would clearly prefer a dedicated socket with clear endpoint semantics for both ends.
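Because an XREP socket used as an endpoint does no envelope handling of its own, the application has to do what REQ/REP would otherwise do: find the empty delimiter frame that separates the address stack from the body, and copy that stack into the reply so it can be routed back. The sketch below shows that bookkeeping on in-memory frames only; `frame_t`, `find_delimiter` and `build_reply` are hypothetical names, and no libzmq calls are involved.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory frame: one part of a multi-part message. */
typedef struct {
    const char *data;
    size_t size;
} frame_t;

/* Returns the index of the first empty (zero-length) frame, which
   separates the address stack from the body, or -1 if there is none. */
static int find_delimiter (const frame_t *frames, int count)
{
    for (int i = 0; i < count; i++)
        if (frames[i].size == 0)
            return i;
    return -1;
}

/* Builds a reply envelope: copies the address stack and the delimiter
   from the request, then appends the reply body. Returns the number of
   frames written to out, or -1 if the request has no delimiter or out
   is too small. */
static int build_reply (const frame_t *request, int count,
                        frame_t body, frame_t *out, int out_cap)
{
    int delim = find_delimiter (request, count);
    if (delim < 0 || delim + 2 > out_cap)
        return -1;
    /* address frames plus the delimiter itself */
    memcpy (out, request, (size_t) (delim + 1) * sizeof (frame_t));
    out[delim + 1] = body;
    return delim + 2;
}
```

For a request `peer-A | hop-1 | (empty) | status?`, the reply becomes `peer-A | hop-1 | (empty) | ok`, so every hop can pop its own address on the way back.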

> The obvious problem with any one-request-many-replies model is that the
> requester has no idea whether it has got all the answers yet.
> Specifically, think of large distributed topologies where at least
> part of the topology is likely to be offline at any given moment.
> The only solution seems to be to set a deadline for the replies. The
> user code could then look something like this:
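> void *s = zmq_socket (ctx, ZMQ_SURVEYOR);
> int deadline = 10;
> zmq_setsockopt (s, ZMQ_DEADLINE, &deadline, sizeof deadline);
> //   create a request...
> zmq_send (s, &request, 0);
> while (true) {
>           zmq_msg_t reply;
>           zmq_msg_init (&reply);
>           int rc = zmq_recv (s, &reply, 0);
>           if (rc < 0 && errno == EDEADLINE) {
>                   zmq_msg_close (&reply);
>                   break;
>           }
>           //   process reply here...
>           zmq_msg_close (&reply);
> }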
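ZMQ_SURVEYOR, ZMQ_DEADLINE and EDEADLINE are only proposals here, so the sketch below models the intended semantics without libzmq. Each reply carries the survey it answers and its arrival time (milliseconds are assumed; the quoted example gives no unit). Replies arriving at or after the deadline are never delivered, and, as an extra assumption not in the proposal, replies tagged with an older survey id are dropped as stale leftovers. `reply_t` and `collect_survey` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical reply record: the survey it answers, the responding node,
   and its arrival time in milliseconds after the survey was sent. */
typedef struct {
    unsigned survey_id;
    int node_id;
    int arrival_ms;
} reply_t;

/* Simulates one survey under the proposed deadline semantics. Replies
   arriving at or after deadline_ms are not delivered (the proposed
   receive call would fail with EDEADLINE instead). Replies tagged with a
   different survey_id are assumed stale and dropped. Returns the number
   of replies copied into accepted. */
static size_t collect_survey (const reply_t *incoming, size_t count,
                              unsigned survey_id, int deadline_ms,
                              reply_t *accepted)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (incoming[i].arrival_ms >= deadline_ms)
            continue; /* too late: the surveyor stopped waiting */
        if (incoming[i].survey_id != survey_id)
            continue; /* stale reply to a previous survey */
        accepted[n++] = incoming[i];
    }
    return n;
}
```

The stale-reply check matters in practice: a slow node's late answer would otherwise be read as part of the next survey.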

Personally, and it is really only a matter of taste, I don't like the idea of the socket handling the deadline itself. If I wanted a lock-step approach to the problem, I would say that a PATCH socket requires all connected sockets to send a reply before it processes a new request. Since it doesn't know how many sub-connections are below each socket, the protocol would require each of them to send back a signal saying it is done.

So, the pseudo-code for the patch socket would be something like:

  for each socket in out_:
    send(socket, request);
    push(wait_queue, socket);
  while not empty?(wait_queue):
    socket := poll(wait_queue, POLLIN);
    # forward one whole multi-part reply before polling again,
    # so that frames from different sockets never interleave
    repeat:
      reply := recv(socket);
      more := getsockopt(socket, RCVMORE);
      if not more and getsockopt(socket, RCVEND):
        remove(wait_queue, socket);
      flags := 0;
      if more:
        flags := SNDMORE;
      else if empty?(wait_queue):
        flags := SNDEND;
      send(in_, reply, flags);
    until not more;
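The forwarding loop of the PATCH socket can be exercised without libzmq using an in-memory model. FLAG_MORE and FLAG_END stand in for RCVMORE/SNDMORE and the proposed RCVEND/SNDEND; `peer_t`, `patch_forward` and the `ready` array, which plays the role of poll() by listing the order in which downstream sockets become readable, are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define FLAG_MORE 1u /* hypothetical stand-in for RCVMORE / SNDMORE */
#define FLAG_END  2u /* hypothetical stand-in for proposed RCVEND / SNDEND */
#define MAX_PEERS 8  /* arbitrary limit for this sketch */

typedef struct {
    int value;      /* payload, reduced to an int for the sketch */
    unsigned flags; /* FLAG_MORE and/or FLAG_END */
} frame_t;

/* One downstream connection, modelled as the frames it will deliver. */
typedef struct {
    const frame_t *frames;
    size_t count;
    size_t next; /* index of the next frame to receive */
} peer_t;

/* Forwards replies from downstream peers into `up`. A peer leaves the
   wait queue when it delivers a final frame marked END; the final frame
   forwarded once the queue is empty is marked END upstream. Each pick
   from `ready` forwards one complete multi-part reply. Returns the
   number of frames written to `up`. */
static size_t patch_forward (peer_t *peers, size_t npeers,
                             const size_t *ready, size_t nready,
                             frame_t *up)
{
    bool waiting[MAX_PEERS];
    size_t pending = npeers, out = 0;

    if (npeers > MAX_PEERS)
        return 0;
    for (size_t i = 0; i < npeers; i++)
        waiting[i] = true; /* the request went to every peer */

    for (size_t r = 0; r < nready && pending > 0; r++) {
        size_t k = ready[r];
        peer_t *p = &peers[k];
        if (!waiting[k] || p->next >= p->count)
            continue;
        bool more;
        do {
            frame_t f = p->frames[p->next++];
            more = (f.flags & FLAG_MORE) != 0;
            if (!more && (f.flags & FLAG_END)) {
                waiting[k] = false; /* this peer is done */
                pending--;
            }
            unsigned flags = 0;
            if (more)
                flags = FLAG_MORE;
            else if (pending == 0)
                flags = FLAG_END; /* last frame of the last reply */
            up[out++] = (frame_t) { f.value, flags };
        } while (more && p->next < p->count);
    }
    return out;
}
```

With peer 0 sending one two-frame reply and peer 1 sending two single-frame replies, readiness order 1, 0, 1 yields 20, 10 (MORE), 11, 21 (END): only the very last frame carries END upstream.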

The policy for a P-REP socket would be to always set the SNDEND flag on the last frame. For a P-REQ socket, it would be to read all messages until the SNDEND flag is received.
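On the P-REQ side, the policy reduces to a receive loop that stops at the first frame carrying END. The sketch below models it on an in-memory frame stream; FLAG_MORE and FLAG_END again stand in for the multi-part flag and the proposed SNDEND, and `preq_collect` is a hypothetical name. A frame without FLAG_MORE closes one reply, which lets the loop count how many replies arrived.

```c
#include <stddef.h>

#define FLAG_MORE 1u /* hypothetical: more frames follow in this reply */
#define FLAG_END  2u /* hypothetical stand-in for the proposed SNDEND */

typedef struct {
    int value;
    unsigned flags;
} frame_t;

/* Reads frames until one carries FLAG_END and returns the number of
   complete replies seen. *consumed receives the number of frames read.
   Returns -1 if the stream runs out before FLAG_END, which is exactly
   the case where a timeout would be needed. */
static int preq_collect (const frame_t *stream, size_t count,
                         size_t *consumed)
{
    int replies = 0;
    for (size_t i = 0; i < count; i++) {
        if (!(stream[i].flags & FLAG_MORE))
            replies++; /* last frame of one reply */
        if (stream[i].flags & FLAG_END) {
            *consumed = i + 1;
            return replies; /* all replies for this request are in */
        }
    }
    *consumed = count;
    return -1;
}
```

Frames after the END frame belong to the next request and are left unread.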

The first problems I see with this approach:
1- It doesn't handle the case where there is no out_ socket. That can be fixed by sending a special "END_REPLY" message that would be dropped by the next PATCH or P-REQ socket upstream (which also means that it must contain the full address stack of the reply and be distinguishable from any other reply).

2- It locks the PATCH socket until all replies have come back. Maybe it's better this way, given that a single request can generate 1000 replies, but it can also completely lock down the whole tree if one socket downstream fails to answer. In that case, setting a maximum timeout (in the poll call above) is the only viable solution (or could we handle it with the HWM alone, dropping the current wait_queue before dropping any message?).
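Both problems can be explored with a small timing model of a single PATCH hop. Each downstream peer either delivers its END_REPLY marker at some time or never does; the hop waits until every peer has finished or a maximum timeout expires, then sends its own END_REPLY upstream. With no downstream peers it answers at once, which is the fix proposed in point 1. `patch_end_time`, NO_ANSWER and the millisecond unit are hypothetical.

```c
#include <stddef.h>

#define NO_ANSWER (-1) /* hypothetical: peer never sends END_REPLY */

/* end_ms[i] is when downstream peer i delivers its END_REPLY marker, or
   NO_ANSWER. The hop waits for each peer up to timeout_ms. Returns the
   time at which the hop can send its own END_REPLY upstream, and stores
   in *gave_up how many peers it abandoned. With npeers == 0 it returns
   0: the hop answers immediately (problem 1, no out_ socket). */
static int patch_end_time (const int *end_ms, size_t npeers,
                           int timeout_ms, size_t *gave_up)
{
    int latest = 0;
    *gave_up = 0;
    for (size_t i = 0; i < npeers; i++) {
        int late = end_ms[i] == NO_ANSWER || end_ms[i] > timeout_ms;
        int wait = late ? timeout_ms : end_ms[i];
        if (late)
            (*gave_up)++; /* problem 2: a silent peer costs the full timeout */
        if (wait > latest)
            latest = wait;
    }
    return latest;
}
```

In this model a single silent peer stalls the hop for the whole timeout, and an upstream hop needs a longer timeout than the hops below it; otherwise it abandons a subtree whose PATCH socket is still legitimately waiting on its own timeout.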

Hope this helps a little bit,

