[zeromq-dev] Publish / Subscribe vs Multicast

Martin Sustrik sustrik at 250bpm.com
Thu Feb 11 19:23:52 CET 2010


Hi Gonzalo,

This is a classic example of a multi-hop request/reply scenario. 
Supporting it is on the roadmap; part of the functionality is already 
implemented, but resources for implementing the rest are still missing :(

See my comments inline.

> I am trying to design a pipelined load distribution system, where I have 
> one process (let’s call it the distributor) receiving many requests to 
> execute the same task for different parameter combinations; this one 
> process will pass on those tasks to one of N processes (called the 
> workers), which will take care of executing the particular task. 
> Optionally, the worker processes will notify a final process (could be 
> the same as the initial distributor) that this particular task is 
> finished, and wait for more work to do.
> 
> I would like to hear opinions on several design issues:
> 
>    1. What would be the practical differences between using a PubSub
>       approach and using Multicast to pass the requests from distributor
>       to workers?

In this scenario each message is passed to a single worker, so using 
multicast would be overkill.

>    2. By going with PubSub or Multicast, all the workers will receive
>       all task requests and will have to decide whether they are the
>       worker which should process it. What are practical ways of making
>       this decision? It looks like this approach requires the workers to
>       know in advance the total number of workers in the pool, right?

As noted above, there's little point in distributing the request to all 
the workers (unless you are aiming for hot-hot failover), so the TCP 
transport should be used.

>    3. How to handle crashed workers? How about workers that are not
>       responding? What if I want to add workers?

The only 100% reliable algorithm is end-to-end reliability: the sending 
application tags each request with a unique tag and waits for a reply 
with the same tag. In the meantime it drops all non-matching replies. 
If the reply is not delivered within a specified time, the request is 
resent.
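
To make the tag/wait/drop/resend loop concrete, here is a minimal Python 
sketch. It is not tied to any 0MQ API: send(tag, payload) and 
recv(timeout) are hypothetical transport hooks you would supply 
yourself, and the timeout and max_attempts defaults are arbitrary 
assumptions. Since a resent request may end up being executed twice, 
the sketch also assumes tasks are safe to repeat.

```python
import time
import uuid


def request_with_retry(send, recv, payload, timeout=1.0, max_attempts=3):
    """Send payload with a unique tag and return the matching reply.

    send(tag, payload) and recv(timeout) are hypothetical transport
    hooks: recv returns a (tag, reply) pair, or None if nothing
    arrived within the given number of seconds.
    """
    tag = uuid.uuid4().hex  # unique tag identifying this request
    for _ in range(max_attempts):
        send(tag, payload)  # first send, or a resend after a timeout
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # no matching reply in time: resend
            message = recv(remaining)
            if message is None:
                break  # transport reported a timeout: resend
            reply_tag, reply = message
            if reply_tag == tag:
                return reply
            # Non-matching reply (e.g. a late answer to an older
            # request): drop it and keep waiting.
    raise TimeoutError(f"no reply to request {tag} after {max_attempts} attempts")
```

A caller would wrap whatever transport it uses in the two hooks and 
call request_with_retry(send, recv, task) once per task.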

>    4. Maybe I should have the distributor handle the load distribution,
>       not using PubSub or Multicast, but choosing a specific worker and
>       sending the task request directly to it. Same questions apply, right?

This scenario can be implemented even now. However, the requester would 
need the addresses of all the workers so that it can connect to them. 
Probably not what you want.
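
For illustration only, a distributor that knows every worker address 
could pick workers in turn. The sketch below assumes a plain 
round-robin policy, and send_to(address, task) is a hypothetical hook 
standing in for whatever connection you hold to each worker; the 
addresses in the usage note are made up.

```python
from itertools import cycle


def make_distributor(worker_addresses, send_to):
    """Return a function that hands each task to the next worker in turn.

    send_to(address, task) is a hypothetical hook that delivers the
    task to the worker at the given address.
    """
    if not worker_addresses:
        raise ValueError("at least one worker address is required")
    next_worker = cycle(list(worker_addresses))  # endless round-robin

    def distribute(task):
        address = next(next_worker)
        send_to(address, task)
        return address  # lets the caller record who got the task

    return distribute
```

With two workers, e.g. make_distributor(["tcp://w1:5555", 
"tcp://w2:5555"], send_to), successive calls alternate between the two 
addresses. Adding or removing a worker means rebuilding the list, which 
is the drawback mentioned above.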

In case you would like to give a hand with the implementation, let us know.

Martin


