[zeromq-dev] Publish / Subscribe vs Multicast

Steven McCoy steven.mccoy at miru.hk
Thu Feb 11 19:31:09 CET 2010


On 11 February 2010 12:43, gonzalo diethelm <gdiethelm at dcv.cl> wrote:

>  I am trying to design a pipelined load distribution system, where I have
> one process (let’s call it the distributor) receiving many requests to
> execute the same task for different parameter combinations; this one process
> will pass on those tasks to one of N processes (called the workers), which
> will take care of executing the particular task. Optionally, the worker
> processes will notify a final process (could be the same as the initial
> distributor) that this particular task is finished, and wait for more work
> to do.
>
>
Basically this is a TIBCO Rendezvous Distributed Queue (RVDQ).  A quick
overview:

The RVDQ appears to clients as a single communication endpoint and requires
no special action on the client.  The client can communicate with the queue
via reliable or confirmed delivery.

Each member of the RVDQ takes part in a fault-tolerant group, and one member
gets elected as the scheduler.  It is the scheduler's job to assign tasks to
workers.

Communication between the scheduler and workers is always via confirmed
delivery.

The developer can specify the weight of each worker, how many workers each
process can provide, and a completion time limit for each job.  There is an
outstanding defect: the elected scheduler can only provide one worker to
the group.
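
A rough sketch of that kind of weighted assignment, in plain Python rather
than the Rendezvous API.  The names (Worker, assign, complete) are made up
for illustration, and the policy "the free worker with the highest weight
wins" is an assumption about how the weights would be used:

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    weight: int        # higher weight is preferred by the scheduler
    capacity: int      # how many tasks this worker may hold at once
    in_flight: int = 0


def assign(workers):
    """Pick the free worker with the highest weight, or None if all are busy."""
    free = [w for w in workers if w.in_flight < w.capacity]
    if not free:
        return None
    chosen = max(free, key=lambda w: w.weight)
    chosen.in_flight += 1
    return chosen.name


def complete(workers, name):
    """Mark one task on the named worker as finished."""
    for w in workers:
        if w.name == name and w.in_flight > 0:
            w.in_flight -= 1
            return True
    return False
```

When assign() returns None the scheduler has to hold the task until some
worker reports completion.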

>
>
> I would like to hear opinions on several design issues:
>
>    1. What would be the practical differences between using a PubSub
>    approach and using Multicast to pass the requests from distributor to
>    workers?
>
Generally the bottom line is the administration overhead of managing all the
interconnects and how easy the system is for the developer to set up.


>
>    2. By going with PubSub or Multicast, all the workers will receive all
>    task requests and will have to decide whether they are the worker which
>    should process it. What are practical ways of making this decision? It looks
>    like this approach requires the workers to know in advance the total number
>    of workers in the pool, right?
>
A scheduler is the simplest method and is already popular in different
areas, e.g. HTTP load balancing.  Otherwise you might end up re-inventing
the token ring network, although I think there are plenty of computer
science papers on creating a quorum within a group.
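
For comparison, here is a minimal sketch of the scheduler-less alternative
from the question, where every worker sees every task and decides locally.
It assumes each task carries a string id and that every worker knows its
own index and the pool size; the function names are hypothetical:

```python
import zlib


def owner_index(task_id: str, pool_size: int) -> int:
    """Map a task id to a worker index with a hash that is stable across processes."""
    return zlib.crc32(task_id.encode("utf-8")) % pool_size


def should_process(task_id: str, my_index: int, pool_size: int) -> bool:
    """True only for the one worker whose index matches the task's hash."""
    return owner_index(task_id, pool_size) == my_index
```

This only works while all workers agree on pool_size.  If they disagree,
even briefly, a task can be claimed twice or not at all, which is a large
part of why a scheduler is simpler.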


>
>    3. How to handle crashed workers? How about workers that are not
>    responding? What if I want to add workers?
>
Crashes in RVDQ are handled at multiple levels because communications are
directed through a broker.  Basic application crashes can be caught by
the broker, and the broker can notify the network by removing itself from
the fault-tolerant group.  Bigger crashes and network failures are handled
by a timeout on the fault-tolerant group.  More complicated failures can be
covered by an optional timeout the developer can set, which is applied to
the individual confirmed message from the scheduler to the worker.
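
The per-message timeout can be sketched roughly as follows.  This is not
the Rendezvous API, just an illustration with made-up names: the scheduler
records a deadline for every task it hands out and reassigns whatever has
not been confirmed in time.  The clock is injectable so the logic can be
exercised without sleeping:

```python
import time


class PendingTasks:
    """Tracks tasks sent to workers and reports those whose deadline has passed."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.pending = {}  # task_id -> (worker, deadline)

    def sent(self, task_id, worker):
        """Record that a task was handed to a worker and start its timer."""
        self.pending[task_id] = (worker, self.clock() + self.timeout_s)

    def confirmed(self, task_id):
        """Worker reported completion; stop tracking. False if the task is unknown."""
        return self.pending.pop(task_id, None) is not None

    def expired(self):
        """Remove and return (task_id, worker) pairs that timed out."""
        now = self.clock()
        late = [(t, w) for t, (w, d) in self.pending.items() if d <= now]
        for t, _ in late:
            del self.pending[t]
        return late
```

Whatever expired() returns would go back on the queue for another worker.
A late confirmation from the original worker then shows up as an unknown
task, so the application has to tolerate a task being executed twice.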

If the complete queue fails, reliable or confirmed delivery is still
managed by the client sending to the queue, so there is no need to
re-implement that functionality in the scheduler.


>
>    4. Maybe I should have the distributor handle the load distribution,
>    not using PubSub or Multicast, but choosing a specific worker and sending
>    the task request directly to it. Same questions apply, right?
>
As above, noting that you also have to handle failure of the distributor
or scheduler itself.

-- 
Steve-o
