[zeromq-dev] Publish / Subscribe vs Multicast

Brian Granger ellisonbg at gmail.com
Thu Feb 11 20:58:25 CET 2010


I too am just getting started with zeromq, and our system (see my
recent email to the list) has some overlap with yours.  Ours is
perhaps even more complex.

> I am trying to design a pipelined load distribution system, where I have
> one process (let's call it the distributor) receiving many requests to
> execute the same task for different parameter combinations; this one
> process will pass on those tasks to one of N processes (called the
> workers), which will take care of executing the particular task.
> Optionally, the worker processes will notify a final process (could be
> the same as the initial distributor) that this particular task is
> finished, and wait for more work to do.

> I would like to hear opinions on several design issues:
> 1.	What would be the practical differences between using a PubSub
> approach and using Multicast to pass the requests from distributor to
> workers?
> 2.	By going with PubSub or Multicast, all the workers will receive
> all task requests and will have to decide whether they are the worker
> which should process it. What are practical ways of making this
> decision? It looks like this approach requires the workers to know in
> advance the total number of workers in the pool, right?

My thought is to use a PUB/SUB model with a topic for each worker.
When a worker attaches, it would send a presence or registration
message to the central message hub.  That hub would assign a topic to
the worker.  From then on, the worker would subscribe to that topic,
and the scheduler (application level) would attach that topic to each
task it sends to the worker.
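
To make the idea concrete, here is a minimal sketch of the hub-side
bookkeeping in plain Python.  The names (TopicRegistry, register,
frame_task, matches) and the "worker.N." topic format are my own
assumptions for illustration, not anything zeromq provides.  The
matches() helper only imitates the prefix matching that a SUB socket
applies to incoming messages.

```python
import itertools


class TopicRegistry:
    """Hub-side bookkeeping: one topic per registered worker (hypothetical helper)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._topics = {}  # worker id -> topic bytes

    def register(self, worker_id):
        """Handle a registration message and return the worker's topic."""
        if worker_id not in self._topics:
            # The trailing dot keeps "worker.1." from prefix-matching "worker.10."
            self._topics[worker_id] = b"worker.%d." % next(self._counter)
        return self._topics[worker_id]

    def frame_task(self, worker_id, payload):
        """Build the [topic, payload] frames addressed to one worker."""
        return [self._topics[worker_id], payload]


def matches(subscription, frames):
    """Imitate a SUB socket's filter: prefix match on the first frame."""
    return frames[0].startswith(subscription)
```

With the Python bindings, I would expect the scheduler to pass those
frames to send_multipart() on a PUB socket, and each worker to call
setsockopt(zmq.SUBSCRIBE, topic) on its SUB socket with the topic it
was assigned.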

> 3.	How to handle crashed workers? How about workers that are not
> responding? What if I want to add workers?

Yes, these are exactly the questions I am struggling with.  Why don't
you join the other ongoing threads to continue this discussion?
I can see how to handle new workers joining (they send a
registration msg and are allocated a topic), but I am struggling with
how to handle workers going away.  There needs to be some way for the
application level scheduler to discover that a worker is dead and
should not be allocated tasks.
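
One possible approach, which I have not settled on, is heartbeating:
each worker sends a small "I am alive" message at a regular interval,
and the scheduler drops any worker it has not heard from within a
timeout.  The sketch below is plain Python and assumes the heartbeat
messages arrive by some transport not shown here.  WorkerMonitor, its
methods, and the 5 second timeout are all hypothetical.

```python
import time


class WorkerMonitor:
    """Track the last heartbeat per worker (hypothetical scheduler helper)."""

    def __init__(self, timeout=5.0, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence before a worker is dead
        self._clock = clock         # injectable so the logic can be tested
        self._last_seen = {}        # worker id -> time of last heartbeat

    def heartbeat(self, worker_id):
        """Record that a heartbeat (or any message) arrived from a worker."""
        self._last_seen[worker_id] = self._clock()

    def reap(self):
        """Remove and return the workers that have been silent too long."""
        now = self._clock()
        dead = [w for w, seen in self._last_seen.items()
                if now - seen > self.timeout]
        for w in dead:
            del self._last_seen[w]
        return dead

    def alive(self):
        """Return the workers that are still eligible for tasks."""
        return sorted(self._last_seen)
```

The scheduler would call reap() before handing out tasks and requeue
whatever work was outstanding on the workers it returns.  This only
detects silence, so a worker that is alive but stuck on a long task
needs to keep sending heartbeats from a separate thread or loop.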

> 4.	Maybe I should have the distributor handle the load
> distribution, not using PubSub or Multicast, but choosing a specific
> worker and sending the task request directly to it. Same questions
> apply, right?

Yes, I am definitely thinking of having application level queues and
scheduling logic.  And the same questions do apply.

One more comment:  the current design of zeromq seems to be focused on
messages as packets of information.  But in the use cases you and I
are describing, our messages are "actions", more like what you see in
RPC systems.  The problem with RPC systems, though, is that they are
not asynchronous or fault tolerant enough.

Very timely post!



> Any 0mq-specific documentation or examples that might help me answer
> these questions? Thanks in advance.

Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu
ellisonbg at gmail.com
