[zeromq-dev] Usage case description and questions

Brian Granger ellisonbg at gmail.com
Mon Feb 15 18:10:36 CET 2010

> I believe the first thing to sort out is the routing model:
>> * Dynamic load balancing.  In this scenario, tasks are sent to workers
>> that are "least" busy.
>> In english, this is "run this task ASAP anywhere you can."
>> * Worker specific scheduling.  By this, I mean we need the ability to
>> track worker processes  by id/name/number and send particular tasks to
>> that particular worker.
>> In english this is "run this task only on this worker, even if it takes
>> longer."
>  Alright, that is a start.  Any thoughts, ideas or feedback would be
>> greatly appreciated.
> Let's have a look at the problem from the point of view of "services". As
> far as I understand what you have is "generic" service that will simply
> load-balance the request among the workers. Additionally you have a set of
> specific services (say service "A") that need to be handled by a specific
> worker.

Yes, this is a good way of thinking about it.  In this context, one of
the questions I have to be able to answer is "is service A currently
available?"
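
For concreteness, the two routing policies I described could be sketched
in plain Python.  This is only a toy model of the bookkeeping a broker
might keep (the `Broker` class and its method names are my invention,
not any 0MQ API):

```python
# Toy model of the two routing policies: least-busy dispatch vs.
# worker-specific dispatch.  Illustrative only; no 0MQ calls here.

class Broker:
    def __init__(self):
        self.load = {}       # worker id -> number of outstanding tasks
        self.assigned = {}   # task id -> worker id

    def register(self, worker_id):
        self.load[worker_id] = 0

    def dispatch_any(self, task_id):
        """Dynamic load balancing: send the task to the least busy worker."""
        worker = min(self.load, key=self.load.get)
        self.load[worker] += 1
        self.assigned[task_id] = worker
        return worker

    def dispatch_to(self, task_id, worker_id):
        """Worker-specific scheduling: run only on the named worker."""
        if worker_id not in self.load:
            # This is exactly the "is service A available?" question.
            raise LookupError("worker %r not available" % worker_id)
        self.load[worker_id] += 1
        self.assigned[task_id] = worker_id
        return worker_id

broker = Broker()
for w in ("w1", "w2", "w3"):
    broker.register(w)

broker.dispatch_any("t1")        # goes to whichever worker is least busy
broker.dispatch_to("t2", "w2")   # pinned to worker "w2" even if it is busy
```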

> Have a look at the attached diagram. REQ/REP model is used rather than
> PUB/SUB. Would it work that way?

If the clients have to connect directly to the workers, no, it won't
work.  This is because the workers are often on a private subnet with a
firewall blocking inbound connections.  The workers *must* make outbound
connections to a central process (what you label as the shared queue)
that is outside the private subnet.  Typically I think of having a
central message hub that manages the shared queue as well as the queues
for the worker-specific services.

> (In case you would want several workers handling service A, one more shared
> queue can be added to the diagram.)

That is a perfect example that leads to the question: how does my
application know to create one of these shared queues?  (All of this is
very dynamic.)  If a service only has one worker, I still need to be
able to detect when that worker fails, so the clients can adjust.
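
One way the hub could detect a failed worker is a heartbeat timeout.
Here is the timeout bookkeeping I have in mind, as a pure-Python sketch
(the `Liveness` class, the 5-second timeout, and the worker names are
all assumptions for illustration, not anything 0MQ provides):

```python
# Illustrative heartbeat bookkeeping for detecting failed workers.
# Not 0MQ API; just the timeout logic a central hub might keep.

HEARTBEAT_TIMEOUT = 5.0  # seconds of silence before a worker is presumed dead

class Liveness:
    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}  # worker id -> time of last heartbeat

    def heartbeat(self, worker_id, now):
        self.last_seen[worker_id] = now

    def alive(self, now):
        """Workers heard from within the timeout window."""
        return {w for w, t in self.last_seen.items() if now - t < self.timeout}

    def dead(self, now):
        """Known workers that have gone silent; clients should adjust."""
        return set(self.last_seen) - self.alive(now)

live = Liveness()
live.heartbeat("service-A", now=0.0)
live.heartbeat("service-B", now=3.0)
# At t=6.0, service-A has been silent for 6s and is presumed dead:
live.dead(now=6.0)   # {"service-A"}
```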

>> As an aside, it looks like if we used zeromq, we could improve our
>> latency by a factor of 10-100.  That would simply be amazing and would
>> enable to scale up our system to very large cluster and
>> supercomputers.
> Yes, that was the idea behind 0MQ. The drawback is that development is much
> more labour intensive and adding new features without hurting performance is
> a lengthy process.

I don't mind the labour-intensive coding at the application level.  Our
current implementation is already quite a beast.

> Anyway, I worked on XREP and XREQ socket types over the
> weekend and it seems to be almost ready. These socket types would allow for
> routing replies back to original requester even over multiple network hops
> (see the shared queue component on the diagram).

The docs for XREP/XREQ are quite thin and I didn't see any good
examples.  Can you say a bit more about how the tagging/postfix works?
That might help.

This helps, but I still have the basic need to be able to detect the
status of the workers and adjust the application-level routing and
client logic.
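
For what it's worth, my current (possibly wrong) understanding of the
tagging is that each hop prepends the identity of the peer it received
the request from, and the reply is routed back by popping one identity
per hop.  A pure-Python model of that idea, frames only, no sockets
(all the identity strings here are made up):

```python
# Pure-Python model of identity-envelope routing over multiple hops.
# Assumption: each intermediary tags the message with the identity of
# the peer it came from, so replies can retrace the path.

def hop_forward(frames, sender_identity):
    """On receive, tag the message with the sender's identity."""
    return [sender_identity] + frames

def hop_back(frames):
    """On reply, pop one identity to pick the next peer to route to."""
    return frames[0], frames[1:]

# A client sends a request through two intermediaries:
msg = ["do_task"]
msg = hop_forward(msg, "client-7")   # at the shared queue
msg = hop_forward(msg, "queue-1")    # at a second device
# msg is now ["queue-1", "client-7", "do_task"]

# The reply retraces the path, peeling one identity per hop:
peer, reply = hop_back(msg)          # route back via "queue-1"
peer, reply = hop_back(reply)        # route back to "client-7"
# reply == ["do_task"]: the payload reaches the original requester
```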



> Martin

Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu
ellisonbg at gmail.com
