[zeromq-dev] General understanding of ZMQ and architecture advices

dev at innercircleproject.com dev at innercircleproject.com
Sat Dec 15 21:18:55 CET 2012


Hello.

We are building a data mapping tool that gathers several sources of 
information and tries to fit them into a common document model for 
further consumption.

Stage one was to design the "mapping machinery". From the start we 
intended to distribute the workload among processes and machines, so 
the design uses self-contained data units that we can process 
independently. The only thorny issues are, of course, the distribution 
at the start (with the relevant retry-on-failure and TTL problems) and 
the gathering at the end.

We are working in Python, so we are planning to use pyZMQ.

My plan is to stick as closely as possible to proven designs, because I 
fully understand that it is really easy to "fuck up in mysterious ways" 
in the realm of distributed and asynchronous processing, and the team 
lacks experience in that domain. But one needs to start one day, right? :)

After reading the docs several times, I was at first attracted to the 
"ventilator design", but several questions came to mind:

The ventilator design seems to be a big "spread it as long as you have 
something to spread". If the workers' processing time is longer than 
the ventilator's "dividing time" (probably a frequent case, and 
definitely the case in our situation), the ventilator will quickly 
divide the work among the "n" workers and fill their queues, possibly 
until they overflow (we can have bursts of 270,000 job units to process 
when dealing with some inbound flows).
And of course, even if the worker queues (they are on the worker socket 
side, right?) can withstand the pressure, a worker failure will send 
potentially thousands of jobs to oblivion, and they will have to be 
flagged as such and spread again.
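
To make the concern above concrete, here is a toy model in plain Python 
(no ZMQ involved). It assumes the ventilator deals jobs round-robin into 
one bounded queue per worker; the names simulate_push, hwm and 
processed_before_crash are made up for this sketch, and the real queue 
limits and their defaults depend on the libzmq version and socket 
options in use.

```python
from collections import deque


def simulate_push(num_jobs, num_workers, hwm, processed_before_crash):
    """Deal jobs round-robin into per-worker queues bounded by `hwm`.

    Returns (queued_per_worker, jobs_still_at_ventilator, jobs_lost),
    where jobs_lost is what worker 0 had queued but not yet processed
    at the moment it died.
    """
    queues = [deque() for _ in range(num_workers)]
    job = 0
    # Keep dealing until the jobs run out or every queue is full.
    while job < num_jobs and any(len(q) < hwm for q in queues):
        for q in queues:
            if job < num_jobs and len(q) < hwm:
                q.append(job)
                job += 1
    remaining = num_jobs - job
    lost = max(len(queues[0]) - processed_before_crash, 0)
    return [len(q) for q in queues], remaining, lost


# A burst of 270,000 jobs over 4 workers, worker 0 dying after 10 jobs.
bounded = simulate_push(270000, 4, hwm=1000, processed_before_crash=10)
unbounded = simulate_push(270000, 4, hwm=270000, processed_before_crash=10)
```

In this model a small limit keeps most of the burst at the ventilator 
and caps what one crash can lose, while an effectively unlimited queue 
hands a quarter of the burst to each worker up front.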

My first reaction was to spread only when the sink receives something, 
thus ensuring that no overflow can occur. But that means the ventilator 
must know how many workers are connected, and the ventilator and the 
sink must communicate about that somehow: doable, but a complicated 
design.
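
A minimal sketch of that "spread only when the sink receives something" 
idea, using plain threads and queues as stand-ins for the sockets (this 
is not pyZMQ code). The function name run_paced and the window 
parameter are assumptions of the sketch: the sink hands back one credit 
per result, and the ventilator blocks once `window` jobs are in flight.

```python
import queue
import threading


def run_paced(jobs, num_workers, window):
    """Release a job only while fewer than `window` jobs are in flight."""
    credits = threading.Semaphore(window)  # one credit per in-flight job
    to_workers = queue.Queue()             # stands in for ventilator -> workers
    to_sink = queue.Queue()                # stands in for workers -> sink
    lock = threading.Lock()
    state = {"in_flight": 0, "peak": 0}
    results = []

    def worker():
        while True:
            job = to_workers.get()
            if job is None:                # shutdown marker
                return
            to_sink.put(job * 2)           # placeholder for the real mapping

    def sink():
        for _ in range(len(jobs)):
            results.append(to_sink.get())
            with lock:
                state["in_flight"] -= 1
            credits.release()              # ventilator may send one more job

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    threads.append(threading.Thread(target=sink))
    for t in threads:
        t.start()

    for job in jobs:                       # the ventilator loop
        credits.acquire()                  # blocks while the window is full
        with lock:
            state["in_flight"] += 1
            state["peak"] = max(state["peak"], state["in_flight"])
        to_workers.put(job)

    for _ in range(num_workers):
        to_workers.put(None)
    for t in threads:
        t.join()
    return sorted(results), state["peak"]


results, peak = run_paced(list(range(50)), num_workers=3, window=4)
```

Note that in this sketch the window is a fixed number chosen by hand, so 
the ventilator does not need to know how many workers are connected; it 
only needs a return channel from the sink.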

My second idea was to reverse the design and have the workers request 
jobs from the ventilator. But that means the "load balancing" 
capabilities of ZMQ become useless, and the "mutated ventilator" (more 
of a dispatcher now) needs to manage by itself as many two-way 
communications as there are workers connected. Doable again, and no 
longer really anything to do with the ventilator design, but we can 
rely on the Queue Device of pyZMQ...
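
A rough pyZMQ sketch of this "worker asks for a job" variant, under 
several assumptions: a single REP socket on the dispatcher, REQ sockets 
on the workers, JSON messages, and threads sharing one context over a 
made-up inproc endpoint (across machines this would be a tcp:// 
address). The message layout and the names run and worker are invented 
for the example. As far as I understand, a REP socket sends each reply 
back to whichever peer made the request, so the dispatcher here does 
not track individual workers.

```python
import threading
from collections import deque

import zmq

ENDPOINT = "inproc://jobs"  # hypothetical endpoint for this sketch


def worker(ctx):
    sock = ctx.socket(zmq.REQ)
    sock.connect(ENDPOINT)
    result = None
    while True:
        # Deliver the previous result (if any) and ask for more work.
        sock.send_json({"result": result})
        job = sock.recv_json()["job"]
        if job is None:                 # dispatcher has nothing left
            break
        result = job * 2                # placeholder for the real mapping
    sock.close()


def run(jobs, num_workers):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind(ENDPOINT)                 # bind before the workers connect
    threads = [threading.Thread(target=worker, args=(ctx,))
               for _ in range(num_workers)]
    for t in threads:
        t.start()

    pending, results, stopped = deque(jobs), [], 0
    while stopped < num_workers:
        msg = sock.recv_json()
        if msg["result"] is not None:
            results.append(msg["result"])
        if pending:
            # One job per request: a worker never holds more than one.
            sock.send_json({"job": pending.popleft()})
        else:
            sock.send_json({"job": None})
            stopped += 1

    for t in threads:
        t.join()
    sock.close()
    ctx.term()
    return results


results = run(range(20), num_workers=3)
```

With this shape a crashed worker takes at most the one job it was 
holding with it, though the sketch does nothing to detect the crash or 
to reissue that job; retries and timeouts would have to be added on top.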



My first question: is my "analysis" of the ventilator design right, and 
am I right to assume this is a simple teaching design that is not really 
practical in "real life" when worker processing time is significant, or 
am I misunderstanding something?

Second question: of my two "ideas", which one would a more seasoned ZMQ 
user than me (nearly anybody ;) ) recommend to achieve a paced 
dispatching of the "jobs" to the workers?


Thanks a lot for your advice.

.X.



