[zeromq-dev] General understanding of ZMQ and architecture advices

John Jefferies j.jefferies at ntlworld.com
Mon Dec 17 01:12:10 CET 2012


On 16/12/2012 08:35, Pieter Hintjens wrote:
> Hi,
>
> My advice is to not just read the Guide but to work through the
> examples until you've actually understood what is going on, at which
> point you can probably answer your own questions.
>
> -Pieter

Pieter, I don't question this advice. Nevertheless, at the end of the 
ventilator pattern discussion it says "If you are using PUSH and PULL, 
and one of your workers gets way more messages than the others, it's 
because that PULL socket has joined faster than the others, and grabs a 
lot of messages before the others manage to connect" which leaves the 
reader with a hanging question that isn't answered until chapter 3. 
Perhaps it should be augmented with a sentence that says something like 
"if you want proper load balancing, you probably want to look at the 
Load-balancing Pattern example in Chapter 3"?

John


> On Sat, Dec 15, 2012 at 9:18 PM,  <dev at innercircleproject.com> wrote:
>> Hello.
>>
>> We are building a data mapping tool that gathers several sources of
>> information and tries to fit them into a common document model for further
>> consumption.
>>
>> Stage one was to design the "mapping machinery", and we intended
>> from the start to distribute the workload among processes and machines,
>> so the design was to have self-contained data units that we can process
>> independently, with the only thorny issues being of course the
>> distribution at the start (with the relevant retry-on-failure and TTL
>> problems) and the gathering at the end.
>>
>> We are working in Python and so are planning to use pyZMQ.
>>
>> My plan is to stick as close as possible to proven designs because I
>> fully understand that it is really easy to "fuck up in mysterious ways"
>> in the realm of distributed and asynchronous processing and the team lacks
>> experience in that domain, but one needs to start one day, right? :)
>>
>> Reading the docs and reading them again, I was at first attracted to the
>> "ventilator design", but several questions came to mind:
>>
>>    The ventilator design seems to be a big "spread it as long as you have
>> something to spread", meaning that if the workers' processing time is
>> longer than the ventilator's "dividing time" (probably a frequent
>> case and definitely the case in our situation), the ventilator will
>> quickly divide the work between the "n" workers and fill their queues,
>> possibly until they overflow (we can have bursts of 270 000 job units to
>> process when dealing with some inbound flux).
>> And of course, even if the worker queues (they are on the worker socket
>> side, right?) can withstand the pressure, it means a worker failure will
>> send to oblivion potentially thousands of jobs that will have to be
>> flagged as such and spread again.
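
To put rough numbers on the concern above, here is a minimal, ZeroMQ-free sketch of strict round-robin distribution. It assumes every worker is already connected when the burst starts and that the queues are unbounded, so no high-water mark ever applies. The helper names and the worker count of 4 are hypothetical, chosen only for illustration.

```python
def round_robin_backlog(n_jobs, n_workers):
    """Return how many jobs each worker receives under strict round-robin."""
    base, extra = divmod(n_jobs, n_workers)
    # The first `extra` workers get one job more than the others.
    return [base + 1 if i < extra else base for i in range(n_workers)]


def jobs_lost_on_failure(backlog, worker, already_done):
    """Jobs still queued at `worker` if it dies after finishing `already_done`."""
    return max(backlog[worker] - already_done, 0)


# Assumed scenario: the 270 000-job burst from the post, 4 workers.
backlog = round_robin_backlog(270000, 4)
print(backlog)                                   # [67500, 67500, 67500, 67500]
print(jobs_lost_on_failure(backlog, 0, 100))     # 67400
```

With bounded queues the sender would be expected to block rather than overflow, but whatever is already queued at a worker that dies still has to be detected and sent again.
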
>>
>> My first reaction was to only spread when the sink receives something,
>> thus ensuring that no overflow can occur, but that means the ventilator
>> must know how many workers are connected and the ventilator and the sink
>> must communicate about that somehow: doable, but a complicated design.
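
One way this first idea might look in pyZMQ is sketched below, under several assumptions: the sink is folded into the ventilator process for brevity, everything runs as threads in one process over `inproc`, and the endpoint names, the `run` helper and the `window` parameter are hypothetical. The ventilator keeps at most `window` jobs outstanding and sends another only when a result comes back, so in this variant it does not need to know how many workers are connected.

```python
import threading
import zmq


def worker(ctx, stop):
    """Pull jobs, 'process' them, and report each result back."""
    jobs = ctx.socket(zmq.PULL)
    jobs.connect("inproc://jobs")
    results = ctx.socket(zmq.PUSH)
    results.connect("inproc://results")
    while not stop.is_set():
        if jobs.poll(50):                      # wait up to 50 ms for a job
            job = jobs.recv_string()
            results.send_string(job.upper())   # stand-in for real processing
    jobs.close(linger=0)
    results.close(linger=0)


def run(job_list, n_workers=3, window=5):
    ctx = zmq.Context()
    out = ctx.socket(zmq.PUSH)
    out.bind("inproc://jobs")                  # bind before workers connect
    back = ctx.socket(zmq.PULL)
    back.bind("inproc://results")
    stop = threading.Event()
    threads = [threading.Thread(target=worker, args=(ctx, stop))
               for _ in range(n_workers)]
    for t in threads:
        t.start()

    pending, done, in_flight, peak = list(job_list), [], 0, 0
    while pending or in_flight:
        # Top up: never more than `window` jobs outstanding at once.
        while pending and in_flight < window:
            out.send_string(pending.pop(0))
            in_flight += 1
        peak = max(peak, in_flight)
        done.append(back.recv_string())        # one result = one credit back
        in_flight -= 1

    stop.set()                                 # tell the workers to exit
    for t in threads:
        t.join()
    out.close(linger=0)
    back.close(linger=0)
    ctx.term()
    return done, peak
```

Calling `run(["a", "b", "c"])` returns the processed results (in completion order) together with the largest number of jobs that were ever in flight. The sketch does not handle a worker dying mid-job; a real system would need timeouts and resends on top of this.
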
>>
>> My second idea was to reverse the design and have the worker request a
>> job from the ventilator, but that means the "load balancing"
>> capabilities of ZMQ become useless and that the "mutated ventilator"
>> (more a dispatcher now) needs to manage by itself as many two-way
>> communications as there are workers connected. Doable again, not really
>> anything to do with the ventilator design anymore, but we can rely on
>> the Queue Device of pyZMQ...
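
A minimal sketch of this second idea, assuming plain REQ workers and a single REP socket on the dispatcher side (the Guide's load-balancing pattern uses a ROUTER socket instead; this is a simplified stand-in, not that pattern). The endpoint name, the `READY` message, the empty-reply stop signal and the `dispatch` helper are all hypothetical. One REP socket serves every connected worker, so the dispatcher does not manage a separate channel per worker.

```python
import threading
import zmq


def worker(ctx, name, log):
    """Ask for one job at a time until the dispatcher says there are none."""
    sock = ctx.socket(zmq.REQ)
    sock.connect("inproc://dispatch")
    while True:
        sock.send_string("READY")
        job = sock.recv_string()
        if not job:                 # empty reply: no work left
            break
        log.append((name, job))     # stand-in for real processing
    sock.close(linger=0)


def dispatch(job_list, n_workers=3):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind("inproc://dispatch")  # bind before the workers connect
    log = []
    threads = [threading.Thread(target=worker, args=(ctx, "w%d" % i, log))
               for i in range(n_workers)]
    for t in threads:
        t.start()

    pending, released = list(job_list), 0
    while released < n_workers:
        sock.recv()                 # some worker is idle and asking
        if pending:
            sock.send_string(pending.pop(0))
        else:
            sock.send_string("")    # tell this worker to stop
            released += 1

    for t in threads:
        t.join()
    sock.close(linger=0)
    ctx.term()
    return log
```

Each worker holds at most one job at a time, so a worker failure costs at most the job it was processing. The price is one request/reply round trip per job, which may matter when jobs are very small.
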
>>
>>
>>
>> My first question: is my "analysis" of the ventilator design right, and
>> am I right to assume this is a simple teaching design that is not really
>> practical in "real life" when worker processing time is significant, or
>> do I misunderstand something?
>>
>> Second question: which of my two "ideas" would a more seasoned ZMQ
>> user than me (nearly anybody ;) ) recommend to achieve a paced
>> dispatching of the "jobs" to the workers?
>>
>>
>> Thanks a lot for your advice.
>>
>> .X.
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev



