[zeromq-dev] help with feng shui
Matt Weinstein
mattweinstein at gmail.com
Fri Aug 27 16:17:41 CEST 2010
Put a similar PUB/SUB socket at the bottom of the workers. You mirror the topology in synchronization space.
On Aug 27, 2010, at 10:01 AM, Andrew Hume <andrew at research.att.com> wrote:
> thanks! that was just the input i was after.
>
> my intent is to do out-of-band signalling,
> but because 0MQ doesn't provide clean startup/termination semantics,
> and because of the uncertainty caused by buffering, i had to simulate
> one step of the signalling by sending NO-OPs.
>
> if i don't use NO-OPs, and purely use OOB signalling,
> how do i know when a worker is done with its work?
> how do i know when a ventilator's work messages have all been delivered?
> and, if possible, the answer shouldn't contain any time-related waits.
>
> On Aug 27, 2010, at 9:01 AM, Matt Weinstein wrote:
>
>> IMO
>>
>> You're trying to get state control messages to flow through the system; this method is a hybrid "in band" and "out of band" system.
>>
>> You probably should choose one or the other.
>>
>> OOB - You mirror the topology with a group of PUB/SUB sockets, top to bottom.
>> IB - You put an input at the top of the ventilators and send in-band messages downstream. In this case it might be useful to have signaling points (devices) that let local components know what's going on without the stream of NOPs.
>>
>> I don't think both IB and OOB are necessary, and it will be easier to build a correct solution if you choose just one.
>>
>> In both cases UUIDs would be good to ensure that all nodes have been accounted for. Counting is not particularly safe in a distributed environment.
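
To illustrate the point about UUIDs versus counting, here is a minimal sketch in Python (standard library only, no ZeroMQ calls). The `Roster` class and its method names are hypothetical and do not come from this thread. The idea is that the master records which node identities have reported, so a duplicated or stray message cannot be mistaken for a node that has not reported yet.

```python
import uuid


class Roster:
    """Tracks which expected nodes have reported (hypothetical helper)."""

    def __init__(self, expected_ids):
        self.expected = set(expected_ids)
        self.seen = set()

    def report(self, node_id):
        """Record a report; return False if the node is not expected."""
        if node_id not in self.expected:
            return False
        self.seen.add(node_id)  # a repeated report is harmless in a set
        return True

    def missing(self):
        """Identities that are expected but have not reported yet."""
        return self.expected - self.seen

    def complete(self):
        return not self.missing()


# Assumption: every worker is given a UUID when it is started.
workers = [str(uuid.uuid4()) for _ in range(3)]
roster = Roster(workers)

# Worker 0 reports twice (say, a retransmit) and an unknown node reports once.
roster.report(workers[0])
roster.report(workers[0])
roster.report(str(uuid.uuid4()))
roster.report(workers[1])

# A bare counter would have seen 4 messages for 3 workers and moved on.
print(roster.complete())      # False
print(len(roster.missing()))  # 1
```

The same roster also tells the master which node is still outstanding, which a count alone cannot.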
>>
>> Best,
>> Matt
>>
>> On Aug 26, 2010, at 10:05 PM, Andrew Hume wrote:
>>
>>> i need some advice. i do not yet grok the feng shui of zeromq,
>>> and thus seek advice from those who do.
>>>
>>> i have a fairly normal setup similar to the parallel pipeline example in the guide.
>>> except that i have a handful of ventilators, and a handful of sinks.
>>> so far, so good. we just use the PUSH/PULL pattern.
>>>
>>> here is where it gets harder. i need to be able to essentially pause
>>> the ventilators, adjust the number of workers and sinks, and then
>>> unpause the ventilators WITHOUT losing any packets.
>>>
>>> the best (!?) solution i have so far is
>>>
>>> a) add a PUSH/PULL feedback socket (with all sinks and workers PUSH,
>>> and the master is a PULL)
>>> b) add a PUB/SUB command socket (with all ventilators, sinks and workers SUB,
>>> and the master PUB)
>>>
>>> c) we send an "IDLE" command to the ventilators; they pause their normal work
>>> and start sending NO-OP work items
>>> d) as each worker starts getting NO-OPs, it pushes a "LAZY" message to the master.
>>> it forwards the NO-OP to the sinks.
>>> e) when the master sees k LAZY messages (where k is the existing number of workers),
>>> it rearranges the workers (killing some or starting new ones). new workers send NO-OPs.
>>> f) when each sink starts getting NO-OPs, it sends a "LAZY" message to the master.
>>> g) when the master has done e), and seen NO-OPs from each of the j sinks, it
>>> rearranges the sinks. when each new sink starts getting NO-OPs, it sends a LAZY to the master.
>>>
>>> h) when the master receives m "LAZY"s (where m is the number of new sinks), it sends a "GO"
>>> command to the ventilators, who then stop sending NO-OPs and start sending real work.
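
Steps c) to h) above can be read as a small state machine on the master. The following is a rough Python sketch of that reading, using only plain method calls in place of the PUB command socket and the PULL feedback socket. The `Master` class, the phase names, and the idea that each LAZY message says whether it came from a worker, an old sink, or a new sink are all assumptions made for illustration; the original description does not specify how the master tells them apart.

```python
class Master:
    """Hypothetical master for steps c) to h); sockets are replaced by method calls."""

    def __init__(self, k_workers, j_sinks, m_new_sinks):
        self.k, self.j, self.m = k_workers, j_sinks, m_new_sinks
        self.lazy = {"worker": 0, "sink": 0, "new_sink": 0}
        self.phase = "RUNNING"
        self.commands = []  # stands in for the PUB command socket
        self.actions = []   # records what the master rearranged, in order

    def pause(self):
        # step c): tell the ventilators to switch to NO-OPs
        self.commands.append("IDLE")
        self.phase = "DRAIN_WORKERS"

    def on_lazy(self, sender):
        # stands in for a LAZY message arriving on the PULL feedback socket
        self.lazy[sender] += 1
        self._advance()

    def _advance(self):
        # Sequential ifs (not elif) so that early LAZYs from later stages
        # are honoured as soon as the earlier stage completes.
        if self.phase == "DRAIN_WORKERS" and self.lazy["worker"] >= self.k:
            self.actions.append("rearrange workers")  # step e)
            self.phase = "DRAIN_SINKS"
        if self.phase == "DRAIN_SINKS" and self.lazy["sink"] >= self.j:
            self.actions.append("rearrange sinks")    # step g)
            self.phase = "WAIT_NEW_SINKS"
        if self.phase == "WAIT_NEW_SINKS" and self.lazy["new_sink"] >= self.m:
            self.commands.append("GO")                # step h)
            self.phase = "RUNNING"


master = Master(k_workers=2, j_sinks=1, m_new_sinks=2)
master.pause()
# The old sink's LAZY arrives before the second worker's; order is not guaranteed.
for sender in ["worker", "sink", "worker", "new_sink", "new_sink"]:
    master.on_lazy(sender)

print(master.commands)  # ['IDLE', 'GO']
print(master.actions)   # ['rearrange workers', 'rearrange sinks']
```

The sketch keeps the counts k, j and m from the description; it does not address whether counting is reliable when messages can be duplicated or nodes can vanish.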
>>>
>>> -------------------------------------
>>>
>>> pros: i believe this scheme will work. and the additional cost of two sockets is modest.
>>> cons: it is tedious to send NO-OPs, but i don't know how else to flush the buffers
>>> and synchronise everyone. it does involve knowing how many things there are,
>>> but that is part of an external configuration in any case.
>>>
>>> is this the (or a) right way to do this? is there a better way?
>>>
>>> andrew
>>>
>>> ------------------
>>> Andrew Hume (best -> Telework) +1 732-886-1886
>>> andrew at research.att.com (Work) +1 973-360-8651
>>> AT&T Labs - Research; member of USENIX and LOPSA
>>>
>>>
>>>
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev at lists.zeromq.org
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev