[zeromq-dev] help with feng shui

Andrew Hume andrew at research.att.com
Fri Aug 27 16:01:06 CEST 2010


thanks! that was just the input i was after.

my intent is to do out-of-band signalling,
but because 0MQ doesn't provide clean startup/termination semantics,
and because of the uncertainty caused by buffering, i had to simulate
one step of the signalling by sending NO-OPs.

if i don't use NO-OPs, and purely use OOB signalling,
how do i know when a worker is done with its work?
how do i know when a ventilator's work messages have all been delivered?
and, if possible, the answer shouldn't contain any time-related waits.
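
for concreteness, a minimal sketch of the master-side bookkeeping in
steps e), g) and h) of the scheme quoted below, tracking LAZY reports by
node id (as matt suggests) rather than by a bare count. the names
LazyTracker, on_lazy and missing are hypothetical, not part of 0MQ; this
is plain python with no sockets, and it only covers the bookkeeping, not
the buffering question above.

```python
import uuid


class LazyTracker:
    """Tracks which nodes have reported LAZY, by id rather than by count.

    Hypothetical helper for illustration only; not a 0MQ API.
    """

    def __init__(self, expected_ids):
        # Ids of every node the master expects to hear from,
        # taken from the external configuration.
        self.expected = set(expected_ids)
        self.seen = set()

    def on_lazy(self, node_id):
        """Record one LAZY report; return True once all nodes have reported."""
        if node_id not in self.expected:
            raise KeyError(f"LAZY from unknown node {node_id!r}")
        # A set makes duplicate LAZY messages from one node harmless,
        # which a simple counter would not.
        self.seen.add(node_id)
        return self.all_lazy()

    def all_lazy(self):
        return self.seen == self.expected

    def missing(self):
        """Ids of nodes that have not yet reported LAZY."""
        return self.expected - self.seen


# Example: three workers, each identified by a UUID string.
worker_ids = [str(uuid.uuid4()) for _ in range(3)]
tracker = LazyTracker(worker_ids)
for wid in worker_ids:
    ready = tracker.on_lazy(wid)
# After the loop, ready is True and the master could rearrange the workers.
```

the master would feed each message read from the feedback PULL socket
into on_lazy(), and move to the next step only when it returns True.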

On Aug 27, 2010, at 9:01 AM, Matt Weinstein wrote:

> IMO
>
> You're trying to get state control messages to flow through the  
> system, this method is a hybrid "in band" and "out of band" system.
>
> You probably should choose one or the other.
>
> OOB - You mirror the topology with a group of PUB/SUB sockets, top  
> to bottom
> IB -  you put an input at the top of the ventilators and send  
> inband messages downstream.  In this case it might be useful to  
> have signaling points (devices) that let local components know  
> what's going on without the stream of NOPs.
>
> I don't think both IB and OOB are necessary, and it will be easier  
> to build a correct solution if you choose just one.
>
> In both cases UUIDs would be good to ensure that all nodes have  
> been accounted for.  Counting is not particularly safe in a  
> distributed environment.
>
> Best,
> Matt
>
> On Aug 26, 2010, at 10:05 PM, Andrew Hume wrote:
>
>> i need some advice. i do not yet grok the feng shui of zeromq,
>> and thus seek advice from those who do.
>>
>> i have a fairly normal setup similar to the parallel pipeline
>> example in the guide.
>> except that i have a handful of ventilators, and a handful of sinks.
>> so far, so good. we just use the PUSH/PULL pattern.
>>
>> here is where it gets harder. i need to be able to essentially pause
>> the ventilators, adjust the number of workers and sinks, and then
>> unpause the ventilators WITHOUT losing any packets.
>>
>> the best (!?) solution i have so far is
>>
>> a) add a PUSH/PULL feedback socket (with all sinks and workers PUSH,
>> and the master is a PULL)
>> b) add a PUB/SUB command socket (with all ventilators, sinks and  
>> workers SUB,
>> and the master PUB)
>>
>> c) we send an "IDLE" command to the ventilators; they pause their  
>> normal work
>> and start sending NO-OP work items
>> d) as each worker starts getting NO-OPs, they push a "LAZY"  
>> message to the master.
>> they forward the NO-OP to the sinks.
>> e) when the master sees k LAZY messages (where k is the existing  
>> number of workers),
>> it rearranges the workers (killing some or starting new ones). new
>> workers send NO-OPs.
>> f) when each sink starts getting NO-OPs, it sends a "LAZY" message  
>> to the master.
>> g) when the master has done e), and seen NO-OPs from each of the j  
>> sinks, it
>> rearranges the sinks. when each new sink starts getting NO-OPs, it
>> sends a LAZY to the master.
>>
>> h) when the master receives m "LAZY"s (where m is the number of
>> new sinks), it sends a "GO"
>> command to the ventilators, who then stop sending NO-OPs and start
>> sending real work.
>>
>> -------------------------------------
>>
>> pros: i believe this scheme will work. and the additional cost of  
>> two sockets is modest.
>> cons: it is tedious to send NO-OPs, but i don't know how else to
>> flush the buffers
>> and synchronise everyone. it does involve knowing how many things  
>> there are,
>> but that is part of an external configuration in any case.
>>
>> is this the (or a) right way to do this? is there a better way?
>>
>> 	andrew
>>
>> ------------------
>> Andrew Hume  (best -> Telework) +1 732-886-1886
>> andrew at research.att.com  (Work) +1 973-360-8651
>> AT&T Labs - Research; member of USENIX and LOPSA
>>
>>
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>

------------------
Andrew Hume  (best -> Telework) +1 732-886-1886
andrew at research.att.com  (Work) +1 973-360-8651
AT&T Labs - Research; member of USENIX and LOPSA


