[zeromq-dev] ZMQ_ROUTER modification

Andrew Hume andrew at research.att.com
Thu Dec 15 16:53:24 CET 2011


why is anyone disconnecting?
maybe i don't grok your setup.

to me, in a situation like this where there is a job-assigner
(it seems zmq folks like to use ventilator for some reason),
and worker-bees, then these entities are relatively long-lived,
and thus have stable permanent control channels between them.

if your style is to fork off a process to do the job, then the worker-bee
would do that. in this case, the process doing the work need not have any zmq
connections at all.

alternatively, if numbers don't get goofily large, then each process finishing a job
can use its job-id and a well-known address to communicate back a short status.
that way, the job-assigner can do timeouts etc. and redo jobs.

there are lots of ways to embellish this stuff.
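a minimal sketch of that status channel (assuming pyzmq; the PUSH/PULL socket choice and the inproc address are illustrative, not prescribed here):

```python
# sketch: workers report "<job-id>:done" to a well-known status address;
# the assigner crosses jobs off and could re-issue anything that times out.
# socket types (PUSH/PULL) and the inproc address are illustrative choices.
import threading
import zmq

ctx = zmq.Context.instance()

status = ctx.socket(zmq.PULL)
status.bind("inproc://status")          # the well-known address

def worker(job_id):
    s = ctx.socket(zmq.PUSH)
    s.connect("inproc://status")
    s.send_string(f"{job_id}:done")     # short status keyed by job-id
    s.close()

outstanding = {"job-1", "job-2"}
for jid in list(outstanding):
    threading.Thread(target=worker, args=(jid,)).start()

# assigner side: collect statuses; a real assigner would poll with a
# timeout here and re-queue anything still outstanding
while outstanding:
    jid, _, state = status.recv_string().partition(":")
    if state == "done":
        outstanding.discard(jid)
```

in a real deployment the timeout would come from polling the status socket with a deadline rather than blocking forever.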

	andrew

On Dec 15, 2011, at 8:10 AM, Whit Armstrong wrote:

> Thanks, Andrew.
> 
> I agree, the problem comes from zmq being so fast distributing the messages.
> 
> So, basically use a 'mama' worker instead of 'papa' for job scheduling.
> 
> I've been thinking about that, but I think there is a different
> problem with using mama's.  Assume that the client connects to several
> queue devices, each using a REP socket to which the workers connect via
> REQ.  We submit our job, which completes.  The problem arises if a
> worker is fast enough to do a send before the client can disconnect.
> Since zmq is so fast, the disconnecting client socket can receive the
> 'please give me work' message that was meant for the next client who
> connects...
> 
> I'm sure there is an intelligent way to avoid this problem, but I
> haven't thought of it.
> 
> I'm thinking of a different design which I'll send under a separate email.
> 
> Thanks again, Andrew.
> 
> -Whit
> 
> 
> On Thu, Dec 15, 2011 at 9:27 AM, Andrew Hume <andrew at research.att.com> wrote:
>> whit,
>> 
>> i believe this is a common mistake, with an easy solution.
>> the fundamental error is confusing message distribution
>> with job scheduling. zeromq is partially to blame
>> because it does a good job at what it does (fair share
>> and load balancing) and tempts you into thinking it
>> solves the job scheduling problem as well.
>> 
>> in general, the best solution is that each worker
>> asks for a job when it is ready for that work. typically,
>> we might use a REQ/REP for this. this works cleanly
>> if the request overhead is not significant (normally the case).
>> even when we get near the edge condition of the latency
>> becoming an issue, i normally solve that by keeping an internal
>> queue on the worker of 2-3 jobs (so that there is always something to do).
>> then, the only bad case is when the time to do a job is comparable
>> to the time to transmit the job description. in this case, life is hard,
>> but generally in this case, volume is high, so you can afford to simply
>> batch jobs into groups (of 100 or somesuch) and then treat those
>> as a single managed unit.
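a minimal sketch of that pull-style scheduling (assuming pyzmq; the names and the inproc address are illustrative — the point is that a faster box simply comes back for the next job sooner):

```python
# sketch: workers REQ a job only when ready, so faster machines ask more
# often; an empty reply means no more work. pyzmq assumed; names illustrative.
import threading
import zmq

ctx = zmq.Context.instance()

rep = ctx.socket(zmq.REP)
rep.bind("inproc://jobs")               # bind before workers connect

jobs = [f"job-{i}" for i in range(6)]
results = []

def worker():
    req = ctx.socket(zmq.REQ)
    req.connect("inproc://jobs")
    while True:
        req.send_string("ready")        # ask only when ready for work
        job = req.recv_string()
        if not job:                     # empty reply: nothing left to do
            break
        results.append(job.upper())     # stand-in for real work
    req.close()

workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()

done = 0
while done < len(workers):              # the job-assigner loop
    rep.recv_string()                   # a "ready" request
    if jobs:
        rep.send_string(jobs.pop(0))
    else:
        rep.send_string("")             # tell this worker to stop
        done += 1

for w in workers:
    w.join()
rep.close()
```

the 2-3 job internal queue mentioned above would sit between `recv_string` and the work itself on the worker side; it is elided here.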
>> 
>> andrew
>> 
>> 
>> On Dec 14, 2011, at 7:58 AM, Whit Armstrong wrote:
>> 
>> Well, let me explain what I'm trying to do.  Perhaps someone can show
>> me a better way.
>> 
>> I have a client using a dealer socket, talking to a mixed server
>> environment: a couple of 6-core machines and a 12-core machine.
>> 
>> Each of the servers uses a simple queue device to fan out the jobs to
>> the workers over ipc:
>> 
>> So, basically this pattern, but the client connects to many machines
>> w/ different numbers of cores.
>> 
>> client(DEALER)->Queue(ROUTER,DEALER)->worker(REP)
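the device in that pattern can be sketched by hand for one round trip (assuming pyzmq; a real device would loop, e.g. with zmq.proxy, and the inproc addresses are illustrative):

```python
# sketch: ROUTER frontend faces the client, DEALER backend faces workers;
# one request/reply is forwarded manually to show the envelopes involved.
import zmq

ctx = zmq.Context.instance()

frontend = ctx.socket(zmq.ROUTER); frontend.bind("inproc://frontend")
backend  = ctx.socket(zmq.DEALER); backend.bind("inproc://backend")

client = ctx.socket(zmq.DEALER); client.connect("inproc://frontend")
worker = ctx.socket(zmq.REP);    worker.connect("inproc://backend")

client.send_multipart([b"", b"job-42"])            # empty delimiter, then body
backend.send_multipart(frontend.recv_multipart())  # ROUTER prepends the client identity

body = worker.recv()                               # REP strips the envelope: just b"job-42"
worker.send(b"done")                               # REP restores the envelope on reply

frontend.send_multipart(backend.recv_multipart())  # identity frame routes it back
parts = client.recv_multipart()                    # [b"", b"done"]

for s in (client, worker, frontend, backend):
    s.close()
```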
>> 
>> Because the dealer socket on the client fair-queues the messages to
>> all the queue devices equally, the 12-core machine quickly becomes
>> idle after working off its queue while the 6-core machines continue to
>> work off their queues.
>> 
>> My thought was that I could set the HWM to 1 on the ROUTER socket
>> which would prevent the messages from being read aggressively, but
>> since ROUTER will drop on HWM, I can't do that.
>> 
>> Can anyone suggest a better pattern?
>> 
>> -Whit
>> 
>> 
>> 
>> 
>> 
>> On Wed, Dec 14, 2011 at 3:35 AM, Martin Sustrik <sustrik at 250bpm.com> wrote:
>> 
>> On 12/14/2011 11:49 AM, Whit Armstrong wrote:
>> 
>> 
>> Is it possible to construct a ZMQ_ROUTER socket that does not drop on HWM?
>> 
>> 
>> 
>> Technically it is possible. It can block instead of dropping. The question
>> is whether a single peer being dead/slow should really block sending
>> messages to all the other peers.
>> 
>> 
>> Martin
>> 
>> 
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>> 
>> 
>> 
>> ------------------
>> Andrew Hume  (best -> Telework) +1 623-551-2845
>> andrew at research.att.com  (Work) +1 973-236-2014
>> AT&T Labs - Research; member of USENIX and LOPSA
>> 
>> 
>> 
>> 
>> 
>> 
> 





