[zeromq-dev] ZMQ_ROUTER modification
Andrew Hume
andrew at research.att.com
Thu Dec 15 16:53:24 CET 2011
why is anyone disconnecting?
maybe i don't grok your setup.
to me, in a situation like this where there is a job-assigner
(it seems zmq folks like to use ventilator for some reason),
and worker-bees, then these entities are relatively long-lived,
and thus have stable permanent control channels between them.
if your style is to fork off a process to do the job, then the worker-bee
would do that. in this case, the process doing the work need not have any zmq
connections at all.
alternatively, if numbers don't get goofily large, then each process finishing a job
can use its job-id and a well-known address to communicate back a short status.
that way, the job-assigner can do timeouts etc and redo jobs.
there are lots of ways to embellish this stuff.
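the timeout-and-redo bookkeeping above can be sketched in a few lines (plain python, stdlib only; the Assigner class and its method names are made up for illustration, and a real assigner would receive the short status messages over a zmq socket at the well-known address rather than via a direct report() call):

```python
# hypothetical sketch of the job-assigner's timeout/redo bookkeeping.
# Assigner, hand_out, report, sweep are illustrative names, not real API.
import time

class Assigner:
    def __init__(self, deadline_s):
        self.deadline_s = deadline_s
        self.pending = {}    # job_id -> time the job was handed out
        self.redo = []       # jobs whose status never came back

    def hand_out(self, job_id):
        self.pending[job_id] = time.monotonic()

    def report(self, job_id):
        # short status message arriving from a finished worker process
        self.pending.pop(job_id, None)

    def sweep(self, now=None):
        # requeue any job whose status is overdue
        now = time.monotonic() if now is None else now
        for job_id, started in list(self.pending.items()):
            if now - started > self.deadline_s:
                del self.pending[job_id]
                self.redo.append(job_id)

a = Assigner(deadline_s=5.0)
a.hand_out("job-1")
a.hand_out("job-2")
a.report("job-1")                       # job-1 completed in time
a.sweep(now=time.monotonic() + 10.0)    # job-2's status is overdue
print(a.redo)                           # ['job-2']
```

the job-id doubles as the key for the pending table, which is what makes the redo decision cheap.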
andrew
On Dec 15, 2011, at 8:10 AM, Whit Armstrong wrote:
> Thanks, Andrew.
>
> I agree, the problem comes from zmq being so fast at distributing the messages.
>
> So, basically use a 'mama' worker instead of 'papa' for job scheduling.
>
> I've been thinking about that, but I think there is a different
> problem with using mama's. Assume the client connects to several
> queue devices using a REP socket, to which the workers connect
> via REQ. We submit our job, which completes. The problem arises
> if a worker is fast enough to do a send before the client can
> disconnect. Since zmq is so fast, the disconnecting client socket
> can receive the 'please give me work' message that was meant for
> the next client who connects...
>
> I'm sure there is an intelligent way to avoid this problem, but I
> haven't thought of it.
>
> I'm thinking of a different design which I'll send under a separate email.
>
> Thanks again, Andrew.
>
> -Whit
>
>
> On Thu, Dec 15, 2011 at 9:27 AM, Andrew Hume <andrew at research.att.com> wrote:
>> whit,
>>
>> i believe this is a common mistake, with an easy solution.
>> the fundamental error is confusing message distribution
>> with job scheduling. zeromq is partially to blame
>> because it does a good job at what it does (fair share
>> and load balancing) and tempts you into thinking it
>> solves the job scheduling problem as well.
>>
>> in general, the best solution is that each worker
>> asks for a job when it is ready for more work. typically,
>> we might use a REQ/REP for this. this works cleanly
>> if the request overhead is not significant (normally the case).
>> even when we get near the edge condition of the latency
>> becoming an issue, i normally solve that by keeping an internal
>> queue on the worker of 2-3 jobs (so that there is always something to do).
>> then, the only bad case is when the time to do a job is comparable
>> to the time to transmit the job description. in this case, life is hard,
>> but generally in this case, volume is high, so you can afford to simply
>> batch jobs into groups (of 100 or somesuch) and then treat those
>> as a single managed unit.
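a rough stdlib-only model of the pull pattern above (a real worker would ask a job server over REQ/REP; here a shared queue stands in for the assigner, and a small bounded queue plays the role of the 2-3 job internal buffer that hides request latency):

```python
# stdlib model of worker-pull scheduling: workers take jobs only when
# ready, keeping a small prefetch buffer so they never sit idle waiting
# on a round trip. real code would use zmq REQ/REP instead of Queue.
import queue
import threading

jobs = queue.Queue()            # stands in for the job-assigner
for i in range(6):
    jobs.put(f"job-{i}")

done = []

def worker(name):
    prefetch = queue.Queue(maxsize=2)   # the 2-3 job internal buffer
    while True:
        # top up the prefetch buffer from the assigner
        while not prefetch.full():
            try:
                prefetch.put(jobs.get_nowait())
            except queue.Empty:
                break
        try:
            job = prefetch.get_nowait()
        except queue.Empty:
            return                      # nothing left anywhere
        done.append((name, job))        # "do" the job

threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(j for _, j in done))       # all six jobs done exactly once
```

because each worker pulls at its own pace, a fast machine simply comes back for more jobs sooner; nothing needs to know the core counts up front.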
>>
>> andrew
>>
>>
>> On Dec 14, 2011, at 7:58 AM, Whit Armstrong wrote:
>>
>> Well, let me explain what I'm trying to do. Perhaps someone can show
>> me a better way.
>>
>> I have a client using a dealer socket, talking to a mixed server
>> environment: a couple of 6-core machines and a 12-core machine.
>>
>> Each of the servers uses a simple queue device to fan out the jobs to
>> the workers over ipc:
>>
>> So, basically this pattern, but the client connects to many machines
>> w/ different numbers of cores.
>>
>> client(DEALER)->Queue(ROUTER,DEALER)->worker(REP)
>>
>> The dealer socket on the client fair-queues the messages to
>> all the queue devices equally, so the 12-core machine quickly becomes
>> idle after working off its queue while the 6-core machines continue
>> to work off their queues.
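The imbalance described above can be seen with a little arithmetic: DEALER round-robins messages evenly across the three queue devices, so each machine gets the same share regardless of core count. The job counts and the one-time-unit-per-job assumption below are invented for illustration:

```python
# back-of-the-envelope model of the fair-queuing imbalance: even shares
# of work, uneven numbers of cores. machine names and numbers are made up.
jobs_total = 240
machines = {"6-core-a": 6, "6-core-b": 6, "12-core": 12}

share = jobs_total // len(machines)     # 80 jobs each, cores ignored
# assuming one time unit per job per core, finish time = share / cores
finish = {name: share / cores for name, cores in machines.items()}
print(finish)
# the 12-core box finishes its share in half the time and then sits
# idle while the 6-core boxes grind through the rest of theirs
```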
>>
>> My thought was that I could set the HWM to 1 on the ROUTER socket,
>> which would prevent the messages from being read so aggressively, but
>> since ROUTER drops messages on HWM, I can't do that.
>>
>> Can anyone suggest a better pattern?
>>
>> -Whit
>>
>> On Wed, Dec 14, 2011 at 3:35 AM, Martin Sustrik <sustrik at 250bpm.com> wrote:
>>
>> On 12/14/2011 11:49 AM, Whit Armstrong wrote:
>>
>>> Is it possible to construct a ZMQ_ROUTER socket that does not drop on HWM?
>>
>> Technically it is possible. It can block instead of dropping. The question
>> is whether a single peer being dead/slow should really block sending
>> messages to all the other peers.
>>
>> Martin
>>
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>>
>> ------------------
>> Andrew Hume (best -> Telework) +1 623-551-2845
>> andrew at research.att.com (Work) +1 973-236-2014
>> AT&T Labs - Research; member of USENIX and LOPSA