[zeromq-dev] 0MQ vs TCP sockets - survey
Martin Sustrik
sustrik at 250bpm.com
Fri Oct 29 19:37:56 CEST 2010
Hi Kelly,
> 1. Lack of connect/disconnect notification. I know, talked about to
> death, but there are simply certain patterns where it is the
> cleanest/easiest and most useful way to approach notifying workers that
> their task master died or has become unavailable for some reason. Some
> reasons and ways I would use this knowledge:
>
> a) For the task master, I would start a single worker on the local
> machine and start processing. If results are not pending for say 1 second,
> bring up another worker on some other machine. (I use zeroconf to find
> other possible worker machines.) If any worker dies/disconnects,
> immediately recycle outstanding work items to other machines and potentially
> bring up another worker to replace the old one. (Down side, would need to
> forcibly disconnect and ignore any outstanding data on other sockets
> associated with the worker. Part of the reason I like shared sockets when
> head of the line is not an issue.)
>
> b) For a worker, any disconnection from the task master causes an
> immediate shutdown as there is no purpose to exist anymore. Again, because
> I use zeroconf and migrate work wherever there may be free CPU/memory,
> this just makes sense for my distribution model. I share a group of
> machines which all run a little controller daemon/service with other tools
> and processes; the controller advertises cpu/memory/disk/etc stats and a
> task master can pick the lowest utilized machine to start a new service on
> from that list. I don't want processes sitting around in memory if the task
> master dies.
>
My feeling is that the above has to do with cluster administration
(starting new nodes, restarting failed nodes, closing unused nodes). The
administration is pretty orthogonal to the actual workload. In other
words: There's not much point in the business logic processing
algorithms caring about cluster administration. It would just lead to
messy code. I would prefer a solution where the two concerns are neatly
separated.
> 2. Transacted messages or at least "acks" and knowledge of where it
> went. Eek, I know, don't like those at all. But, especially in the case of
> the way I'm using 0mq, it would make things "sooooooo" much easier since it
> is exactly what I need for this tool. I.e. I send out a work item, I "need"
> the result of the processing or it invalidates the entire run. I currently
> send the results back on a separate push channel which basically means that
> any worker that fails causes the task master to invalidate the entire run
> since it doesn't know what went where and how to recover and send out repeat
> requests. I "could" add some timeout logic, and probably will; it is just a
> case that this seems like something 0mq should support without user side
> workarounds.
>
Why not a req/rep with a timeout? The need for acks seems unrelated to
the problem.
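A minimal sketch of the timeout bookkeeping on the task master side, independent of the transport. It only shows the idea that a work item whose reply does not arrive within a deadline goes back on the send queue. The class name PendingWork and its methods are hypothetical and are not part of 0MQ; sending the request and receiving the reply are left to the caller.

```python
import time
from collections import deque


class PendingWork:
    """Hypothetical helper: tracks sent work items and requeues overdue ones."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the logic can be tested
        self.queue = deque()        # (item_id, item) pairs waiting to be sent
        self.in_flight = {}         # item_id -> (item, deadline)

    def submit(self, item_id, item):
        """Add a new work item to the send queue."""
        self.queue.append((item_id, item))

    def next_to_send(self):
        """Pop the next item to send and start its reply timer, or None."""
        if not self.queue:
            return None
        item_id, item = self.queue.popleft()
        self.in_flight[item_id] = (item, self.clock() + self.timeout_s)
        return item_id, item

    def on_reply(self, item_id):
        """Record a reply. Returns False if the id is not in flight,
        for example because it already timed out and was requeued."""
        return self.in_flight.pop(item_id, None) is not None

    def requeue_expired(self):
        """Move items whose deadline has passed back onto the send queue."""
        now = self.clock()
        expired = [i for i, (_, deadline) in self.in_flight.items()
                   if deadline <= now]
        for item_id in expired:
            item, _ = self.in_flight.pop(item_id)
            self.queue.append((item_id, item))
        return expired
```

The master would call requeue_expired() on each pass of its loop, so a failed worker costs one repeated request rather than the whole run.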
> 3. I would greatly prefer a reactor pattern over the current select
> like behavior when dealing with multiple sockets. I know there was some
> talk of this but I have not seen much further discussion lately.
>
I think there were some projects that added a reactor pattern on top of
0MQ. Have you checked those?
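For illustration only, here is roughly what a reactor layered over a select-like call looks like. This sketch uses plain sockets and Python's standard selectors module rather than 0MQ sockets, and the Reactor class is a hypothetical name; a 0MQ version would sit on top of zmq_poll in the same way.

```python
import selectors
import socket


class Reactor:
    """Hypothetical reactor: dispatches read events to per-socket callbacks."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def register(self, sock, handler):
        # The handler is stored in the selector key's opaque "data" field.
        self._selector.register(sock, selectors.EVENT_READ, handler)

    def unregister(self, sock):
        self._selector.unregister(sock)

    def run_once(self, timeout=None):
        """Wait once for events and call the handler of each ready socket.
        Returns the number of sockets that were ready."""
        events = self._selector.select(timeout)
        for key, _mask in events:
            key.data(key.fileobj)
        return len(events)

    def close(self):
        self._selector.close()


if __name__ == "__main__":
    a, b = socket.socketpair()
    reactor = Reactor()
    reactor.register(b, lambda s: print("received", s.recv(16)))
    a.sendall(b"ping")
    reactor.run_once(timeout=1.0)
    reactor.unregister(b)
    reactor.close()
    a.close()
    b.close()
```

The application code only supplies callbacks; the polling loop lives in one place.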
Martin