[zeromq-dev] Migrating IPC-based Pub/Sub System To ZMQ
mvyver at gmail.com
Mon Aug 16 05:23:26 CEST 2010
On Sat, Aug 14, 2010 at 9:32 PM, Pieter Hintjens <ph at imatix.com> wrote:
> On Fri, Aug 13, 2010 at 5:57 PM, Santy, Michael
> <Michael.Santy at dynetics.com> wrote:
>> I'm currently working to refactor an existing pub/sub distributed system
>> built around CMU IPC and raw sockets to solely use ZMQ. I've not been
>> able to fit it into any one of the common patterns, so I'd like to get some
>> feedback from the list on how to improve my proposed design.
> It's important to tell us what your actual use case is. It seems to be this:
>> Our existing system...
>> collects high-speed data from a number of sources, combines the data
>> centrally into processing packets, and farms out these processing packets
>> (mostly) round-robin to one of many data processors.
> So you have:
> * A set of sources producing high volume data streams
> * A set of combiners that turn these streams into workloads
> * A set of workers that turn these workloads into results
> * Control messages that flow between sources, combiners, and workers.
> There are key things you've not explained:
Some very useful information about the requirements can be conveyed
succinctly by indicating which of C, A, and P (as in the CAP
theorem/conjecture) is required for which nodes (e.g. source,
combiner, etc.), sub-systems (e.g. the admin/control system), or the
system as a whole.
Pieter's suggestion to start by tolerating worker nodes coming and
going might suggest that only P is required for worker nodes, but is
C required or desirable?
Apologies if that sounds like a stuck record...
> * How many sources are there, what are the data rates, and message sizes?
> * How many workers are there, what are the workload rates, and message sizes?
> * Are the workloads idempotent, i.e. can they be run more than once
> without risk?
> * What is the requirement for reliability, in terms of how much work
> can be lost?
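The idempotency question matters because a restarted combiner may hand
the same workload out twice. If the processing itself is not naturally
idempotent, one common workaround is to key each workload by an ID and
remember the result. The sketch below is only an illustration of that
idea: DedupingWorker, workload_id and the in-memory dict are all
hypothetical names and choices, not anything from the original system.

```python
class DedupingWorker:
    """Hypothetical wrapper that makes a re-delivered workload harmless."""

    def __init__(self, process):
        self._process = process   # the real (possibly non-idempotent) handler
        self._results = {}        # workload_id -> cached result (unbounded here)

    def handle(self, workload_id, payload):
        # Run the handler only the first time a given ID is seen;
        # any later delivery of the same ID returns the cached result.
        if workload_id not in self._results:
            self._results[workload_id] = self._process(payload)
        return self._results[workload_id]


if __name__ == "__main__":
    worker = DedupingWorker(len)
    print(worker.handle("job-1", b"abc"))   # handler runs
    print(worker.handle("job-1", b"abc"))   # served from the cache
```

Note that this only protects a single worker. If a repeated workload
can land on a different worker, the record of completed IDs would have
to live somewhere shared, which is a design decision the thread leaves
open.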
> However, making some assumptions, the design you proposed seems the right one:
> * Sources each publish their data on a PUB socket.
> * A single combiner subscribes to all sources and offers a PUSH socket
> for workers.
> * Workers connect to the combiner via a PULL socket.
> * Sources, combiners, and workers all connect to a forwarder device.
> Any node can publish a control message to the forwarder, and all nodes
> will receive it, a simple multicast model.
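The data path of this design (sources on PUB, one combiner on SUB plus
PUSH, workers on PULL) can be sketched in a few lines of pyzmq. This is
a toy under stated assumptions, not the prototype itself: everything
runs in one process over inproc:// endpoints, the endpoint names and
counts are made up, the "combining" step just forwards each message
unchanged, and the control-message forwarder is left out.

```python
import zmq


def run_pipeline(n_sources=2, n_workers=3, msgs_per_source=5):
    """Send a few messages through PUB -> SUB/PUSH -> PULL in one process.

    Returns {worker_index: [payload, ...]}.
    """
    ctx = zmq.Context()
    try:
        # Sources: each one binds its own PUB socket.
        pubs = []
        for i in range(n_sources):
            pub = ctx.socket(zmq.PUB)
            pub.bind(f"inproc://source-{i}")   # endpoint names are illustrative
            pubs.append(pub)

        # Combiner, input side: one SUB socket connected to every source.
        sub = ctx.socket(zmq.SUB)
        sub.setsockopt(zmq.SUBSCRIBE, b"")     # empty prefix: receive everything
        sub.setsockopt(zmq.RCVTIMEO, 1000)     # ms; raise zmq.Again instead of hanging
        for i in range(n_sources):
            sub.connect(f"inproc://source-{i}")

        # Combiner, output side: a PUSH socket that workers connect to.
        push = ctx.socket(zmq.PUSH)
        push.setsockopt(zmq.SNDTIMEO, 1000)
        push.bind("inproc://workloads")

        # Workers: each connects a PULL socket to the combiner.
        pulls = []
        for _ in range(n_workers):
            pull = ctx.socket(zmq.PULL)
            pull.setsockopt(zmq.RCVTIMEO, 200)
            pull.connect("inproc://workloads")
            pulls.append(pull)

        # Sources publish; the subscriber is already connected, so nothing is missed.
        for i, pub in enumerate(pubs):
            for seq in range(msgs_per_source):
                pub.send_string(f"source-{i}/{seq}")

        # Combiner loop (stub): forward each message as one workload.
        for _ in range(n_sources * msgs_per_source):
            push.send(sub.recv())

        # Each worker drains whatever was distributed to it.
        results = {}
        for w, pull in enumerate(pulls):
            results[w] = []
            while True:
                try:
                    results[w].append(pull.recv().decode())
                except zmq.Again:              # receive timeout: nothing left
                    break
        return results
    finally:
        ctx.destroy(linger=0)                  # close all sockets, then terminate


if __name__ == "__main__":
    for worker, items in run_pipeline().items():
        print(worker, items)
```

PUSH distributes messages round-robin across connected PULL peers,
which matches the "(mostly) round-robin" farming described above. In a
real deployment each role would be its own process on tcp:// endpoints,
and sources would have to cope with subscribers that join late.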
> If the combiner crashes, work in progress will be lost. Workers will
> automatically reconnect when the combiner restarts. Sources will
> queue and then eventually discard data.
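How much a PUB socket will queue for a subscriber that cannot keep up
is capped by its send high-water mark; beyond that, messages for that
subscriber are dropped. A minimal pyzmq sketch, assuming a current
libzmq where the option is ZMQ_SNDHWM (the 2.x releases contemporary
with this thread had a single ZMQ_HWM option instead). The helper name
make_source_socket, the endpoint, and the limit are illustrative only.

```python
import zmq


def make_source_socket(ctx, endpoint, hwm=1000):
    """Create a PUB socket with an explicit per-subscriber queue limit."""
    pub = ctx.socket(zmq.PUB)
    # Set the limit before bind(): it applies to connections made afterwards.
    pub.setsockopt(zmq.SNDHWM, hwm)
    pub.setsockopt(zmq.LINGER, 0)   # do not block on close with unsent data
    pub.bind(endpoint)
    return pub


if __name__ == "__main__":
    ctx = zmq.Context()
    pub = make_source_socket(ctx, "inproc://source-hwm-demo", hwm=500)
    print(pub.getsockopt(zmq.SNDHWM))
    ctx.destroy(linger=0)
```

Choosing the limit is a trade-off between memory used on the source
and how much data survives a short combiner outage, which ties back to
the question of how much work can be lost.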
> As a first step you should prototype this in Python or similar, that
> will take a few days. It's actually a nice scenario and I might use
> it in the Guide as a worked example.
> There is probably a way to run multiple combiners in a redundant array
> but it gets complex and I'd aim rather for making all nodes more
> robust, and recovering quickly from any failure.
> -Pieter Hintjens