[zeromq-dev] Distributed Q with **lots** of consumers

Sean Donovan sdonovan_uk at yahoo.com
Wed Nov 14 06:29:58 CET 2012

Oh boy, where to start . . . I’ve learnt a lot in the last few days.

I don’t think it is a standard bus pattern. Don’t laugh, but I had been experimenting with ActiveMQ. Eventually, we’ll be scaling to millions of tasks daily. We also need to prioritize messages (“real-time” traffic is processed ahead of regular analytics). We learnt that ActiveMQ can’t efficiently handle that many consumers on one Q; the prioritization algorithm also degrades seriously past 60,000 messages in the Q, and the C++ client (CMS) has bugs.

Given the number of consumers on one Q, the prioritization, and the millions of messages — I’m not even sure that other vendors (IBM low-latency MQ, Tibco, etc.) would work. We also have a requirement to be cross-platform (Win32, Win64, Linux64, SPARC32, SPARC64). Our use-case is very specific, which is why I started looking at ZeroMQ. It might be worth the effort to build something ourselves. We write derivatives software and already have to support a plethora of 3rd-party vendor products -- we'd like to avoid unnecessary licensing.
Persistence is interesting. We might not need it. If the system encounters an error, we simply restart our analytics, and the tasks get re-spooled. Persistence of the results is already handled in the distributed cache (GemFire). My problem is how we handle the spooling of millions of tasks to ZeroMQ. We could add blocking semantics when memory fills up, or offload to disk. So, although that’s still persistence of a sort, we might not need to persist each acknowledgement within a ZeroMQ-style broker.

To handle task prioritization, it should be relatively easy to build a fairly decent (statistics-based) algorithm into the ZeroMQ client/feeder.
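A feeder-side prioritization algorithm of that kind could be sketched as below. This is only an illustration using the standard-library heap, with two hypothetical priority tiers standing in for our real-time vs. analytics traffic — the actual tiers and statistics would come from our message metadata:

```python
import heapq
import itertools

# Hypothetical priority tiers (assumption, not from the thread):
# real-time traffic is handed out before regular analytics.
REALTIME, ANALYTICS = 0, 1

class PriorityFeeder:
    """Feeder-side priority queue: real-time tasks are dispatched
    before analytics tasks, FIFO within each tier."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per tier

    def spool(self, task, priority=ANALYTICS):
        heapq.heappush(self._heap, (priority, next(self._seq), task))

    def next_task(self):
        if not self._heap:
            return None
        _, _, task = heapq.heappop(self._heap)
        return task

feeder = PriorityFeeder()
feeder.spool("nightly-var-report")       # analytics
feeder.spool("price-update", REALTIME)   # real-time
feeder.spool("greeks-recalc")            # analytics
print(feeder.next_task())  # → price-update (real-time jumps the queue)
```

Because the ordering lives in the feeder rather than the broker, it sidesteps the in-queue prioritization cost that hurt us in ActiveMQ.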

Poison messages are also interesting. In our case, the biggest problem is an unhandled exception during processing. When that happens, we log an error to the distributed cache -- and should restart the worker. It's messier than it should be; we're relying on millions of lines of code, some of which goes back 15+ years.
Sorry if my description was messy -- and many thanks for the interest,

 From: Bennie Kloosteman <bklooste at gmail.com>
To: Sean Donovan <sdonovan_uk at yahoo.com>; ZeroMQ development list <zeromq-dev at lists.zeromq.org> 
Sent: Monday, November 12, 2012 11:48 PM
Subject: Re: [zeromq-dev] Distributed Q with **lots** of consumers

This is a fairly standard bus pattern, though personally I HATE persistent queues with a passion. They kill all performance, you have to manage the infrastructure, and there's the possibility of poison messages. For most business cases there are better ways to guarantee delivery, e.g. send update events to the client, which can resend or terminate as desired.

If you run into a limit you can chain them, e.g. 20 servers with 20 workers each serve 400 clients, and those 20 servers put everything into 1 queue.
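The chaining idea can be sketched with plain standard-library queues and threads (toy sizes, not the 20x20 topology above): several leaf queues are each drained by a forwarder into one central queue, so the central process never sees the full client fan-out directly:

```python
import queue
import threading

# Two-tier fan-in sketch: leaf queues (one per front-end server) are
# drained by forwarder threads into a single central queue.
central = queue.Queue()
leaves = [queue.Queue() for _ in range(3)]

def forward(leaf):
    while True:
        item = leaf.get()
        if item is None:      # sentinel: this leaf is done
            break
        central.put(item)

threads = [threading.Thread(target=forward, args=(leaf,)) for leaf in leaves]
for t in threads:
    t.start()

# Clients enqueue at their local leaf...
for i, leaf in enumerate(leaves):
    leaf.put(f"task-from-leaf-{i}")
    leaf.put(None)

for t in threads:
    t.join()

# ...and everything ends up in the one central queue.
print(sorted(central.queue))  # → ['task-from-leaf-0', 'task-from-leaf-1', 'task-from-leaf-2']
```

In an actual ZeroMQ deployment each forwarder would be a small proxy device between the leaf and central sockets rather than a thread.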

Why bother using a fast system like ZeroMQ with a persistent queue anyway? The queue will just bottleneck everything, writing the message to disk and creating contention on reading it back. Surely MSMQ or the other Service Bus queues can do this better, since they have optimized persistence.

On Tue, Nov 13, 2012 at 6:58 AM, Sean Donovan <sdonovan_uk at yahoo.com> wrote:

Any suggestions for implementing the following in ZMQ?
>Imagine a single Q containing millions of entries, which is constantly being added to.  This Q would be fully persistent, probably not managed by ZMQ, and run in its own process.
>We would like N workers.  Those workers need to start/stop ad-hoc, and reconnect to the Q host process.  Each worker would take a single item from the Q, process, acknowledge completion, then repeat (to request another item).  Processing time for each task is 3ms+ (occasionally minutes).
>Because of the variance in compute time it is important that the workers don't pre-fetch/cache tasks.  As an optimization, we'll add a heuristic so we can batch short-running tasks together (but, we'd like the control -- a load-balancing algorithm wouldn't know how to route efficiently, unless it could take hints).
>Need a pattern that would allow us to scale to 100s of workers.  
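The worker protocol described above — request exactly one task, process, acknowledge, repeat, with no prefetching — can be sketched with standard-library threads and a shared queue. This is a simulation, not ZeroMQ code; in ZeroMQ terms it would map onto REQ workers talking to a ROUTER socket on the Q host (the zguide's load-balancing pattern), where the worker's request message is the "give me one task" step:

```python
import queue
import threading

# Each worker pulls exactly one task at a time, processes it, then
# acknowledges before asking for the next -- so a slow task never
# holds prefetched work hostage on a busy worker.
tasks = queue.Queue()
results = []
lock = threading.Lock()

def worker(worker_id):
    while True:
        task = tasks.get()        # request exactly one task, no prefetch
        if task is None:          # sentinel: shut this worker down
            tasks.task_done()
            break
        with lock:
            results.append((worker_id, task))  # "process" the task
        tasks.task_done()         # acknowledge completion, then loop

N_WORKERS = 4
threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
for t in threads:
    t.start()

for task in range(10):
    tasks.put(task)
for _ in range(N_WORKERS):       # one shutdown sentinel per worker
    tasks.put(None)

tasks.join()
for t in threads:
    t.join()
print(len(results))  # → 10 (every task processed exactly once)
```

Workers scale by starting more threads (or, with real sockets, more processes connecting to the Q host); the batching heuristic for short-running tasks would slot in where the feeder enqueues work.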
>Sean Donovan
>zeromq-dev mailing list
>zeromq-dev at lists.zeromq.org