[zeromq-dev] on scalability of PUB/SUB and PUSH/PULL

Goswin von Brederlow goswin-v-b at web.de
Thu Jun 12 10:03:19 CEST 2014

On Wed, Jun 11, 2014 at 02:05:13AM -0700, Jun Li wrote:
> Hi,
> I am using PUB/SUB socket pattern to distribute commands from the
> coordinator to the many worker processes, and I also have the PUSH/PULL to
> have each worker process push the processing results to the coordinator.
> The coordinator is bound to the PUB socket and also the PULL socket, with
> the context currently set to 1 thread. In my test environment, there
> would be one single coordinator process and up to 200 worker processes.
> I have just started the scalability testing. But it seems that with 15
> worker processes, the end-to-end communication latency is about 15 ms, for
> the coordinator to distribute (via PUB) the commands and finally aggregate
> the results back (via PULL) from the worker processes. But when I increased
> the number of worker processes to 50, I then observed the end-to-end
> communication latency of about 80 ms. This implies that as the number of
> the worker processes grow, the latency also grows and thus brings up the
> scalability issue.

You can hardly say anything with just two points. Is that a linear
increase? Exponential? Logarithmic? Does it jump between 49 and 50?
Does it stay at 80 ms up to 100000 workers?
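To make that concrete: the two quoted measurements (15 workers at 15 ms, 50 workers at 80 ms) are matched exactly by more than one growth model, and those models disagree badly once you extrapolate to the 200 workers Jun plans to test. The sketch below uses only those two numbers; the linear and power-law models are illustrative choices, not claims about how ZeroMQ actually scales.

```python
import math

# The two measurements quoted above: (worker count, end-to-end latency in ms).
POINTS = [(15, 15.0), (50, 80.0)]


def fit_linear(points):
    """Fit latency = a + b * n exactly through two points."""
    (n1, t1), (n2, t2) = points
    b = (t2 - t1) / (n2 - n1)
    a = t1 - b * n1
    return lambda n: a + b * n


def fit_power(points):
    """Fit latency = c * n**k exactly through two points (a line on log-log axes)."""
    (n1, t1), (n2, t2) = points
    k = math.log(t2 / t1) / math.log(n2 / n1)
    c = t1 / n1 ** k
    return lambda n: c * n ** k


linear = fit_linear(POINTS)
power = fit_power(POINTS)

# Both models reproduce 15 and 50 workers exactly, but diverge at 200.
for n in (15, 50, 200):
    print(f"{n:>4} workers: linear {linear(n):7.1f} ms, power law {power(n):7.1f} ms")
```

Both curves agree at the measured points and differ by roughly 190 ms at 200 workers, so more data points (or per-hop timings) are needed before calling the growth anything in particular.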

> The messages communicated between the coordinator and the worker
> processes are not that big, less than 100 bytes.
> While I am planning to measure the latency spent on each hop, I would like
> to seek suggestions:
> *for a large number of the worker processes to be handled by a single
> coordinator with low latency, should the context at the coordinator be set
> to >  1 thread?
> *Should I use the other socket pattern such as Router/Dealer, instead of
> pub/sub and push/pull, in order to address the scalability issue?
> Regards,
> Jun

Personally I think that if you depend on latency then you always have
a problem. That will be your bottleneck and seriously harm
scalability. You need to pipeline your work: send out more jobs ahead
of time, while the workers are still busy with the last job. That way
the latency gets completely absorbed by the work in progress and
becomes irrelevant.
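The effect of pipelining can be illustrated with a toy timing model. This is a sketch under stated assumptions, not ZeroMQ code: a single worker handles jobs in FIFO order, every message takes a fixed one-way latency, each job takes a fixed processing time, and the coordinator keeps at most `window` jobs outstanding. The function name `total_time` and its parameters are hypothetical.

```python
def total_time(n_jobs, one_way_ms, work_ms, window):
    """Time until the coordinator has received all n_jobs results.

    Toy model (an assumption, not ZeroMQ behaviour): one worker, FIFO
    processing, fixed one-way message latency, fixed per-job work time,
    and at most `window` jobs in flight at once.
    """
    result_arrival = []  # time each result reaches the coordinator
    worker_free = 0      # time the worker finishes its previous job
    for i in range(n_jobs):
        # Job i may be sent once the job `window` places earlier has returned.
        send = 0 if i < window else result_arrival[i - window]
        # The worker starts when the job has arrived and it is idle.
        start = max(send + one_way_ms, worker_free)
        worker_free = start + work_ms
        result_arrival.append(worker_free + one_way_ms)
    return result_arrival[-1]


# Lock-step (window=1): every job pays the full round trip.
lockstep = total_time(100, one_way_ms=5, work_ms=1, window=1)
# Pipelined: enough jobs in flight to keep the worker busy.
pipelined = total_time(100, one_way_ms=5, work_ms=1, window=11)
print(lockstep, pipelined)  # prints: 1100 110
```

In this model a window of about (2 * one-way latency + work time) / work time jobs, here 11, keeps the worker continuously busy; after that, total time is bound by the work itself and the latency is paid only once. The model ignores the coordinator's own send/receive cost and any contention between many workers.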

