[zeromq-dev] Measuring PUB/SUB performance on resource-constrained devices

J.S. Lischeid jsl71 at cam.ac.uk
Mon Apr 27 10:27:50 CEST 2020


Hi Brett,

Thank you for your thoughts. 
Looking at some benchmarking code has been very helpful, although I should have been more specific about my setup: I'm benchmarking the middleware, not raw ZMQ PUB/SUB connections. I'm just trying to use the fact that there's an underlying ZMQ queue to optimize my message-sending pattern.
Obviously, your general comments about benchmarking are still applicable and I will incorporate them into my setup.

Cheers,
Julius

-----Original Message-----
From: Brett Viren <brett.viren at gmail.com> 
Sent: Friday, 24 April 2020 16:51
To: J.S. Lischeid <jsl71 at cam.ac.uk>
Cc: zeromq-dev at lists.zeromq.org
Subject: Re: [zeromq-dev] Measuring PUB/SUB performance on resource-constrained devices

Hi Julius,

Some input from an interested ZeroMQ user:

There are some 10 and 100 GbE latency and throughput results in the wiki.  They focus on REQ/REP (lat) and PUSH/PULL (thr).  The benchmark and plotting code is in libzmq/perf/.  The "thr" test uses PUSH/PULL and its code might be a good basis for a PUB/SUB variant.

For PUB/SUB I think the biggest feature to add would be tracking the dropped-message rate during the test.  A PUB/SUB test will be very sensitive to whether the sender or the receiver is on faster hardware.  Here, an IoT sender to a receiver on a workstation is a helpful asymmetry.  Reversing the direction, the workstation may easily send faster than an RPi or similar can receive.
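One way to track drops, since plain PUB/SUB doesn't report them (as far as I know): have the publisher embed a monotonically increasing sequence number in each message and count gaps on the receiver.  A minimal sketch of the receiver-side accounting; `drop_stats` and its arguments are names I made up:

```python
def drop_stats(received_seqs, total_sent):
    """Compute (dropped count, drop rate) for one benchmark run.

    Sequence numbers are assumed to start at 0 and increase by 1 per
    published message; any number the receiver never saw was dropped
    somewhere en route (PUB-side HWM, SUB-side HWM, or the wire).
    """
    seen = set(received_seqs)          # de-duplicate, order doesn't matter
    dropped = total_sent - len(seen)
    rate = dropped / total_sent if total_sent else 0.0
    return dropped, rate
```

The publisher only needs to report `total_sent` out-of-band at the end (or in a terminator message), so all the real measurement stays on the receiver.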

Secondary would be to add something to handle or account for "slow subscriber syndrome" (as per zguide).

Another problem I've had in my benchmarks and real apps is making sure that the sender stays alive after a stream of sends is done, in order to give time for local send and remote recv buffers to be flushed and the final measurements taken.  Best if the protocol assures this (eg, credit-based flow control), but with PUB/SUB that requires some additional socket patterns.
A simple approach is a "long enough" sleep just before sender termination, and then to do all the benchmark measurements on the receiver.

You may also want to perform benchmark measurements as a function of the number of PUBs and/or number of SUBs in the topology.

I suggest plotting throughput and loss rates as a function of the full parameter space, as exhaustively as you have the patience for.  Searching for the max is, imo, not the whole story.  Rather, a developer typically has an idea of the range of rates an application may require, or may want to know what's "safe" and design to that.  Seeing the big picture is very helpful.
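The sweep itself can be as dumb as a nested loop over the grid.  A sketch, where `run_trial` is a placeholder for whatever benchmark harness you end up with (it's assumed to return an (achieved throughput, loss rate) pair per cell):

```python
import itertools

def sweep(run_trial, msg_sizes, target_rates):
    """Run one trial per (message size, target send rate) grid cell.

    Returns {(size, rate): (throughput, loss_rate)} ready for plotting
    as a heat map over the parameter space.
    """
    results = {}
    for size, rate in itertools.product(msg_sizes, target_rates):
        results[(size, rate)] = run_trial(size, rate)
    return results
```

For the real thing you'd also repeat each cell a few times and record the spread, since PUB/SUB loss tends to be bursty rather than steady.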


You can take a look at some of the benchmark code I've messed with.
They are encumbered with various layers that may complicate using them directly, but they may at least be worth a look.

I have a cppzmq-based PUB/SUB benchmark which is "encumbered" with my own "ZIO" library layers:

  https://brettviren.github.io/zio/ex-distribution.html
  https://github.com/brettviren/zio/tree/master/test (check-pubsub* files)

And I wrote a CZMQ-based benchmark program/library:

  https://github.com/brettviren/zperfmq

It should work with a variety of sockets including PUB/SUB.

But, one note: the CZMQ layer will add some small overhead compared to libzmq.  It's small compared to what my own app layers add, even when I'm trying to make my layer fast/efficient.  To get a feeling for just how fast libzmq is, playing with the libzmq/perf/ tests was very valuable.

Making a simple PUB/SUB equivalent to the "thr" tests would be very useful and I think libzmq would benefit from having it.
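As a rough illustration of the shape such a test could take (a pyzmq sketch of my own, not libzmq's perf code; the inproc endpoint and function name are made up, and a real "thr" test would put the sockets on separate hosts):

```python
import time
import zmq  # pyzmq; assumed to be installed

def pubsub_thr(n_msgs=500, msg_size=64):
    """Single-process PUB/SUB throughput sketch over inproc.

    Shape: synchronize, send n_msgs messages plus a terminator, then
    count and time what arrives on the receiver.
    """
    ctx = zmq.Context.instance()
    # XPUB instead of PUB so the sender can wait for the subscription
    # handshake rather than sleeping and hoping (slow-joiner syndrome).
    pub = ctx.socket(zmq.XPUB)
    pub.bind("inproc://thr-test")
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b"")
    sub.connect("inproc://thr-test")
    assert pub.recv() == b"\x01"  # b"\x01" + topic announces a subscription

    payload = b"x" * msg_size
    t0 = time.monotonic()
    for _ in range(n_msgs):
        pub.send(payload)
    pub.send(b"END")  # terminator, so the receiver knows when to stop

    received = 0
    while sub.recv() != b"END":
        received += 1
    elapsed = time.monotonic() - t0

    pub.close()
    sub.close()
    return received, elapsed
```

With sequence numbers in the payload instead of padding, the same skeleton also gives you the drop rate per run.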

And now I see I missed that there is already a libzmq/perf/proxy_thr.cpp.  It says it tests NODROP using XPUB/XSUB.  It could be a useful starting point for adding a measure of message drops so PUB/SUB can be tested.

Whatever you end up doing, please report the results you find!

Cheers,
-Brett.

"J.S. Lischeid" <jsl71 at cam.ac.uk> writes:

> Dear ZeroMQ community,
>
> Are there any established throughput benchmarking practices for 
> PUB/SUB on resource-constrained devices that are possibly bottlenecked 
> by CPU/memory consumption instead of network bandwidth?
>
> I'm asking because I'm trying to benchmark an IoT messaging middleware 
> that uses ZMQ PUB/SUB queues under the hood. More specifically, I'm 
> trying to find the maximum theoretical throughput for given hardware 
> configurations (e.g. 256MB/512MB/1GB RAM, different CPU speeds, 
> network interfaces).
>
> These are my thoughts so far:
> - Ideally, you'd want to keep the publisher-side ZMQ-internal message 
> queue filled with a low number of messages throughout the benchmarking 
> interval. There's not enough memory on the devices to keep it filled 
> with a high number of larger messages (KB-MB range) but you'd also 
> want to avoid having an empty queue at any time since you're missing 
> out on send operations you could do in the meantime (for small 
> messages, there also might be batching advantages when having > 1 
> message in the queue).
> - ZMQ does not expose the internal queue fill level.
> - But just spamming a PUB socket with a low high-water mark also 
> distorts measurements, because it introduces middleware overhead for 
> messages that will never actually be sent (probably especially 
> important on uniprocessors).
> - My currently favoured approach is performing a (binary) search for 
> the maximum number of messages that can be transferred in a given time 
> frame by evenly spacing out (small batches of) messages and sending 
> the producer thread to sleep in between.
>
> Do you have any thoughts on this or has someone here encountered a similar problem in the past?
>
> Thanks in advance!
>
> Julius
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> https://lists.zeromq.org/mailman/listinfo/zeromq-dev

