[zeromq-dev] Large messages and low latency

Brett Viren brett.viren at gmail.com
Wed Feb 10 20:25:43 CET 2021

Hi Francis,

In some past testing of my own I reproduced the results for 100 Gbps
shown in the wiki.  If you have not found them:


There is also an older 10Gbps test in the wiki which may be more
directly applicable.

I got 20 Gbps throughput on a 100 Gbps (full duplex) link with no
special tuning and 25 Gbps after some tuning.  CZMQ throughput suffers
relative to libzmq at small (<10kB) and large (>1MB) message sizes.

Based on these three results I'd guess you'd have no problem filling 10
Gbps for message sizes bigger than a few kB.  I suggest checking the CPU
usage while running your test to see if your job is pegged.  If it is,
maybe there is some application code doing "too much" in the code path
of message handling.  Or, if you aren't already doing so, try the
performance tests provided in the libzmq repo.
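
A rough sketch of the "is my job pegged" check, using only the Python
standard library.  The helper name cpu_utilisation and the two toy
workloads are made up for illustration; the work callable stands in for
whatever send/receive loop you are testing.  A CPU/wall ratio near 1 (or
above 1 with several busy threads) suggests the process is CPU-bound
rather than waiting on the network.

```python
import time


def cpu_utilisation(work, *args):
    """Run work(*args); return (wall seconds, CPU seconds, CPU/wall ratio).

    time.process_time() counts user+system CPU time for the whole
    process, so CPU burned by other threads in the process is included.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    ratio = cpu / wall if wall > 0 else 0.0
    return wall, cpu, ratio


def idle_wait(seconds):
    """Toy workload that mostly waits, like a receiver starved of data."""
    time.sleep(seconds)


def busy_spin(seconds):
    """Toy workload that burns CPU, like a handler doing "too much"."""
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass


if __name__ == "__main__":
    for name, fn in (("idle", idle_wait), ("busy", busy_spin)):
        wall, cpu, ratio = cpu_utilisation(fn, 0.2)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s ratio={ratio:.2f}")
```

Tools like top or htop give the same information from outside the
process; the in-process version is handy when you want the number next
to your own throughput printout.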

Definitely there is a non-flat throughput-vs-message-size curve.  When
using libzmq directly, throughput increases roughly exponentially with
message size up to about 1kB/msg, then linearly up to about 1
MByte/msg, where it has a jagged plateau, and then starts to droop at
about 20 MByte/msg.

To get anywhere close to filling 100 Gbps I had to either use multiple
sockets or multiple ZeroMQ I/O threads.  With the latter, one must
accept a loss of strict message ordering, as you found.
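
If ordering matters, one common workaround is to restore it in the
application.  The sketch below is not a ZeroMQ feature and not
something I tested in the studies above.  It assumes the sender stamps
every message with an increasing sequence number (for example in a
leading frame), and the class name Resequencer is made up for
illustration.

```python
class Resequencer:
    """Release payloads in sequence order, buffering early arrivals."""

    def __init__(self, first_seq=0):
        self._next = first_seq   # next sequence number to release
        self._pending = {}       # seq -> payload for early arrivals

    def push(self, seq, payload):
        """Accept one message; return payloads now deliverable in order."""
        if seq < self._next or seq in self._pending:
            return []            # stale or duplicate: ignore it
        self._pending[seq] = payload
        ready = []
        while self._next in self._pending:
            ready.append(self._pending.pop(self._next))
            self._next += 1
        return ready


if __name__ == "__main__":
    rs = Resequencer()
    for seq, frame in [(1, "b"), (0, "a"), (3, "d"), (2, "c")]:
        print(seq, rs.push(seq, frame))
```

Note the buffer is unbounded here: if a sequence number never arrives,
later messages pile up forever.  Real code would need a size limit or
timeout, and for video frames it may be better to skip the gap than to
wait.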

Also, one comment: in reality, latency and throughput are not inverses.
So, if you really want to tune for latency, measure it directly.  In my
latency studies I found the worst culprit was my own code that I naively
put in the message processing code path.  libzmq itself achieved 45-50
us latency for <1kByte/msg.  Adding just CZMQ gave 60-65 us on small
messages.  As messages get larger, bandwidth begins to drive latency.
BTW, adding my own "what harm could it be" code added many 100s of us
to the latency test.
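
To put rough numbers on "bandwidth begins to drive latency", here is a
back-of-envelope calculation of the time just to push a message onto
the wire.  It is only arithmetic, not a measurement: it ignores
protocol overhead, copies and queueing, and assumes 1 Gbps = 1e9 bit/s.
The function name wire_time_us is made up for illustration.

```python
def wire_time_us(message_bytes, link_gbps):
    """Microseconds needed to serialise message_bytes onto the link.

    Lower bound only: no framing overhead, no copies, no queueing.
    """
    bits = message_bytes * 8
    return bits / (link_gbps * 1e9) * 1e6


if __name__ == "__main__":
    for size in (1_000, 100_000, 10_000_000):
        print(f"{size:>10d} B -> {wire_time_us(size, 10):10.1f} us")
```

On a 10 Gbps link this gives about 0.8 us for 1 kB, 80 us for 100 kB
and 8000 us (8 ms) for the 10 MB frames you mention.  So for small
messages the 45-50 us is almost all software overhead, while for 10 MB
frames the wire time dominates and the only ways to lower it are a
faster link or smaller messages.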


Francis Labonte <francis_labonte at hotmail.com> writes:

> Hi! 
> I am trying to use zmq for sending video frames.  The bandwidth I am using is 4Gbps up + 4Gbps down.  I am using push/pull
> sockets (2 pair – one for each direction).  So far, it is “working”, but I need to lower the latency.  - I have a 10Gbps
> link and cannot saturate it. 
> I tried to enable more io threads in hope it will increase the throughput and so reduce my latency…   However, for reason
> I don't understand yet, the messages  come unordered and/or seems broken.  So I might have misunderstood something!
>   ● I have 1 context (but I have tried with 2 without any impact)
>   ● I create 2 socket in their respective thread(send / receive) – they don’t know each other
>   ● I set 2 io threads
>   ● If I call “connect” 1x per socket -> I cannot send @ 10gbps ( I get about 10gbps total (up & down)).  Which seems in
>     line with the doc “~1GBps / working thread”
>   ● If I call connect 2x per socket, then the messages seems to be broken on receiver side
> I understand zmq is optimize for small messages/high volume, so that’s why I seek to understand if it possible to tune zmq
> for such a use case large message (10MB) that use all the bandwidth available (10gbps) to reduce latency?  Maybe it is
> possible an I am doing something wrong.. Can somebody help me with that? 
> Thanks for you help
> Francis