[zeromq-dev] content based routing for a thread pool

Brett Viren bv at bnl.gov
Thu Nov 2 14:39:48 CET 2023

Lei Jiang <lei.jiang.29 at gmail.com> writes:

> I totally agree that the proxy() call marshalling should be light.
> Though the project I'm working on is just a hobby, in my industry
> there are sometimes customers fussing about millisecond-level delays.

Even over tcp://, ZeroMQ latency is measured in tenths of milliseconds.
ipc:// should be at least as fast and inproc:// is much faster.

libzmq provides some benchmarking tools and there are some results on 10
GbE and 100 GbE in the wiki:


I've personally done some benchmarking that confirms these results on
100 GbE:


My conclusion is that libzmq is so fast that any non-trivial application
code will become the bottleneck.  Even layering CZMQ on top adds a small
bottleneck, though CZMQ latency is still sub-ms.  Of course, once the
message size becomes large enough, available bandwidth will take its
toll.  At 100 GbE and the default number of ZeroMQ "I/O" threads, the
real bandwidth is limited to 20-25 Gbps.  This is not ZeroMQ-specific, as
non-ZeroMQ TCP transport also sees this.  In any case, on my test
network, messages had to reach ~1 MB in size to reach ~1 ms latency.
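
As a rough sanity check on those numbers, the time a message spends
being pushed onto the wire is just its size divided by the achieved
bandwidth.  A minimal sketch in Python (the function name is made up for
illustration, and it ignores propagation delay, protocol overhead and
any queueing):

```python
def wire_time_ms(size_bytes: int, gbps: float) -> float:
    """Milliseconds needed to push size_bytes through a link running at gbps (10^9 bit/s)."""
    bits = size_bytes * 8
    return bits / (gbps * 1e9) * 1e3


# The 20-25 Gbps range is the effective bandwidth quoted above.
for gbps in (20, 25):
    print(f"1 MB at {gbps} Gbps: {wire_time_ms(1_000_000, gbps):.2f} ms")
```

So at 20-25 Gbps a 1 MB message already spends a few tenths of a
millisecond on the wire, the same order of magnitude as the ~1 ms
measured end to end.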

> A long multi-part message could potentially lead to embarrassing
> situations quite easily. Even when the proxy code is fast, if
> something gets stuck in either the request or the response, for
> example a slow DB query, a long delay can be easily introduced.

I may not follow.

All parts of a message are transmitted atomically and asynchronously
from the point of view of the application.  This happens in "the
background" by ZeroMQ code running in its "I/O" threads.

So, certainly a proxy may take a "nap" in the middle of recv()'ing
message parts, but that does not impact the ongoing socket activity on
the transmission side.  And slow marshalling inside the proxy is not
an issue that relates to using multi-part messages, since once the first
part is available the proxy application can recv() the remaining parts.

It is true that if the proxy "naps" are required and they are long
enough then they may hurt overall latency and/or throughput.  But,
again, that can be solved by having the proxy farm tasks out to workers
in their own threads, allowing the proxy to go back to marshalling in
the meantime.  I.e., the proxy becomes the broker in the MDP/PPP pattern.
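
The hand-off idea can be sketched without any ZeroMQ at all.  The
following is only an illustration using the Python standard library:
the two queues stand in for the broker's worker-facing and client-facing
sockets, and run_broker, the (client_id, body) tuples and the default
"work" function are all hypothetical names, not part of MDP or libzmq.

```python
import queue
import threading


def run_broker(requests, n_workers=4, work=lambda body: body.upper()):
    """Toy broker: hand each (client_id, body) request to a worker thread, collect replies."""
    tasks = queue.Queue()    # stands in for the worker-facing socket
    replies = queue.Queue()  # stands in for the client-facing socket

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # shutdown sentinel
                break
            client_id, body = item
            # The potentially slow step (e.g. a DB query) runs here,
            # off the broker's own thread.
            replies.put((client_id, work(body)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for req in requests:
        tasks.put(req)        # the broker only enqueues; it never waits on the work
    for _ in threads:
        tasks.put(None)       # one sentinel per worker
    for t in threads:
        t.join()

    out = {}
    while not replies.empty():
        client_id, reply = replies.get()
        out[client_id] = reply
    return out


print(run_broker([(b"A", b"ping"), (b"B", b"pong")]))
```

The client_id plays the role of the routing ID: the broker keeps it next
to the task so the reply can find its way back, whichever worker
finishes first.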

> Regarding this, do you know how reliable libzmq is? Will checksums,
> sequence numbers and acks be a waste most of the time?

You will have to look into the ZMTP RFC to answer some of these
questions.  Transport is at least as reliable as the underlying
mechanism (tcp, inproc, ipc).  I have never run across any checksum or
seqno that is exposed to the application.  It is common to add these
to the application message schema.
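
For illustration, here is one way an application-level schema might
carry a sequence number and a checksum in front of the payload.  This is
only a sketch: the 12-byte header layout and the names pack_frame and
unpack_frame are assumptions made up for this example, not anything
defined by ZMTP or libzmq.

```python
import struct
import zlib

# Hypothetical header: 8-byte sequence number + 4-byte CRC32, network byte order.
HEADER = struct.Struct("!QI")


def pack_frame(seqno: int, payload: bytes) -> bytes:
    """Prefix the payload with its sequence number and CRC32."""
    return HEADER.pack(seqno, zlib.crc32(payload)) + payload


def unpack_frame(frame: bytes):
    """Return (seqno, payload), raising ValueError if the checksum does not match."""
    seqno, crc = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:]
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch")
    return seqno, payload


frame = pack_frame(7, b"hello")
print(unpack_frame(frame))
```

The receiver can then compare each seqno with the last one seen from the
same sender to detect gaps or reordering.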

Not reliability per se, but messages may not be recv()'ed in the order
that they were send()'ed when ZeroMQ is configured with more "I/O"
threads than the default.  The default will saturate at about 20-25 Gbps
on a 100 GbE network in my tests.  Increasing the number of I/O threads
is required to go higher.  This is not a ZeroMQ-specific problem, as the
same limit was seen with non-ZeroMQ perf testers.  It is also not an
issue with inproc and ipc.

> Another thing I'd like to say here is that, IMHO, using empty frames
> as a delimiter is not a great idea, either. I did have extreme cases
> where protobuf produced empty frames, though rarely in reality.

This is only a problem if two conditions coincide.  The payload message
part must mimic a routing ID message part and it must arrive in the
message where a routing ID part is expected.  Once the first message
part that does not mimic a routing ID part is recv()'ed, the application
knows that all subsequent message parts are also payload.  So, yes, if
an empty payload arrives at just the right point in the sequence of
parts the application will face ambiguity.

So, it is a good idea for the application message schema to define
something guaranteed to avoid the mimicry, i.e. a "header" saying "from
now on, routing is over and you will only get payload".  This is one
intent of the "ZIO" string in the "magic" field in the message schema I

> Nevertheless, I think mixing metadata with payload is against
> OO. Imagine there is a flag or type we can get from zmq_msg_t to
> indicate if it's a meta data frame used by libzmq, it could be much
> easier to separate the payload.

The SERVER/CLIENT sockets provide exactly this.  They internally provide
bytes in the transmitted message to hold the routing ID and these bytes
are not exposed to the application as part of the message content.

But, one can have essentially the same thing with ROUTER by ensuring the
payload message part can never mimic the explicit routing parts.
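
A minimal sketch of that separation, operating on a list of already
recv()'ed frames.  The marker value b"APP1" and the function name
split_envelope are hypothetical, and the sketch assumes no routing ID
ever starts with the marker:

```python
MAGIC = b"APP1"  # hypothetical application marker that opens the first payload frame


def split_envelope(frames):
    """Split a multi-part message into (routing parts, payload parts).

    Everything before the first frame starting with MAGIC is treated as
    routing envelope (IDs and empty delimiters); that frame and all that
    follow are payload, so later empty frames are unambiguous.
    """
    for i, frame in enumerate(frames):
        if frame.startswith(MAGIC):
            return frames[:i], frames[i:]
    raise ValueError("no application header found")


envelope, payload = split_envelope([b"\x00abcd", b"", b"APP1 header", b"", b"data"])
print(envelope)
print(payload)
```

Here the empty frame after the header stays with the payload instead of
being mistaken for a delimiter.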

BTW, I explored this symmetry between ROUTER/DEALER and SERVER/CLIENT in
my "generaldomo" implementation of MDP.
