[zeromq-dev] content based routing for a thread pool

Brett Viren bv at bnl.gov
Mon Nov 13 15:49:01 CET 2023

Hi Lei,

Lei Jiang <lei.jiang.29 at gmail.com> writes:

> What I meant was, ZMQ seems to keep multipart message frames together. I wrote some test code, in the same
> thread, sending interleaving frames using 2 sockets like this.
> ZMQFrame("frame 1-1").Send(client1, ZMQ_SNDMORE);
> ZMQFrame("frame 2-1").Send(client2, ZMQ_SNDMORE);
> ZMQFrame("frame 1-2").Send(client1, ZMQ_SNDMORE);
> ZMQFrame("frame 2-2").Send(client2, ZMQ_SNDMORE);
> ZMQFrame("frame 1-3").Send(client1, ZMQ_SNDMORE);
> ZMQFrame("frame 2-3").Send(client2, ZMQ_SNDMORE);
> After my proxy code I got complete messages in the workers separately in the right order. I think it
> suggests that as long as the "active" message has not finished, meaning it's still sending the "MORE" flag,
> a ROUTER socket will not receive() a frame from a different client.

I believe this statement is true but not for the reason you suggest.

A sender socket will not transport (eg over tcp, ipc, etc) a full
message until all its parts have been supplied to the socket by the
application (eg, via calls to send()).  The socket buffers parts until
all have been given.

But even then, transport does not immediately follow.  The full message
must first traverse an "output queue" specific to the recipient
endpoint.  Actual transport is performed by the output I/O thread which
checks all output queues in the application.

This "fan out" pattern is reversed on the recipient side.  Each socket
has an input queue for each sender endpoint.  Messages are placed into
an input queue by the input I/O thread.  When recv() is called on a
receiving socket the socket picks (full) messages from one of the input

How messages are pushed to a socket's output queues or popped from input
queues are defined by the outgoing/incoming "routing strategy" described
in the zmq_socket(3) man page.

> Given that the proxy loop is single
> threaded, that means it won't tend to other clients until the current one is finished. If some of the
> message frames have delays, they will surely cause delays in the processing of frames from other clients as
> well.

Because full messages are transported atomically from the point of view
of the application, there are no "message frame delays", (there is of
course a full-message latency).

In other words, once an application does a recv() that returns a message
with the SNDMORE flag, that application can immediately recv() the
remaining parts.  Any delays will be at "memcpy speed".

> I think this means the multipart is not meant to be used to carry really long repeating parts but rather
> just one part of the message with different "fields".

It depends on the scale "really long".

The main limitation is RAM.  Because the sending socket will buffer of
the message parts prior to the message entering an output queue and vice
versa for the receiving socket, the full message must be small enough to
reside in available memory.  

> I have been reading the patterns in more detail recently. Some are so long I did not read carefully earlier.
> MDP is very cool but I realized what I did with my thread pool is actually a modified load balancing
> pattern. Now that I have done it, things look stupidly simple.

The best kind of simple! :)

> But here I have a new question.
> If I want to improve my threadpool to have better handling of queuing of the workers, also improved
> reliability by bringing in heartbeats, I will probably need to bring in the PPP. But the example in the
> guide seems to be having the proxy, ppqueue, pushing tasks to the workers. If I want to have the queue in
> the proxy, then I should have the workers "pulling" tasks. In the earlier part of the guide, it says this is
> achieved by having worker send a "READY" message to proxy and then proxy should reply with the task.
> So the core of my question here is, the worker can not know how long the task queue is in the proxy so it's
> possible queue is empty. In this case what should the proxy do? My guess is not replying to worker but do
> all the other things so that the worker should keep polling and that will be most efficient, right?


The clever "trick" about PPP/MDP is the usual "client/server"
conversation is reversed.  The worker makes a "request" to deliver the
last result or provides the initial "hi, I'm here, ready and willing".
Then the worker simply hangs idle waiting on a "reply" to deliver that
next task.

So, there is no neurotic loop of the worker constantly asking, "do you
have something for me?  No?  How about now?  now?  now?  now?".  Though,
heartbeats kind of provide a similar level of neurosis.

Thinking about this dichotomy now, I see the two approaches as the same
but different conversational "tone".  Traditional polling deals with
errors ("I failed to connect", "I failed to get a task") while PPP/MDP
deals with success ("here's your task", "here's a heartbeat saying all
is well").

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 849 bytes
Desc: not available
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20231113/e8d23599/attachment.sig>

More information about the zeromq-dev mailing list