[zeromq-dev] content based routing for a thread pool

Lei Jiang lei.jiang.29 at gmail.com
Thu Nov 2 01:18:37 CET 2023


Hi Brett,

Thanks for the links to the documentation! I had actually read them before
but should have read them more carefully. However, neither states clearly
what the routing ID pattern looks like. The man page says "one or more
*routing id* parts, *delimiter* part, one or more *body parts*". I guess a
simple chart with each routing ID interleaved with an empty frame would make
it really easy to understand. The guide
<http://zguide.wikidot.com/lua:chapter3> does have that (fig. 33, 34 and
35); it just takes a bit of time to digest.
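
To make the layout concrete, here is a minimal sketch of splitting a message
under a literal reading of the man page wording: routing ID parts, then one
empty delimiter part, then body parts. The function name and the frame
contents are made up for illustration, and the frames are plain Python bytes
rather than real zmq_msg_t objects.

```python
def split_envelope(frames):
    """Split a multipart message at the first empty frame.

    Returns (routing_ids, body_frames). Raises ValueError if the message
    has no empty delimiter frame.
    """
    for i, frame in enumerate(frames):
        if frame == b"":
            return frames[:i], frames[i + 1:]
    raise ValueError("no empty delimiter frame found")


# Hypothetical message after two ROUTER hops (the IDs are invented).
frames = [b"router-2-peer", b"router-1-peer", b"", b"request", b"payload"]
routing_ids, body = split_envelope(frames)
print(routing_ids)
print(body)
```

Only the first empty frame is treated as the delimiter, so everything after
it is handed back untouched as body.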

I totally agree that the marshalling in the proxy() call should be light.
Though the project I'm working on is just a hobby, in my industry there are
sometimes customers fussing about millisecond-level delays. A long
multi-part message could lead to embarrassing situations quite easily. Even
when the proxy code is fast, if something gets stuck in either the request
or the response, for example a slow DB query, a long delay can easily be
introduced. Here I must agree with you again that multi-part on the
zmq_msg_t level is not a great idea. To get "multiplexing" capability it
would be better to implement a "session" layer on top of that, with
start/end flags, sequence numbers, etc. Regarding this, do you know how
reliable libzmq is? Would checksums, sequence numbers and acks be a waste
most of the time?
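
As a rough sketch of what I mean by a "session" layer: every chunk travels
as its own message and carries a small header, so chunks of different
responses can interleave. The header layout (session ID, sequence number,
flags), the flag values and all names below are assumptions made up for
this example, not anything defined by libzmq.

```python
import struct

# Hypothetical chunk header: session id (u32), sequence number (u32), flags (u8).
HEADER = struct.Struct("!IIB")
FLAG_START = 0x01
FLAG_END = 0x02


def pack_chunk(session, seq, flags, payload):
    """Prepend the header to one chunk of payload."""
    return HEADER.pack(session, seq, flags) + payload


def unpack_chunk(frame):
    """Return (session, seq, flags, payload) for one received chunk."""
    session, seq, flags = HEADER.unpack_from(frame)
    return session, seq, flags, frame[HEADER.size:]


class Reassembler:
    """Collect chunks per session; return the full payload once END arrives."""

    def __init__(self):
        self._parts = {}  # session id -> list of payload chunks so far

    def feed(self, frame):
        session, seq, flags, payload = unpack_chunk(frame)
        if flags & FLAG_START:
            self._parts[session] = []
        parts = self._parts.get(session)
        # The expected sequence number equals the count of chunks seen so far.
        if parts is None or seq != len(parts):
            raise ValueError(f"session {session}: unexpected sequence number {seq}")
        parts.append(payload)
        if flags & FLAG_END:
            return session, b"".join(self._parts.pop(session))
        return None


# Two responses interleaved chunk by chunk.
wire = [
    pack_chunk(1, 0, FLAG_START, b"slow "),
    pack_chunk(2, 0, FLAG_START | FLAG_END, b"fast"),
    pack_chunk(1, 1, FLAG_END, b"reply"),
]
reassembler = Reassembler()
done = [out for out in map(reassembler.feed, wire) if out is not None]
print(done)
```

Session 2 completes before session 1 even though session 1 started first,
which is the behaviour a single long multi-part message cannot give.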

Another thing I'd like to say here is that, IMHO, using empty frames as
delimiters is not a great idea either. I did hit extreme cases where
protobuf produced empty frames, though rarely in practice. Nevertheless, I
think mixing metadata with payload goes against OO principles. Imagine there
were a flag or type we could get from zmq_msg_t to indicate whether it's a
metadata frame used by libzmq; it would be much easier to separate the
payload.
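
Lacking such a flag, one application-level workaround is to make sure a
payload frame can never be empty. The sketch below prefixes each payload
part with a one-byte marker; the marker value and the function names are
hypothetical, and nothing here is a libzmq feature.

```python
PAYLOAD_TAG = b"\x01"  # hypothetical one-byte marker, not part of libzmq


def encode_body(parts):
    """Prefix every payload part so that no body frame is ever empty."""
    return [PAYLOAD_TAG + part for part in parts]


def decode_body(frames):
    """Strip the marker again, rejecting frames that do not carry it."""
    for frame in frames:
        if frame[:1] != PAYLOAD_TAG:
            raise ValueError("frame is not a tagged payload frame")
    return [frame[1:] for frame in frames]


# b"" stands in for a serialized message whose fields are all defaults.
body = encode_body([b"", b"field-2"])
message = [b"peer-id", b""] + body  # peer-id is a made-up routing ID
print(message)
print(decode_body(message[2:]))
```

With the marker in place, the only empty frame in the message is the
delimiter, at the cost of one extra byte per payload frame.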

Cheers,
Lei

On Mon, Oct 30, 2023 at 11:58 PM Brett Viren <bv at bnl.gov> wrote:

> Lei Jiang <lei.jiang.29 at gmail.com> writes:
>
> > While putting together my code, I find it a bit hard to keep the
> > implementation simple and efficient. For example, the routing IDs are
> > separated by empty frames as delimiters. According to some charts in
> > the guide, multiple IDs could be separated by multiple delimiters. This
> > makes it impossible to identify message body efficiently until all
> > frames have been received
>
> This is a feature.  Multiple layers of routing can be accommodated by a
> sequence of these routing parts.    IIRC, the guide describes this as
> nested "envelopes".  The first ROUTER need only care about the first
> parts and should (must) leave any subsequent parts un-recv'ed.
>
> >  (BTW there does not seem to be a formal documentation on the
> > message formats?).
>
> The "ZMQ_ROUTER" section of the zmq_socket(3) man page documents the
> message schema that pertains to routing.  It is also defined by the
> REQREP RFC:
>
>   https://rfc.zeromq.org/spec/28/
>
> > Also it seems the send() calls will close the message being sent,
> > which I find a bit odd, too.
>
> zmq_msg_send(3) man page states this behavior and suggests to copy the
> message object if you wish to send it multiple times.  (In general,
> ZeroMQ is very good at stating these details in man pages and RFCs).
>
> I don't know this internal detail myself but I have always assumed this
> is done so that the socket internals may avoid a copy in the more common
> case of sending a message to a single socket.
>
> > Another thing I find is that it's quite hard to make good use of
> > features like zero-copy and multi-part messages. When having hops,
> > every hop will at least require an additional copy. For the multipart,
> > originally I thought I could use it for responses with unknown lengths.
> > But then I realized because the proxy code can't serve any other peer
> > until current request or response is finished, it will for sure block
> > everyone else. The only remedy I can think of is for every packet to
> > carry ID or routing data instead of using the "more" flag. So in
> > reality the best use of multipart is probably to carry different
> > parts/fields of requests/responses.
>
> Except for the requirements of ROUTER, the application is free to do
> what it wants in defining messages schema.
>
> In making such definitions one contends with how explicit or implicit to
> get.  Eg, should the message schema have some amount of self-description
> or are endpoints more trusted to "do the right thing".
>
> To give you a flavor, here is an example of an "opinionated" message
> format I've used in the past:
>
>   https://brettviren.github.io/zio/messages.html
>
> Warning: this format is NOT at all as well thought out as it should be.
> It also has some application-specific parts at levels that make it not
> generally useful.
>
> > Lastly, regarding the proxy being single threaded, I think that's a
> > must? As the stable versions don't support accessing sockets from
> > different threads, the bridging code of any 2 sockets must run in a
> > single thread.
>
> Yes.
>
> But a "proxy" is typically not doing much work other than:
>
>   recv() -> [marshal] -> send()
>
> If the "[marshal]" code becomes so computationally expensive as to
> require multiple threads in order to keep up with the message rate from
> the proxy's clients then perhaps the architecture needs rethinking.
>
> For example, the "[marshal]" code can become a gateway to multi-threaded
> (or even multi-process) workers.  Eg, "[marshal]" could be made to be
> also a "client" in a MDP/PPP pattern.  Such "[marshal]" code would best
> use a poller watching both its own client socket and the socket used
> to talk to the MDP/PPP "broker".  Then, the code inside the [marshal]'s
> main loop that is servicing that poller will become a rather fast path
> that can run in a single thread.
>
>
> -Brett.
>