[zeromq-dev] exiting without missing messages

Mathijs Kwik bluescreen303 at gmail.com
Sat Aug 20 22:49:54 CEST 2011


On 20 aug, 19:06, Pieter Hintjens <p... at imatix.com> wrote:
> On Sat, Aug 20, 2011 at 5:57 PM, Mathijs Kwik <bluescreen... at gmail.com> wrote:
> > I fail to see how to implement these explicit "bye bye" messages.
> > I can get a worker to send such a message to the ventilator, I just
> > don't have a clue on what the ventilator should do when it receives a
> > byebye.
>
> Study the LRU queue pattern and code, it's the basis for all custom
> routing. The ventilator (ROUTER) keeps a table of all active workers.
> When a worker decides to go away, it tells the ventilator, which
> removes it from its table.
>
> IMO you'll find alternative patterns that work better, such as
> prioritising workers and sending to certain ones ahead of others, while
> keeping them all connected all the time.
>
> The best approach IMO is to start with the LRU queue, learn it, and
> then extend it step by step until you have what works for you.
>
> -Pieter

That indeed looks like a solution.
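
The table Pieter describes could look roughly like the sketch below. It
is only the bookkeeping, with no zeromq calls at all: the class name
WorkerTable and the "READY"/"BYE" message names are made up for
illustration, and a real ventilator would feed on_message() from
whatever it reads off its ROUTER socket.

```python
from collections import deque


class WorkerTable:
    """Tracks workers that have announced themselves (hypothetical helper)."""

    def __init__(self):
        # The least recently used worker sits at the left end.
        self._ready = deque()

    def on_message(self, worker_id, kind):
        # "READY" and "BYE" are invented message names for this sketch.
        if kind == "READY":
            if worker_id not in self._ready:
                self._ready.append(worker_id)
        elif kind == "BYE":
            if worker_id in self._ready:
                self._ready.remove(worker_id)

    def next_worker(self):
        """Pop the least recently used worker, or None if nobody is available."""
        return self._ready.popleft() if self._ready else None


table = WorkerTable()
table.on_message("w1", "READY")
table.on_message("w2", "READY")
table.on_message("w1", "BYE")   # w1 leaves before any task is routed to it
chosen = table.next_worker()    # only w2 is left to receive work
```

A worker that has been handed a task drops out of the table until it
sends "READY" again, which is what keeps the queue in LRU order.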

However, in the LRU example, routing/load balancing is lifted from
zeromq into the application itself.
On top of that, if you want to be able to respond to workers dying,
you have to write some heartbeat code (to find out if workers die)
with timeout handling, plus some caches for "outstanding" messages
that track their delivery and make it possible to resend. This type
of code can be hard to get right the first time (and gives subtle,
hard-to-catch problems if you don't).
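
To give an idea of the amount of bookkeeping involved, here is a
minimal sketch of heartbeat timeouts combined with a cache of
outstanding messages. Everything in it is an assumption made for
illustration: the Tracker class, its method names, and the choice to
pass the current time in explicitly (so the logic stays independent of
any event loop). It does no socket I/O.

```python
class Tracker:
    """Hypothetical bookkeeping for worker liveness and undelivered tasks."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before a worker counts as dead
        self.last_seen = {}      # worker_id -> time of last heartbeat
        self.outstanding = {}    # task_id -> (worker_id, payload)

    def heartbeat(self, worker_id, now):
        self.last_seen[worker_id] = now

    def sent(self, task_id, worker_id, payload):
        self.outstanding[task_id] = (worker_id, payload)

    def acked(self, task_id):
        self.outstanding.pop(task_id, None)

    def reap(self, now):
        """Forget silent workers and return the (task_id, payload) pairs to resend."""
        dead = {w for w, t in self.last_seen.items() if now - t > self.timeout}
        for w in dead:
            del self.last_seen[w]
        lost = [(tid, p) for tid, (w, p) in self.outstanding.items() if w in dead]
        for tid, _ in lost:
            del self.outstanding[tid]
        return lost


tracker = Tracker(timeout=3.0)
tracker.heartbeat("w1", now=0.0)
tracker.heartbeat("w2", now=0.0)
tracker.sent("t1", "w1", b"job-1")
tracker.sent("t2", "w2", b"job-2")
tracker.heartbeat("w2", now=4.0)   # w2 is still alive, w1 has gone silent
to_resend = tracker.reap(now=5.0)  # t1 was assigned to w1, so it must be resent
```

Even this small version leaves open questions (what if the task was
processed but the ack got lost?), which is where the subtle problems
tend to come from.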

Basically you end up replicating big parts of functionality that
normally get provided "behind the scenes" in push/pull sockets, just
because there's no way to close a socket without losing data.

In the other question I posted here, I also learned that - although
zeromq handles reconnection and persistence - internal state (should I
send/receive next on this REQ socket?) doesn't get synchronized on
reconnect. Application code doesn't hear about the reconnection,
forcing the use of (arbitrary) timeouts and cleaning up yourself in
the application.

Writing all that routing/problem detection/message tracking code seems
to take the usefulness (and fun) out of zeromq.
If the application has to provide all those things by itself, I think
it could just as easily be built on plain TCP/Unix sockets (maybe with
some framing code added). I'm pretty sure those even provide some
feedback about connection issues that makes it easier to detect
failure and respond appropriately.
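
The framing code mentioned above does not have to be much. A minimal
sketch, assuming a 4-byte big-endian length prefix (an arbitrary
choice, not anything zeromq uses on the wire) and hypothetical names
frame() and Deframer:

```python
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix (arbitrary choice)


def frame(payload: bytes) -> bytes:
    """Prefix a message with its length so the receiver can find its end."""
    return HEADER.pack(len(payload)) + payload


class Deframer:
    """Reassembles messages from a byte stream that arrives in arbitrary chunks."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes):
        """Add bytes as they arrive from recv() and return every complete message."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # the payload has not fully arrived yet
            messages.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return messages


stream = frame(b"hello") + frame(b"bye")
d = Deframer()
first = d.feed(stream[:7])    # header plus part of the first payload
second = d.feed(stream[7:])   # the rest arrives in a later read
```

The sketch covers message boundaries only. Reconnection, queueing and
the messaging patterns would still have to be written on top.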

I'm sorry if I sound too negative here. I still like zeromq for simple
cases where just using a messaging pattern on top of a transport
suffices without worrying about failures/restarts/graceful shutdowns.
Also, using zeromq's event loop as a "reactor core" to build simple,
elegant MT code around is probably a huge benefit for most
languages/frameworks.

However, I mainly use zeromq from node.js, Ruby's EventMachine, and
Haskell, which all have their own event loops.
And in my opinion, for cases that are a little more demanding, zeromq
seems to get in the way by abstracting too much away. Without a way to
list/disconnect the current connections of a socket, or a way to get
notified of disconnects/delivery problems (buffers filling up), you
end up building your own transport layer on top of the most basic
XREP/XREQ sockets.

Again, this isn't meant as trolling/flamebait in any way. I really
appreciate the help and pointers, and the guide is very clear (and has
grown a lot since I last read it!). So maybe I'm overreacting and just
need to reread chapter 4+ many more times and play with the examples,
but for some reason that feels like a lot less fun than the magic that
zeromq seemed to provide when I started with the first few chapters.

Anyway, have a great weekend and keep up the good work!
Mathijs


