[zeromq-dev] Messages lost while using push/pull sockets

Pieter Hintjens ph at imatix.com
Fri May 20 11:28:17 CEST 2011


On Fri, May 20, 2011 at 11:09 AM, Satyam Shekhar
<satyamshekhar at gmail.com> wrote:

> I did. Messages are lost when a downstream node crashes after it has started
> to receive messages. It's not a synchronisation problem.

Sorry, I didn't read your email fully; I was hurrying to leave the office.

> I asked this on IRC as well. sustrik answered this.
> sustrik: it's normal, the messages queued at the downstream node are lost
> when the node crashes. Explicit acks should solve the crashing
> downstream node problem
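To make the failure mode in the IRC answer concrete, here is a minimal plain-Python simulation. No ZeroMQ is involved, and all names are hypothetical. It assumes PUSH hands messages out round-robin to connected PULL peers, and it models each peer's incoming queue as an in-memory deque, ignoring high-water marks and the sender-side pipe. When one worker crashes, whatever is still sitting in its queue simply disappears:

```python
from collections import deque


def distribute(tasks, n_workers):
    """Round-robin tasks into per-worker queues, roughly how a PUSH socket
    load-balances across PULL peers (simplified: no HWM, no sender pipe)."""
    queues = [deque() for _ in range(n_workers)]
    for i, task in enumerate(tasks):
        queues[i % n_workers].append(task)
    return queues


def run(tasks, n_workers, crash_worker, crash_after):
    """Drain each worker's queue; worker `crash_worker` dies after handling
    `crash_after` tasks, and everything left in its queue is lost."""
    done = []
    for w, q in enumerate(distribute(tasks, n_workers)):
        handled = 0
        while q:
            if w == crash_worker and handled == crash_after:
                q.clear()  # the crash discards the worker's in-memory queue
                break
            done.append(q.popleft())
            handled += 1
    return done


tasks = list(range(10))
done = run(tasks, n_workers=2, crash_worker=1, crash_after=2)
lost = sorted(set(tasks) - set(done))
print(lost)  # → [5, 7, 9]
```

Nothing upstream knows those three tasks were never processed, which is why some form of failure detection is needed on top of the pipeline.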

If you want to handle the problem of crashing nodes, you should read
chapter 4 of the Guide and look at the different reliability patterns.
It's not as simple as 'sending acks'... you need to detect the
failure and recover from it in some way.

I've not yet covered reliable pipelines in the Guide, so the patterns
you'll see there are for request-reply. This can work for workload
distribution. However, the simplest and most effective reliability
pattern for pipelines seems to be similar to "Lazy Pirate", whereby the
client resends the *entire* request if it doesn't get a complete
answer. It's inefficient, but failure should be rare.

In other words, you don't need acks. You distribute the work, collect
the results, and if any results are missing after your timeout, you
assume a node has failed and restart the whole process.

Please do explain more about your use case, and when you get this
working, the approach you used.

-Pieter
