[zeromq-dev] Reliability question

Pieter Hintjens ph at imatix.com
Sat Jul 31 12:02:37 CEST 2010


On Fri, Jul 30, 2010 at 9:48 PM, Hannes Schmidt <hannes at eyealike.com> wrote:

> Do you agree that using a synchronization mechanism other than
> 0mq's request-response should work or does it have to involve 0mq?

Using any kind of thread synchronization except 0MQ sockets is bad
practice.  At the least you are mixing two tools when one would do.
But worse, you're creating structures that are language-specific,
non-elastic, and that could interfere with each other in unpleasant
ways.

To synchronize two 0MQ tasks, use request-reply sockets.  Connect
these using whatever transport you need, and modify the transport as
you stretch your architecture across the network.

> Also, the Thread.sleep() in line 64 still makes me feel that there might be
> a race condition in 0mq. It may also be a conceptual problem. My guess is
> that 0mq only guarantees message delivery if the sender calls send() after
> receiver has invoked recv().

Indeed.  So the subscriber should confirm to the publisher that it is
ready only _after_ it has started to receive on the socket.  When you
want to use multiple sockets in one thread, that means using polling.

> If you do add documentation regarding this, it would be nice to document
> what exactly constitutes a successful subscription, i.e. whether it is the
> connect(), the s.setsockopt( ZMQ.SUBSCRIBE ) or the first call to recv().

None of these guarantees a successful subscription.  The
publisher may still bind and start to publish data at any point, and
there are scenarios where messages get lost even if the subscriber
is waiting on recv().  For example: the client starts first, connects,
sets a filter, and waits on recv().  The publisher then starts, binds,
and starts to publish.  The client connects to the publisher
asynchronously and then starts receiving messages.  A non-zero time
elapses between the publisher doing its first send() and its socket
registering the new client connection.  Any messages sent during that
time are lost.
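
The scenario above can be illustrated with a toy model.  This is plain
Java with no 0MQ calls; ToyPublisher, register() and the inbox lists are
hypothetical names that only mimic the behaviour described, namely that
the publisher delivers to whichever connections it has registered at
the moment of send().

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: plain Java, no 0MQ calls. All names are hypothetical.
final class ToyPublisher {
    // Inboxes of the subscribers the publisher currently knows about.
    private final List<List<String>> registered = new ArrayList<>();

    /** Models the moment the publisher's socket registers a new connection. */
    void register(List<String> inbox) {
        registered.add(inbox);
    }

    /** Delivers to registered subscribers only; with none, the message is dropped. */
    void send(String message) {
        for (List<String> inbox : registered) {
            inbox.add(message);
        }
    }

    public static void main(String[] args) {
        ToyPublisher pub = new ToyPublisher();
        List<String> inbox = new ArrayList<>();

        pub.send("update-1");  // sent before the connection is registered: lost
        pub.register(inbox);   // the asynchronous connect completes here
        pub.send("update-2");

        System.out.println(inbox);  // prints [update-2]
    }
}
```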

The most reliable way of dealing with this is to not care.  Literally,
to treat the data as an infinite stream with no start or end, and thus
nothing to lose.  From that perspective, you want to update
subscribers with new data so they fill in whatever gaps they have.
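
A minimal sketch of a subscriber written in this spirit, assuming each
message carries the complete current value for its key rather than a
delta.  LatestValueView and onUpdate() are hypothetical names, and the
code is plain Java, not 0MQ API; in a real program onUpdate() would be
called for each message returned by recv().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: plain Java, no 0MQ calls. All names are hypothetical.
final class LatestValueView {
    // Most recent value seen for each key; earlier values are overwritten.
    private final Map<String, String> latest = new HashMap<>();

    /** Called for every message from the stream, whenever the subscriber joined. */
    void onUpdate(String key, String value) {
        latest.put(key, value);
    }

    /** Empty until the first update for this key arrives; that is the "gap". */
    Optional<String> get(String key) {
        return Optional.ofNullable(latest.get(key));
    }
}
```

Because every update overwrites the previous value, a missed message
only matters until the next update for the same key arrives.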

The second most reliable way (afaik, but others may know more) is to
mix this infinite stream with out-of-band (OOB) "catch up".  So a
subscriber starts to receive data, and _then_ goes off to grab a
snapshot of the state, using the timestamp of the first message it
gets, meanwhile queuing new messages and then applying them to its
snapshot.  More complex but foolproof.
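
The merge step of that catch-up can be sketched as follows.  This is
plain Java rather than 0MQ API, and the names (SnapshotCatchUp, Update,
Snapshot, catchUp) are hypothetical.  It assumes the publisher stamps
every update with a strictly increasing timestamp, and that the
snapshot reflects every update up to and including its asOf timestamp.
How the snapshot is fetched is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain Java, no 0MQ calls. All names are hypothetical.
final class SnapshotCatchUp {

    /** One message from the live stream; timestamp assumed set by the publisher. */
    static final class Update {
        final long timestamp;
        final String key;
        final String value;

        Update(long timestamp, String key, String value) {
            this.timestamp = timestamp;
            this.key = key;
            this.value = value;
        }
    }

    /** State fetched out of band; assumed to include all updates up to asOf. */
    static final class Snapshot {
        final long asOf;
        final Map<String, String> state;

        Snapshot(long asOf, Map<String, String> state) {
            this.asOf = asOf;
            this.state = state;
        }
    }

    /** Applies the updates queued while the snapshot was being fetched. */
    static Map<String, String> catchUp(Snapshot snapshot, List<Update> queued) {
        // A snapshot older than the first queued message could leave a gap
        // between the two, so it is rejected rather than merged.
        if (!queued.isEmpty() && snapshot.asOf < queued.get(0).timestamp) {
            throw new IllegalStateException("snapshot is older than the first queued message");
        }
        Map<String, String> state = new HashMap<>(snapshot.state);
        for (Update u : queued) {
            // Updates at or before asOf are already part of the snapshot.
            if (u.timestamp > snapshot.asOf) {
                state.put(u.key, u.value);
            }
        }
        return state;
    }
}
```

The subscriber would keep applying live updates to the returned map
once the queue has been drained.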

> would also be nice if there were a description of what guarantees 0mq makes.
> Does it guarantee message delivery and ordering if the underlying transport
> does? Under what conditions?

Yes, this needs to be explicitly documented.

I hope this helps.  There are good reasons for 0MQ's way of working,
but it can be hard to fit with the way you're used to doing things.

-Pieter


