[zeromq-dev] Reliability question
Hannes Schmidt
hannes at eyealike.com
Mon Aug 2 01:29:37 CEST 2010
On Sat, Jul 31, 2010 at 3:02 AM, Pieter Hintjens <ph at imatix.com> wrote:
> On Fri, Jul 30, 2010 at 9:48 PM, Hannes Schmidt <hannes at eyealike.com>
> wrote:
>
> > Do you agree that using a synchronization mechanism other than
> > 0mq's request-response should work or does it have to involve 0mq?
>
> Using any kind of thread synchronization except 0MQ sockets is bad
> practice. At the least you are mixing two tools when one would do.
> But worse, you're creating structures that are language-specific,
> non-elastic, and would possibly interfere with each other in unpleasant
> ways.
>
> To synchronize two 0MQ tasks, use request-reply sockets. Connect
> these using whatever transport you need, and modify the transport as
> you stretch your architecture across the network.
>
> > Also, the Thread.sleep() in line 64 still makes me feel that there might
> > be a race condition in 0mq. It may also be a conceptual problem. My guess
> > is that 0mq only guarantees message delivery if the sender calls send()
> > after the receiver has invoked recv().
>
> Indeed. So the subscriber should confirm to the publisher that it is
> ready only _after_ it has started to receive on the socket. When you
> want to use multiple sockets in one thread, that means using polling.
>
> > If you do add documentation regarding this, it would be nice to document
> > what exactly constitutes a successful subscription, i.e. whether it is
> > the connect(), the s.setsockopt( ZMQ.SUBSCRIBE ) or the first call to
> > recv().
>
> None of these act as guaranteed successful subscriptions. The
> publisher may still bind and start to publish data at any point, and
> there are scenarios where messages can get lost even if the subscriber
> is waiting on recv(). E.g. client starts first, connects and sets a
> filter and waits on recv(). Publisher then starts, binds, and starts
> to publish. Client asynchronously connects to publisher and then
> starts receiving messages. A non-zero time elapses between the
> publisher doing its first send() and that socket registering the new
> client connection. Result is loss of any outgoing messages during
> that time.
>
> The most reliable way of dealing with this is to not care. Literally,
> to treat the data as an infinite stream with no start or end, and thus
> nothing to lose. From that perspective, you want to update
> subscribers with new data so they fill in whatever gaps they have.
>
> The second most reliable way (afaik but others may have more
> knowledge) is to mix this infinite stream with OOB "catch up". So a
> subscriber starts to receive data, and _then_ goes off to grab a
> snapshot of the state, using the timestamp of that first message it
> gets, meanwhile queuing new messages and then applying them to its
> snapshot. More complex but foolproof.
>
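A rough sketch of this catch-up scheme in plain Java, with no 0MQ calls at
all. Everything named here (CatchUp, Update, onUpdate, onSnapshot) is
hypothetical and only for illustration. A sequence number stands in for the
timestamp mentioned above, and the sketch assumes the snapshot is at least as
recent as the first queued update, so that no update falls between the two.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical subscriber-side state: queue updates until the out-of-band
// snapshot arrives, then replay the ones the snapshot does not cover.
class CatchUp {
    // One published update; seq stands in for the timestamp (assumption).
    static final class Update {
        final long seq;
        final String key;
        final String value;

        Update(long seq, String key, String value) {
            this.seq = seq;
            this.key = key;
            this.value = value;
        }
    }

    private final Queue<Update> pending = new ArrayDeque<>();
    private final Map<String, String> state = new HashMap<>();
    private long lastApplied = -1;
    private boolean haveSnapshot = false;

    // Called for every update received from the stream.
    void onUpdate(Update u) {
        if (!haveSnapshot) {
            pending.add(u); // snapshot not here yet, keep for later
            return;
        }
        apply(u);
    }

    // Called once when the snapshot arrives; snapshotSeq is the sequence
    // number of the last update already reflected in the snapshot.
    void onSnapshot(Map<String, String> snapshot, long snapshotSeq) {
        state.clear();
        state.putAll(snapshot);
        lastApplied = snapshotSeq;
        haveSnapshot = true;
        Update u;
        while ((u = pending.poll()) != null) {
            apply(u);
        }
    }

    private void apply(Update u) {
        if (u.seq <= lastApplied) {
            return; // already covered by the snapshot or an earlier update
        }
        state.put(u.key, u.value);
        lastApplied = u.seq;
    }

    Map<String, String> state() {
        return state;
    }
}
```

The receive loop would call onUpdate() for each message and onSnapshot() once
the snapshot request returns. Updates the subscriber missed before it started
receiving are covered by the snapshot, and queued updates older than the
snapshot are dropped rather than applied twice.
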
> > It would also be nice if there were a description of what guarantees 0mq
> > makes. Does it guarantee message delivery and ordering if the underlying
> > transport does? Under what conditions?
>
> Yes, this needs to be explicitly documented.
>
> I hope this helps. There are good reasons for 0MQ's way of working,
> but it can be hard to fit with the way you're used to doing things.
>
> -Pieter
>
Ok, I got it. I was erroneously working under the assumption that pub-sub in
0MQ is reliable. This assumption was not justified because the documentation
did not say 0mq was reliable. That said, and as you already mentioned, it
might help users to explicitly state that fact. Many messaging systems claim
that they are reliable and users might initially expect reliability from
every messaging system. Interestingly, the systems I have come
across either implement reliability at great cost or merely reduce the
chance of failure somewhat. After having read your suggestions, I started
reading a book on this matter. It seems that ordered and reliable multicast
messaging can and should be implemented on top of an unreliable, non-ordered
implementation. I say 'should' because doing it that way nicely separates
concerns and gives users a chance to pick and choose from different
strengths of ordering and reliability constraints.
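To make the layering concrete, here is a minimal receiver-side sketch in
plain Java. It is not taken from the book or from 0MQ; the class and method
names (OrderedReceiver, receive, missing) are made up for illustration. It
assumes the sender numbers messages 0, 1, 2, ... and that the layer below may
drop, duplicate or reorder them. The receiver buffers out-of-order messages,
delivers them in sequence, and reports gaps that a retransmit request could
ask for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical ordering layer on top of an unreliable, unordered channel.
class OrderedReceiver {
    private final TreeMap<Long, String> buffer = new TreeMap<>();
    private long nextSeq = 0; // next sequence number to hand to the application

    // Accepts one message from the lower layer and returns the messages
    // that can now be delivered in order (possibly none).
    List<String> receive(long seq, String payload) {
        List<String> deliverable = new ArrayList<>();
        if (seq < nextSeq) {
            return deliverable; // duplicate of something already delivered
        }
        buffer.put(seq, payload);
        while (!buffer.isEmpty() && buffer.firstKey() == nextSeq) {
            deliverable.add(buffer.pollFirstEntry().getValue());
            nextSeq++;
        }
        return deliverable;
    }

    // Sequence numbers not yet received although a later one has arrived;
    // these are the candidates for a retransmit request.
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        if (buffer.isEmpty()) {
            return gaps;
        }
        for (long s = nextSeq; s < buffer.lastKey(); s++) {
            if (!buffer.containsKey(s)) {
                gaps.add(s);
            }
        }
        return gaps;
    }
}
```

A weaker guarantee falls out of the same structure: an application that only
needs ordering, not completeness, could skip the retransmit step and advance
past a gap after a timeout.
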
So thanks again for your responses, Pieter. They helped me straighten out my
thoughts a lot.
-- Hannes