[zeromq-dev] PUB/SUB unreliability

Justin Karneges justin at affinix.com
Mon Jun 16 06:39:59 CEST 2014


Pubsub is by definition unreliable since messages are dropped if there 
are no subscribers.

An argument could be made that ZeroMQ ought to support a reliable 
reconnection for known subscribers, so that temporary disconnects 
between publisher and subscriber don't result in any lost messages. 
However, the key here is "temporary". If a subscriber remains 
disconnected for a very long time, then the question becomes how long 
the publisher should queue messages for a lost subscriber. And unless 
the answer is "for all time", then, well... you still have unreliability.
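
To make the finite-queue point concrete, here is a rough sketch in 
plain Python. It is not ZeroMQ code and not how libzmq queues 
internally; ReplayBuffer and simulate_outage are hypothetical names 
made up for illustration. It assumes a publisher that keeps a bounded 
buffer for one known subscriber and evicts the oldest entry when the 
buffer is full.

```python
from collections import deque


class ReplayBuffer:
    """Hypothetical publisher-side buffer for one disconnected subscriber."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently discards the oldest entry when full.
        self.pending = deque(maxlen=capacity)
        self.dropped = 0

    def publish(self, message):
        # Count the eviction that the append below is about to cause.
        if len(self.pending) == self.pending.maxlen:
            self.dropped += 1
        self.pending.append(message)

    def reconnect(self):
        """Hand over whatever survived the outage and empty the buffer."""
        survivors = list(self.pending)
        self.pending.clear()
        return survivors


def simulate_outage(capacity, messages_during_outage):
    """Publish numbered messages while the subscriber is away."""
    buffer = ReplayBuffer(capacity)
    for seq in range(1, messages_during_outage + 1):
        buffer.publish(seq)
    return buffer.reconnect(), buffer.dropped
```

With a capacity of 1000, an outage spanning 800 messages loses 
nothing, while one spanning 1500 messages loses the first 500. Any 
finite capacity just moves the point where loss begins.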

So, because subscribers may or may not exist at the time of publish, and 
because you'll never have an infinite queue, it's best to just assume 
that pubsub isn't reliable. Build reliability around it.
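
One common way to build reliability around it is to have the 
publisher number its messages (as the test utilities quoted below 
already do) and let the subscriber watch for holes. A minimal sketch, 
assuming every message carries an increasing integer sequence number; 
GapDetector is a hypothetical helper, not part of any ZeroMQ API, and 
what to do about a detected gap (resend request, resync, alert) is 
left to the application.

```python
class GapDetector:
    """Tracks publisher sequence numbers seen by one subscriber."""

    def __init__(self):
        self.last_seq = None  # highest sequence number seen so far
        self.received = 0     # messages observed
        self.missed = 0       # sequence numbers skipped

    def observe(self, seq):
        """Record one message; return the sequence numbers skipped before it."""
        self.received += 1
        gap = []
        # A jump of more than one past the last seen number means loss.
        # The first message is never treated as a gap, since a late
        # joiner cannot know what was published before it connected.
        if self.last_seq is not None and seq > self.last_seq + 1:
            gap = list(range(self.last_seq + 1, seq))
            self.missed += len(gap)
        # Duplicates or out-of-order messages do not move the marker back.
        if self.last_seq is None or seq > self.last_seq:
            self.last_seq = seq
        return gap
```

Feeding it the sequence 1, 2, 3, 6, 7, 10 reports 4 and 5 as missing 
when 6 arrives, and 8 and 9 when 10 arrives.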

Some philosophy:
http://zguide.zeromq.org/page:all#Pros-and-Cons-of-Pub-Sub

On 06/15/2014 04:43 AM, Gerry Steele wrote:
> Thanks Charles, that's pretty much my understanding too. Meaning this is
> a bug in my implementation or in zeromq.
>
> I understand the implications of the slow consumer problem but the
> fundamental issue here is to establish trust in PUB/SUB.
>
>
> On 14 June 2014 21:09, Charles Remes <lists at chuckremes.com> wrote:
>
>     Let’s back up for a second.
>
>     Take a look at the man page for zmq_setsockopt and read the section
>     on ZMQ_SNDHWM. It clearly states that zero means “no limit.” Second,
>     it also states that when the socket reaches its exceptional state
>     then it will either block or drop messages depending on socket type.
>
>     Next, look at the man page for zmq_socket and check the ZMQ_PUB
>     section. The socket will reach its mute state (its exceptional
>     state) when it reaches its high water mark. When it’s mute, it will
>     drop messages.
>
>     So, taking the two together, a socket with a ZMQ_SNDHWM of 0
>     should never drop messages because it will never reach its mute state.
>
>     The one exception to this is when there are no SUB sockets connected
>     to the PUB socket. When there are no connections, all messages are
>     dropped (because no one is listening and there are no queues created).
>
>     However, I highly recommend *against* setting HWM to 0 for a PUB
>     socket. Here’s why:
>
>     1. It gives you a false sense of security that all messages will be
>     delivered.
>     If the publishing process dies, any messages in queue go with it so
>     they’ll never get delivered.
>
>     2. Your subscribers might be too slow.
>     If your subscribers can’t keep up with the message flow and the
>     publisher starts queueing, it *will* run out of memory. You’ll
>     either exhaust the amount of memory allowed by your process, or your
>     OS will start paging & swapping and you’ll wish the process had just
>     died.
>
>     cr
>
>
>     On Jun 13, 2014, at 5:34 PM, Gerry Steele
>     <gerry.steele at gmail.com> wrote:
>
>>     Hi Brian
>>
>>     I noticed your comment on another thread about this and I think
>>     you got it a bit wrong:
>>
>>     > The high water mark is a hard limit on the maximum number of
>>     outstanding messages ØMQ shall queue in memory for any single peer
>>     that the specified /socket/ is communicating with. *A value of zero
>>     means no limit.*
>>
>>     and from your link:
>>
>>     > Since v3.x, ØMQ forces default limits on its internal buffers
>>     (the so-called high-water mark or HWM), so publisher crashes are
>>     rarer *unless you deliberately set the HWM to infinite.*
>>
>>     Nothing I have read indicates that messages sent after the
>>     connections are made should ever be dropped.
>>
>>     Thanks
>>     G
>>
>>
>>
>>     On 13 June 2014 23:17, Brian Knox <bknox at digitalocean.com>
>>     wrote:
>>
>>         "From what I've read, PUB SUB should be reliable when the _HWM
>>         are set to zero (don't drop). By reliable I mean no messages
>>         should fail to be delivered to an already connected consumer."
>>
>>
>>         Your understanding of pub-sub behavior and how it interacts
>>         with the HWM is incorrect.  Please see:
>>         http://zguide.zeromq.org/php:chapter5
>>
>>         Brian
>>
>>
>>
>>
>>         On Fri, Jun 13, 2014 at 2:33 PM, Gerry Steele
>>         <gerry.steele at gmail.com> wrote:
>>
>>             I've read everything I can find, including the printed
>>             book, but I am at a loss for a definitive description
>>             of how PUB/SUB should behave in zmq.
>>
>>             A production system I'm using is experiencing message loss
>>             between several nodes using PUB/SUB.
>>
>>             From what I've read, PUB SUB should be reliable when the
>>             _HWM are set to zero (don't drop). By reliable I mean no
>>             messages should fail to be delivered to an already
>>             connected consumer.
>>
>>             I implemented some utilities to reproduce the message loss
>>             in my system :
>>
>>             zmq_sub:
>>             https://gist.github.com/easytiger/992b3a29eb5c8545d289
>>             zmq_pub:
>>             https://gist.github.com/easytiger/e382502badab49856357
>>
>>
>>             zmq_pub takes the number of events to send and the logging
>>             frequency; zmq_sub takes only the logging frequency.
>>             zmq_sub prints the number of msgs received next to the
>>             message contents, which include the integer message count
>>             from the publisher.
>>
>>             It can be seen when sending events in a tight loop that
>>             messages simply go missing midway through (loss is not at
>>             the beginning or end, ruling out slow connectors etc.)
>>
>>             In a small loop it usually works ok:
>>
>>             $ ./zmq_pub 2000 1000
>>             sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH
>>             #1000 with rc=58
>>             sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH
>>             #2000 with rc=58
>>
>>             $ ./zmq_sub 1
>>
>>             RECV:1|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1
>>             RECV:2|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2
>>             RECV:3|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #3
>>             RECV:4|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #4
>>             [...]
>>             RECV:2000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND
>>             SUCH #2000
>>
>>             You can see every message was delivered, as the counts align.
>>
>>             However, increase the message counts and messages start
>>             going missing:
>>
>>             $ ./zmq_pub 200000 100000
>>             sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH
>>             #100000 with rc=60
>>             sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH
>>             #200000 with rc=60
>>
>>             $ ./zmq_sub 10000
>>             RECV:10000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND
>>             SUCH #11000
>>             RECV:20000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND
>>             SUCH #21000
>>             RECV:30000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND
>>             SUCH #31610
>>             RECV:40000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND
>>             SUCH #42000
>>             RECV:50000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND
>>             SUCH #52524
>>             RECV:60000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND
>>             SUCH #64654
>>             RECV:70000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND
>>             SUCH #77298
>>             RECV:80000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND
>>             SUCH #90117
>>             RECV:90000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND
>>             SUCH #102864
>>             RECV:100000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF
>>             AND SUCH #115846
>>             RECV:110000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF
>>             AND SUCH #129135
>>             RECV:120000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF
>>             AND SUCH #141606
>>             RECV:130000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF
>>             AND SUCH #154179
>>             RECV:140000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF
>>             AND SUCH #166627
>>             RECV:150000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF
>>             AND SUCH #179166
>>             RECV:160000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF
>>             AND SUCH #192247
>>
>>
>>             Is this expected behaviour? With PUSH/PULL I get no loss
>>             at all with similar utilities.
>>
>>             If I put more work between sends (e.g. cout the full
>>             message each time) the results are better.
>>
>>             zmq_push:
>>             https://gist.github.com/easytiger/2c4f806594ccfbc74f54
>>             zmq_pull:
>>             https://gist.github.com/easytiger/268a630fd22f959fde93
>>
>>             Is there an issue/bug in my implementation that would
>>             cause this?
>>
>>             Using zeromq 4.0.3
>>
>>             Many Thanks
>>             Gerry
>>
>>
>>
>>
>>
>>             --
>>             Gerry Steele
>>
>>
>>             _______________________________________________
>>             zeromq-dev mailing list
>>             zeromq-dev at lists.zeromq.org
>>             <mailto:zeromq-dev at lists.zeromq.org>
>>             http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>>
>>
>>
>>
>>
>>     --
>>     Gerry Steele
>>
>
>
>
>
>
>
> --
> Gerry Steele
>
>
>
>



