[zeromq-dev] PUB/SUB unreliability

Brian Knox bknox at digitalocean.com
Sat Jun 14 00:17:49 CEST 2014


"From what I've read, PUB/SUB should be reliable when the _HWM options are
set to zero (don't drop). By reliable I mean no messages should fail to be
delivered to an already connected consumer."


Your understanding of pub-sub behavior and how it interacts with the HWM
is incorrect.  Please see: http://zguide.zeromq.org/php:chapter5

Brian




On Fri, Jun 13, 2014 at 2:33 PM, Gerry Steele <gerry.steele at gmail.com>
wrote:

> I've read everything I can find, including the printed book, but I am at a
> loss for a definitive description of how PUB/SUB should behave in zmq.
>
> A production system I'm using is experiencing message loss between several
> nodes using PUB/SUB.
>
> From what I've read, PUB/SUB should be reliable when the _HWM options are
> set to zero (don't drop). By reliable I mean no messages should fail to be
> delivered to an already connected consumer.
>
> I implemented some utilities to reproduce the message loss in my system:
>
> zmq_sub: https://gist.github.com/easytiger/992b3a29eb5c8545d289
> zmq_pub: https://gist.github.com/easytiger/e382502badab49856357
>
>
> zmq_pub takes the number of events to send and the logging frequency;
> zmq_sub takes only the logging frequency. zmq_sub prints the number of
> messages received next to the packet contents, which contain the integer
> packet count from the publisher.
>
> When sending events in a tight loop, messages simply go missing midway
> through (the loss is not at the beginning or end, which rules out slow
> connectors etc.).
>
> In a small loop it usually works ok:
>
> $ ./zmq_pub 2000 1000
> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1000 with rc=58
> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 with rc=58
>
> $ ./zmq_sub 1
>
> RECV:1|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1
> RECV:2|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2
> RECV:3|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #3
> RECV:4|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #4
> [...]
> RECV:2000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000
>
> You can see every message was received, as the counts align.
>
> However, increase the message count and messages start going missing:
>
> $ ./zmq_pub 200000 100000
>
> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #100000 with rc=60
> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #200000 with rc=60
>
> $ ./zmq_sub 10000
> RECV:10000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #11000
> RECV:20000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #21000
> RECV:30000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #31610
> RECV:40000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #42000
> RECV:50000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #52524
> RECV:60000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #64654
> RECV:70000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #77298
> RECV:80000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #90117
> RECV:90000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #102864
> RECV:100000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #115846
> RECV:110000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #129135
> RECV:120000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #141606
> RECV:130000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #154179
> RECV:140000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #166627
> RECV:150000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #179166
> RECV:160000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #192247
>
>
> Is this expected behaviour? With PUSH/PULL I get no loss at all with
> similar utilities.
>
> If I put more work between sends (e.g. writing the full message to cout
> each time), the results are better.
>
> zmq_push: https://gist.github.com/easytiger/2c4f806594ccfbc74f54
> zmq_pull: https://gist.github.com/easytiger/268a630fd22f959fde93
>
> Is there an issue/bug in my implementation that would cause this?
>
> Using zeromq 4.0.3
>
> Many Thanks
> Gerry
>
>
>
>
>
> --
> Gerry Steele
>
>
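
To put numbers on the loss in the subscriber log quoted above, the publisher's
sequence number can be compared with the subscriber's receive count on each
line. The following is only a minimal sketch in Python. It assumes log lines
of the form RECV:<count>|<payload> #<seq>, as in the quoted output. The name
loss_per_line is hypothetical and is not part of the original utilities.

```python
import re

# Matches lines such as "RECV:30000|MESSAGE ... #31610"
# (format assumed from the quoted zmq_sub output).
LINE_RE = re.compile(r"^RECV:(\d+)\|.*#(\d+)$")


def loss_per_line(lines):
    """Return (received, sequence, missing) for each parseable log line.

    missing = publisher sequence number - subscriber receive count,
    i.e. how many messages had not arrived up to that point.
    """
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a RECV line
        received = int(match.group(1))
        sequence = int(match.group(2))
        results.append((received, sequence, sequence - received))
    return results


# Three lines copied from the quoted subscriber log.
sample = [
    "RECV:10000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #11000",
    "RECV:30000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #31610",
    "RECV:160000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #192247",
]

for received, sequence, missing in loss_per_line(sample):
    print(f"received={received} seq={sequence} missing={missing}")
```

For these three lines the gap grows from 1000 to 1610 to 32247 missing
messages, so the loss accumulates over the run rather than happening once.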