[zeromq-dev] PUB/SUB unreliability
Gerry Steele
gerry.steele at gmail.com
Sat Jun 14 00:34:02 CEST 2014
Hi Brian
I noticed your comment on another thread about this and I think you got it
a bit wrong:
> The high water mark is a hard limit on the maximum number of outstanding
> messages ØMQ shall queue in memory for any single peer that the specified
> *socket* is communicating with. *A value of zero means no limit.*
and from your link:
> Since v3.x, ØMQ forces default limits on its internal buffers (the
> so-called high-water mark or HWM), so publisher crashes are rarer *unless
> you deliberately set the HWM to infinite.*
Nothing I have read suggests that any message sent after the connection
has been established should be dropped.
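
To make that reading of the two passages concrete, here is a toy model in
plain Python. It is not libzmq code and says nothing about what libzmq does
internally; PeerQueue and run are names invented for this sketch, and the
drop-when-full rule is my understanding of the documented PUB behaviour. It
models one per-peer queue: a finite HWM drops messages mid-stream when the
sender outpaces the reader, while an HWM of zero never drops.

```python
from collections import deque


class PeerQueue:
    """Toy per-subscriber outbound queue (hypothetical, not libzmq)."""

    def __init__(self, hwm):
        self.hwm = hwm          # 0 means "no limit", as in the quoted docs
        self.queue = deque()
        self.dropped = 0

    def send(self, msg):
        # Assumed PUB-style rule: when the queue is at the HWM, drop.
        if self.hwm and len(self.queue) >= self.hwm:
            self.dropped += 1
            return False
        self.queue.append(msg)
        return True

    def drain(self, n):
        # The reader takes up to n queued messages, oldest first.
        taken = []
        while self.queue and len(taken) < n:
            taken.append(self.queue.popleft())
        return taken


def run(hwm, total=10, burst=5, drain_per_burst=2):
    """Send `total` numbered messages; the reader only gets a turn after
    every `burst` sends, which imitates a tight send loop."""
    q = PeerQueue(hwm)
    delivered = []
    for seq in range(1, total + 1):
        q.send(seq)
        if seq % burst == 0:
            delivered += q.drain(drain_per_burst)
    delivered += q.drain(len(q.queue))   # reader catches up at the end
    return delivered, q.dropped


print(run(hwm=3))   # ([1, 2, 3, 6, 7], 5): gaps in the middle of the stream
print(run(hwm=0))   # ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 0): nothing dropped
```

With a finite HWM the gaps appear in the middle of the sequence, not at the
start or the end; with HWM 0 the model delivers everything.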
Thanks
G
On 13 June 2014 23:17, Brian Knox <bknox at digitalocean.com> wrote:
> "From what i've read, PUB SUB should be reliable when the _HWM are set to
> zero (don't drop). By reliable I mean no messages should fail to be
> delivered to an already connected consumer."
>
>
> Your understanding of pub-sub behavior and how it interacts with the HWM
> is incorrect. Please see: http://zguide.zeromq.org/php:chapter5
>
> Brian
>
>
>
>
> On Fri, Jun 13, 2014 at 2:33 PM, Gerry Steele <gerry.steele at gmail.com>
> wrote:
>
>> I've read everything I can find, including the printed book, but I cannot
>> find a definitive description of how PUB/SUB should behave in zmq.
>>
>> A production system I'm using is experiencing message loss between
>> several nodes using PUB/SUB.
>>
>> From what I've read, PUB/SUB should be reliable when the *_HWM options are
>> set to zero (don't drop). By reliable I mean no messages should fail to be
>> delivered to an already connected consumer.
>>
>> I implemented some utilities to reproduce the message loss in my system :
>>
>> zmq_sub: https://gist.github.com/easytiger/992b3a29eb5c8545d289
>> zmq_pub: https://gist.github.com/easytiger/e382502badab49856357
>>
>>
>> zmq_pub takes the number of events to send and the logging frequency;
>> zmq_sub takes only the logging frequency. zmq_sub prints the number of
>> messages received next to the message contents, which include the
>> publisher's integer message count.
>>
>> When sending events in a tight loop, messages simply go missing midway
>> through (the loss is not at the beginning or the end, which rules out
>> slow connectors etc.).
>>
>> In a small loop it usually works ok:
>>
>> $ ./zmq_pub 2000 1000
>> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1000 with rc=58
>> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 with rc=58
>>
>> $ ./zmq_sub 1
>>
>> RECV:1|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1
>> RECV:2|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2
>> RECV:3|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #3
>> RECV:4|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #4
>> [...]
>> RECV:2000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000
>>
>> You can see every message was delivered, as the counts align.
>>
>> However, increase the message count and messages start going missing:
>>
>> $ ./zmq_pub 200000 100000
>>
>> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #100000 with rc=60
>> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #200000 with rc=60
>>
>> $ ./zmq_sub 10000
>> RECV:10000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #11000
>> RECV:20000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #21000
>> RECV:30000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #31610
>> RECV:40000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #42000
>> RECV:50000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #52524
>> RECV:60000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #64654
>> RECV:70000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #77298
>> RECV:80000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #90117
>> RECV:90000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #102864
>> RECV:100000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #115846
>> RECV:110000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #129135
>> RECV:120000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #141606
>> RECV:130000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #154179
>> RECV:140000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #166627
>> RECV:150000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #179166
>> RECV:160000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #192247
>>
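
The number of messages lost so far can be read off each line of the log
above: it is the publisher's sequence number after the # minus the
subscriber's own RECV count. A minimal sketch in Python, not part of the
linked gists; cumulative_loss is a name made up for this example, and it
assumes messages arrive in order and that the publisher numbers from 1, as
the small run suggests.

```python
import re

# Matches zmq_sub log lines such as "RECV:10000|... #11000".
# Group 1: the subscriber's running receive count.
# Group 2: the publisher's sequence number carried in the payload.
LINE_RE = re.compile(r"RECV:(\d+)\|.*#(\d+)\s*$")


def cumulative_loss(log_lines):
    """Return (received, publisher_seq, lost_so_far) for each RECV line.

    Lines that do not look like zmq_sub output are skipped.
    """
    rows = []
    for line in log_lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        received = int(match.group(1))
        seq = int(match.group(2))
        rows.append((received, seq, seq - received))
    return rows


sample = [
    "RECV:10000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #11000",
    "RECV:30000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #31610",
    "RECV:160000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #192247",
]
for received, seq, lost in cumulative_loss(sample):
    print(f"received={received} publisher_seq={seq} lost_so_far={lost}")
# lost_so_far is 1000, 1610 and 32247 for these three lines
```

On the run above this gives 1000 messages missing by the first log line and
32247 by the last one shown, so the loss keeps growing for the whole run.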
>>
>> Is this expected behaviour? With PUSH/PULL I get no loss at all with
>> similar utilities.
>>
>> If I put more work between sends (e.g. cout the full message each
>> time), the results are better.
>>
>> zmq_push: https://gist.github.com/easytiger/2c4f806594ccfbc74f54
>> zmq_pull: https://gist.github.com/easytiger/268a630fd22f959fde93
>>
>> Is there an issue/bug in my implementation that would cause this?
>>
>> Using zeromq 4.0.3
>>
>> Many Thanks
>> Gerry
>>
>>
>>
>>
>>
>> --
>> Gerry Steele
>>
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
--
Gerry Steele