[zeromq-dev] PUB/SUB unreliability

Gerry Steele gerry.steele at gmail.com
Sun Jun 15 13:23:39 CEST 2014


Hi Pieter,

As per the code, binds and connects are done after ZMQ_SNDHWM and ZMQ_RCVHWM
are set, in both the publisher and the subscriber.

Thanks
g


On 14 June 2014 23:37, Pieter Hintjens <ph at imatix.com> wrote:

> Are you setting the HWM to zero before doing any binds or connects, or
> after?
>
> Also, are you setting the HWM both at publisher and at subscriber, or
> at one side only?
>
> -Pieter
>
> On Fri, Jun 13, 2014 at 8:33 PM, Gerry Steele <gerry.steele at gmail.com> wrote:
> > I've read everything I can find, including the printed book, but I am at a
> > loss for a definitive description of how PUB/SUB should behave in zmq.
> >
> > A production system I'm using is experiencing message loss between several
> > nodes using PUB/SUB.
> >
> > From what I've read, PUB/SUB should be reliable when the HWMs are set to
> > zero (don't drop). By reliable I mean that no message should fail to be
> > delivered to an already connected consumer.
> >
> > I implemented some utilities to reproduce the message loss in my system:
> >
> > zmq_sub: https://gist.github.com/easytiger/992b3a29eb5c8545d289
> > zmq_pub: https://gist.github.com/easytiger/e382502badab49856357
> >
> >
> > zmq_pub takes the number of events to send and the logging frequency;
> > zmq_sub takes only the logging frequency. zmq_sub prints the number of
> > messages received alongside the payload, which contains the integer packet
> > count from the publisher.
> >
> > When sending events in a tight loop, messages simply go missing midway
> > through (the loss is not at the beginning or end, which rules out slow
> > connectors etc.).
> >
> > With a small message count it usually works OK:
> >
> > $ ./zmq_pub 2000 1000
> > sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1000 with rc=58
> > sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 with rc=58
> >
> > $ ./zmq_sub 1
> >
> > RECV:1|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1
> > RECV:2|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2
> > RECV:3|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #3
> > RECV:4|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #4
> > [...]
> > RECV:2000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000
> >
> > You can see every message was delivered, as the counts align.
> >
> > However, increase the message count and messages start going missing:
> >
> > $ ./zmq_pub 200000 100000
> > sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #100000 with rc=60
> > sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #200000 with rc=60
> >
> > $ ./zmq_sub 10000
> > RECV:10000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #11000
> > RECV:20000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #21000
> > RECV:30000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #31610
> > RECV:40000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #42000
> > RECV:50000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #52524
> > RECV:60000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #64654
> > RECV:70000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #77298
> > RECV:80000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #90117
> > RECV:90000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #102864
> > RECV:100000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #115846
> > RECV:110000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #129135
> > RECV:120000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #141606
> > RECV:130000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #154179
> > RECV:140000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #166627
> > RECV:150000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #179166
> > RECV:160000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #192247
> >
> >
> > Is this expected behaviour? With PUSH/PULL I get no loss at all with
> > similar utilities.
> >
> > If I put more work between sends (e.g. writing the full message to cout
> > each time), the results are better.
> >
> > zmq_push: https://gist.github.com/easytiger/2c4f806594ccfbc74f54
> > zmq_pull:   https://gist.github.com/easytiger/268a630fd22f959fde93
> >
> > Is there an issue/bug in my implementation that would cause this?
> >
> > I'm using ZeroMQ 4.0.3.
> >
> > Many Thanks
> > Gerry
> >
> >
> >
> >
> >
> > --
> > Gerry Steele
> >
> >
> > _______________________________________________
> > zeromq-dev mailing list
> > zeromq-dev at lists.zeromq.org
> > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
> >
>
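The check zmq_sub performs, comparing its own receive count with the sequence number the publisher puts after '#' in each payload, can be sketched in C++ as below. This is a hypothetical helper: parse_sequence and GapTracker are illustrative names, not taken from the gist. It assumes numbering starts at 1 and that messages are neither duplicated nor reordered.

```cpp
// Hypothetical gap checker (not the gist code). Assumes sequence numbers
// start at 1, with no duplicates and no reordering.
#include <cstddef>
#include <cstdint>
#include <string>

// Extracts the number after the last '#', e.g. "... AND SUCH #21000" -> 21000.
// Returns 0 if there is no '#' or no digits follow it.
std::uint64_t parse_sequence(const std::string &payload) {
    const std::size_t pos = payload.rfind('#');
    if (pos == std::string::npos || pos + 1 >= payload.size())
        return 0;
    std::uint64_t value = 0;
    for (std::size_t i = pos + 1; i < payload.size(); ++i) {
        const char c = payload[i];
        if (c < '0' || c > '9')
            break;  // stop at the first non-digit
        value = value * 10 + static_cast<std::uint64_t>(c - '0');
    }
    return value;
}

// Counts received messages against the highest sequence number seen.
struct GapTracker {
    std::uint64_t received = 0;    // messages seen by the subscriber
    std::uint64_t highest = 0;     // highest sequence number seen so far
    std::uint64_t gap_events = 0;  // times one or more numbers were skipped

    void on_message(const std::string &payload) {
        const std::uint64_t seq = parse_sequence(payload);
        ++received;
        if (seq > highest + 1)
            ++gap_events;  // at least one message missing before this one
        if (seq > highest)
            highest = seq;
    }

    // Messages missing so far under the assumptions above.
    std::uint64_t missing() const {
        return highest > received ? highest - received : 0;
    }
};
```

Applied to the last logged line of the large run (RECV:160000 with payload #192247), received would be 160000 and highest 192247, so missing() would report 32247 messages lost at that point.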



-- 
Gerry Steele