[zeromq-dev] PUB/SUB unreliability

Thomas Rodgers rodgert at twrodgers.com
Tue Jun 17 01:43:37 CEST 2014


If so, debug/checked builds only please.



On Mon, Jun 16, 2014 at 5:11 PM, Pieter Hintjens <ph at imatix.com> wrote:

> Indeed... I've had this hit more than once, a zmq_setsockopt that
> caused an error that wasn't being handled, with weird and expensive
> results down the line.
>
> Kind of makes you appreciate assertions more. CZMQ does this -- if a
> setsockopt fails for any reason except ETERM, it asserts. I might
> propose such a patch to libzmq.
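
For illustration, a minimal sketch of the kind of check discussed above: fail loudly on an unexpected setsockopt error, tolerate ETERM, and do so in debug builds only. This is not CZMQ's or libzmq's actual code. The helper setsockopt_result_ok and the macro CHECK_SETSOCKOPT are hypothetical names, and the ETERM fallback is a placeholder so the sketch builds without zmq.h. It relies on the standard assert macro, which expands to nothing when NDEBUG is defined.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: zmq.h normally defines ETERM. This fallback is only a
   placeholder so that the sketch compiles without libzmq installed. */
#ifndef ETERM
#define ETERM 156384765
#endif

/* Hypothetical helper: returns 1 if a setsockopt-style result is
   acceptable, i.e. success, or a failure caused by context
   termination (ETERM). Returns 0 for any other failure. */
static int setsockopt_result_ok (int rc, int err)
{
    return rc == 0 || err == ETERM;
}

/* Debug-only check: assert() is compiled out when NDEBUG is defined,
   so release builds never abort here. */
#define CHECK_SETSOCKOPT(rc) assert (setsockopt_result_ok ((rc), errno))
```

A call site would then look like `int rc = zmq_setsockopt (publisher, ZMQ_SNDHWM, &hwm, sizeof (hwm)); CHECK_SETSOCKOPT (rc);`. Release builds would still need ordinary error handling on rc, since the assertion disappears there.
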
>
> On Mon, Jun 16, 2014 at 8:45 PM, Gerry Steele <gerry.steele at gmail.com>
> wrote:
> > Thanks. There was also an error in my error handling, which is why it
> > was never flagged. I imagine it's the same in my app code. The uint64_t
> > came from the CLI argument handling lib, which is why it was used over
> > int. A lesson learned there.
> >
> >
> >
> >
> > On 16 June 2014 19:13, Pieter Hintjens <ph at imatix.com> wrote:
> >>
> >> And indeed, this code prints "-1" as the return code:
> >>
> >>     void *context = zmq_ctx_new ();
> >>     void *publisher = zmq_socket (context, ZMQ_PUB);
> >>     uint64_t rhwm = 0;   /* wrong type: ZMQ_SNDHWM expects an int */
> >>     int rc = zmq_setsockopt (publisher, ZMQ_SNDHWM, &rhwm, sizeof (rhwm));
> >>     printf ("RC=%d\n", rc);
> >>
> >> -Pieter
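
The snippet above fails because the option value is a uint64_t where an int is expected, so the length passed in does not match. A minimal sketch of that kind of length check follows. set_int_option is a hypothetical stand-in written for this illustration, not libzmq's implementation. It only mimics the behavior described in this thread (wrong size in, -1 out), and the use of EINVAL here is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an int-valued socket option setter.
   Rejects any value whose size is not exactly sizeof (int), leaving
   the stored option untouched, and copies the value otherwise. */
static int set_int_option (int *option, const void *optval, size_t optvallen)
{
    if (optval == NULL || optvallen != sizeof (int)) {
        errno = EINVAL;
        return -1;
    }
    memcpy (option, optval, sizeof (int));
    return 0;
}
```

With `uint64_t rhwm = 0;` the call `set_int_option (&hwm, &rhwm, sizeof (rhwm))` returns -1 on platforms where int is narrower than 64 bits, and the previously stored value stays in effect. Declaring rhwm as int makes the same call return 0. No type coercion can happen because the callee only sees a pointer and a length.
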
> >>
> >> On Mon, Jun 16, 2014 at 8:03 PM, Pieter Hintjens <ph at imatix.com> wrote:
> >> > Hmm, it does check the size of the passed argument, and if that's
> >> > wrong, returns an error (which you do check for).
> >> >
> >> > On Mon, Jun 16, 2014 at 7:36 PM, Gerry Steele <gerry.steele at gmail.com>
> >> > wrote:
> >> >> Hi Pieter, you have struck on something there.
> >> >>
> >> >> Converting it to int seems to yield the correct behaviour.
> >> >>
> >> >> I guess that, the way setsockopt works, type coercion doesn't happen.
> >> >>
> >> >> Embarrassing! But at least we got to the bottom of it.
> >> >>
> >> >> I was able to send billions of events without incurring loss.
> >> >> Apologies for taking everyone's time.
> >> >>
> >> >> Thanks all.
> >> >>
> >> >> g
> >> >>
> >> >>
> >> >>
> >> >> On 16 June 2014 18:22, Pieter Hintjens <ph at imatix.com> wrote:
> >> >>>
> >> >>> OK, just to double check, you're using ZeroMQ 4.0.x? In your test
> >> >>> case (which I'm belatedly looking at), you use a uint64_t for the
> >> >>> hwm values; it should be int. Probably not significant.
> >> >>>
> >> >>> On Mon, Jun 16, 2014 at 6:20 PM, Gerry Steele <gerry.steele at gmail.com>
> >> >>> wrote:
> >> >>> > In the parent email I have links to the minimal examples on
> >> >>> > gist.github.com
> >> >>> >
> >> >>> > Happy to open an issue and commit them later on if that's what you
> >> >>> > need.
> >> >>> >
> >> >>> > Thanks
> >> >>> >
> >> >>> > On 16 Jun 2014 14:43, "Pieter Hintjens" <ph at imatix.com> wrote:
> >> >>> >>
> >> >>> >> Gerry, can you provide a minimal test case that shows the
> >> >>> >> behavior? Thanks.
> >> >>> >>
> >> >>> >> On Mon, Jun 16, 2014 at 12:49 PM, Gerry Steele <gerry.steele at gmail.com>
> >> >>> >> wrote:
> >> >>> >> > Thanks Pieter. I can't try this out till I get home, but it is
> >> >>> >> > looking like HWM overflows.
> >> >>> >> >
> >> >>> >> > If you run the utilities you notice the drops start happening
> >> >>> >> > after precisely 1000 events in the first instance (which is the
> >> >>> >> > default hwm).
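
To make the arithmetic concrete, here is a toy model of a bounded send queue that drops messages once a high-water mark is reached. It is not how libzmq is implemented: toy_queue, toy_send and toy_drain are hypothetical names, and the model ignores the receive-side HWM and any kernel buffering. It only illustrates why, if the HWM stays at the default of 1000 and the reader does not keep up, losses would begin right after the first 1000 events.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one outgoing queue. */
typedef struct {
    size_t hwm;      /* capacity; 0 means "no limit" in this model */
    size_t queued;   /* messages currently waiting */
    size_t dropped;  /* messages discarded because the queue was full */
} toy_queue;

/* Try to enqueue one message; returns 1 if queued, 0 if dropped. */
static int toy_send (toy_queue *q)
{
    if (q->hwm != 0 && q->queued >= q->hwm) {
        q->dropped++;
        return 0;
    }
    q->queued++;
    return 1;
}

/* The reader removes up to n messages; returns how many were removed. */
static size_t toy_drain (toy_queue *q, size_t n)
{
    size_t taken = n < q->queued ? n : q->queued;
    q->queued -= taken;
    return taken;
}
```

In this model, sending 1500 messages into a queue with hwm set to 1000 and no draining queues the first 1000 and drops the remaining 500. Sending resumes as soon as the reader drains some messages, which matches the "chunks go missing and then pick up again" pattern. With hwm set to 0 nothing is dropped.
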
> >> >>> >> >
> >> >>> >> > There was another largely ignored thread about this recently
> >> >>> >> > mentioning the same problem.
> >> >>> >> >
> >> >>> >> > I also tried setting the hwm values to a number greater than
> >> >>> >> > the number of events and it seemed to have no effect either.
> >> >>> >> >
> >> >>> >> > g
> >> >>> >> >
> >> >>> >> > On 16 Jun 2014 09:32, "Pieter Hintjens" <ph at imatix.com> wrote:
> >> >>> >> >>
> >> >>> >> >> On Mon, Jun 16, 2014 at 9:10 AM, Gerry Steele <gerry.steele at gmail.com>
> >> >>> >> >> wrote:
> >> >>> >> >>
> >> >>> >> >> > Big chunks of messages go missing mid flow and then pick up
> >> >>> >> >> > again. There is no literature that indicates that is expected
> >> >>> >> >> > behaviour.
> >> >>> >> >>
> >> >>> >> >> Right. The two plausible causes for this are (a) HWM overflows,
> >> >>> >> >> and (b) temporary network disconnects. You have excluded (a),
> >> >>> >> >> though to be paranoid I'd probably add some temporary logging to
> >> >>> >> >> libzmq's pub socket to shout out if/when it does hit the HWM. To
> >> >>> >> >> detect (b) you could use the socket monitoring. The third
> >> >>> >> >> possibility is that you're doing something wrong with
> >> >>> >> >> subscriptions... though that seems unlikely.
> >> >>> >> >>
> >> >>> >> >> -Pieter
> >> >>> >> >> _______________________________________________
> >> >>> >> >> zeromq-dev mailing list
> >> >>> >> >> zeromq-dev at lists.zeromq.org
> >> >>> >> >> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Gerry Steele
> >> >>
> >> >>
> >
> >
> >
> >
> > --
> > Gerry Steele
> >
> >