[zeromq-dev] PUB/SUB unreliability
Thomas Rodgers
rodgert at twrodgers.com
Tue Jun 17 14:01:24 CEST 2014
By debug/checked build, I meant build-time option.
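To make that concrete, here is a minimal sketch in C of what such a switch could look like. The macro CHECKED_OPTIONS and the helper checked_option_rc are hypothetical names made up for this illustration; neither is a real libzmq or CZMQ build flag or function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch, not a real libzmq build flag: compile with
   -DCHECKED_OPTIONS to abort on a failed option call. */
#ifdef CHECKED_OPTIONS
#define OPTIONS_ARE_CHECKED 1
#else
#define OPTIONS_ARE_CHECKED 0
#endif

/* Pass the return code of a setsockopt-style call through this helper.
   Checked builds abort on any failure; unchecked builds hand the code
   back, so the caller can (and still must) test it. A real version
   would probably exempt ETERM, as CZMQ is described as doing in this
   thread. */
static int
checked_option_rc (int rc, const char *what)
{
    if (OPTIONS_ARE_CHECKED && rc != 0) {
        fprintf (stderr, "option call failed: %s\n", what);
        abort ();
    }
    return rc;
}
```

A caller would wrap the existing call, for example checked_option_rc (zmq_setsockopt (publisher, ZMQ_SNDHWM, &hwm, sizeof (hwm)), "ZMQ_SNDHWM"), and existing behaviour stays unchanged unless the build opts in.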
I'll admit, though, that it is a somewhat academic argument on my part. For
aziomq (https://github.com/aziomq/aziomq4-x) I've gone the route of making
options type-safe, so the asserts would act as a backstop in case I got the
type parameters wrong when I declared my option types.
See -
https://github.com/aziomq/aziomq4-x/blob/master/src/aziomq/option.hpp
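
As a rough illustration of the type-safe idea in plain C (this is not how option.hpp does it, just a sketch), a typed wrapper can fix the size at sizeof (int) so a wrongly sized argument never reaches the library. The names fake_setsockopt, setsockopt_fn and set_int_option are hypothetical. fake_setsockopt is only a stand-in so the example runs without libzmq; it mimics the size check described further down this thread and assumes a platform where int is narrower than uint64_t.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Function type with the same parameters as zmq_setsockopt, so either
   the real call or the stand-in below can be passed in. */
typedef int (*setsockopt_fn) (void *socket, int option,
                              const void *value, size_t size);

/* Stand-in for libzmq (an assumption for this sketch): rejects any
   value whose size is not sizeof (int), as reported in the thread. */
static int
fake_setsockopt (void *socket, int option, const void *value, size_t size)
{
    (void) socket;
    (void) option;
    (void) value;
    if (size != sizeof (int)) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

/* Typed wrapper: the value is taken as an int, so a uint64_t argument
   is converted at the call site and the size passed on is always
   sizeof (int). Large values would be truncated by that conversion. */
static int
set_int_option (setsockopt_fn setter, void *socket, int option, int value)
{
    assert (setter != NULL);
    return setter (socket, option, &value, sizeof (value));
}
```

With the real library, the call would be set_int_option (zmq_setsockopt, publisher, ZMQ_SNDHWM, 0), and the uint64_t mistake can no longer produce a size mismatch.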
On Tue, Jun 17, 2014 at 2:23 AM, Pieter Hintjens <ph at imatix.com> wrote:
> I think the actual evidence (I've seen two very expensive debug
> stories caused by this same behavior in the last few weeks) shows that
> returning an error in such a case is meaningless, and that a library
> asserting when passed bad arguments is measurably more robust, not
> less robust. I'm 100% sure of this. It's how CZMQ has worked since the
> start, including in the socket option classes, and no-one has ever
> flagged that as problematic. The only plausible use case is for tests,
> which is circular.
>
> However, changing existing behavior isn't allowed by our C4
> development contract, so I was thinking of making this optional via a
> build-time option in libzmq.
>
> -Pieter
>
> On Tue, Jun 17, 2014 at 1:43 AM, Thomas Rodgers <rodgert at twrodgers.com> wrote:
> > If so, debug/checked builds only please.
> >
> >
> >
> > On Mon, Jun 16, 2014 at 5:11 PM, Pieter Hintjens <ph at imatix.com> wrote:
> >>
> >> Indeed... I've had this hit more than once, a zmq_setsockopt that
> >> caused an error that wasn't being handled, with weird and expensive
> >> results down the line.
> >>
> >> Kind of makes you appreciate assertions more. CZMQ does this -- if a
> >> setsockopt fails for any reason except ETERM, it asserts. I might
> >> propose such a patch to libzmq.
> >>
> >> On Mon, Jun 16, 2014 at 8:45 PM, Gerry Steele <gerry.steele at gmail.com> wrote:
> >> > Thanks, there was also an error in my error handling, which is why it
> >> > was never flagged. I imagine it's the same in my app code. The uint64_t
> >> > came from the CLI argument handling lib, which is why it was used over
> >> > int. A lesson learned there.
> >> >
> >> >
> >> >
> >> >
> >> > On 16 June 2014 19:13, Pieter Hintjens <ph at imatix.com> wrote:
> >> >>
> >> >> And indeed, this code prints "-1" as the return code:
> >> >>
> >> >> void *context = zmq_ctx_new ();
> >> >> void *publisher = zmq_socket (context, ZMQ_PUB);
> >> >> uint64_t rhwm = 0;  /* wrong type on purpose: ZMQ_SNDHWM expects an int */
> >> >> int rc = zmq_setsockopt (publisher, ZMQ_SNDHWM, &rhwm, sizeof (rhwm));
> >> >> printf ("RC=%d\n", rc);
> >> >>
> >> >> -Pieter
> >> >>
> >> >> On Mon, Jun 16, 2014 at 8:03 PM, Pieter Hintjens <ph at imatix.com> wrote:
> >> >> > Hmm, it does check the size of the passed argument, and if that's
> >> >> > wrong, returns an error (which you do check for).
> >> >> >
> >> >> > On Mon, Jun 16, 2014 at 7:36 PM, Gerry Steele <gerry.steele at gmail.com> wrote:
> >> >> >> Hi Pieter, you have struck on something there.
> >> >> >>
> >> >> >> Converting it to int seems to yield the correct behaviour.
> >> >> >>
> >> >> >> I guess that, the way setsockopt works, type coercion doesn't happen.
> >> >> >>
> >> >> >> Embarrassing! But at least we got to the bottom of it.
> >> >> >>
> >> >> >> I was able to send billions of events without incurring loss. Apologies for
> >> >> >> taking everyone's time.
> >> >> >>
> >> >> >> Thanks all.
> >> >> >>
> >> >> >> g
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On 16 June 2014 18:22, Pieter Hintjens <ph at imatix.com> wrote:
> >> >> >>>
> >> >> >>> OK, just to double check, you're using ZeroMQ 4.0.x? In your test case
> >> >> >>> (which I'm belatedly looking at), you use a uint64_t for the hwm values;
> >> >> >>> it should be int. Probably not significant.
> >> >> >>>
> >> >> >>> On Mon, Jun 16, 2014 at 6:20 PM, Gerry Steele <gerry.steele at gmail.com> wrote:
> >> >> >>> > In the parent email I have links to the minimal examples on
> >> >> >>> > gist.github.com.
> >> >> >>> >
> >> >> >>> > Happy to open an issue and commit them later on if that's what you need.
> >> >> >>> >
> >> >> >>> > Thanks
> >> >> >>> >
> >> >> >>> > On 16 Jun 2014 14:43, "Pieter Hintjens" <ph at imatix.com> wrote:
> >> >> >>> >>
> >> >> >>> >> Gerry, can you provide a minimal test case that shows the behavior? Thanks.
> >> >> >>> >>
> >> >> >>> >> On Mon, Jun 16, 2014 at 12:49 PM, Gerry Steele <gerry.steele at gmail.com> wrote:
> >> >> >>> >> > Thanks Pieter. I can't try this out till I get home, but it is looking
> >> >> >>> >> > like hwm overflows.
> >> >> >>> >> >
> >> >> >>> >> > If you run the utilities you notice the drops start happening after
> >> >> >>> >> > precisely 1000 events in the first instance (which is the default hwm).
> >> >> >>> >> >
> >> >> >>> >> > There was another largely ignored thread about this recently mentioning
> >> >> >>> >> > the same problem.
> >> >> >>> >> >
> >> >> >>> >> > I also tried setting the hwm values to a number greater than the number
> >> >> >>> >> > of events and it seemed to have no effect either.
> >> >> >>> >> >
> >> >> >>> >> > g
> >> >> >>> >> >
> >> >> >>> >> > On 16 Jun 2014 09:32, "Pieter Hintjens" <ph at imatix.com> wrote:
> >> >> >>> >> >>
> >> >> >>> >> >> On Mon, Jun 16, 2014 at 9:10 AM, Gerry Steele <gerry.steele at gmail.com> wrote:
> >> >> >>> >> >>
> >> >> >>> >> >> > Big chunks of messages go missing mid flow and then pick up again.
> >> >> >>> >> >> > There is no literature that indicates that is expected behaviour.
> >> >> >>> >> >>
> >> >> >>> >> >> Right. The two plausible causes for this are (a) HWM overflows, and
> >> >> >>> >> >> (b) temporary network disconnects. You have excluded (a), though to be
> >> >> >>> >> >> paranoid I'd probably add some temporary logging to libzmq's pub socket
> >> >> >>> >> >> to shout out if/when it does hit the HWM. To detect (b) you could use
> >> >> >>> >> >> the socket monitoring. The third possibility is that you're doing
> >> >> >>> >> >> something wrong with subscriptions... though that seems unlikely.
> >> >> >>> >> >>
> >> >> >>> >> >> -Pieter
> >> >> >>> >> >> _______________________________________________
> >> >> >>> >> >> zeromq-dev mailing list
> >> >> >>> >> >> zeromq-dev at lists.zeromq.org
> >> >> >>> >> >> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> Gerry Steele
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Gerry Steele