[zeromq-dev] PUB/SUB unreliability

Pieter Hintjens ph at imatix.com
Tue Jun 17 09:23:20 CEST 2014


I think the actual evidence (I've seen two very expensive debug
stories caused by this same behavior in the last few weeks) shows that
returning an error in such a case is meaningless, and that a library
asserting when passed bad arguments is measurably more robust, not
less robust. I'm 100% sure of this. It's how CZMQ has worked since the
start, including in the socket option classes, and no-one has ever
flagged that as problematic. The only plausible use case is for tests,
which is circular.

However, changing existing behavior isn't allowed by our C4
development contract, so I was thinking of making this optional via a
build-time option in libzmq.

-Pieter
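
A minimal sketch of the optional assert-on-failure behavior proposed
above, assuming a stand-in setter rather than the real libzmq call. The
names sketch_set_hwm, SKETCH_STRICT_OPTIONS and SKETCH_ETERM are
hypothetical, and the value used for SKETCH_ETERM is an arbitrary
placeholder; none of them exist in libzmq or CZMQ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder for ETERM; real code would take ETERM from zmq.h. */
#define SKETCH_ETERM 9999

/* Stand-in for an int-valued socket option setter. Like the behavior
   described in this thread, it rejects an argument whose size is not
   sizeof (int) and leaves the stored value untouched. */
static int
sketch_set_hwm (int *slot, const void *optval, size_t optvallen)
{
    int rc = 0;
    if (optval == NULL || optvallen != sizeof (int)) {
        errno = EINVAL;
        rc = -1;
    }
    else
        memcpy (slot, optval, sizeof (int));

#ifdef SKETCH_STRICT_OPTIONS
    /* Strict builds abort on any failure except context termination,
       so an unchecked return code cannot go unnoticed. */
    assert (rc == 0 || errno == SKETCH_ETERM);
#endif
    return rc;
}
```

Built without -DSKETCH_STRICT_OPTIONS, passing a uint64_t returns -1
with errno set to EINVAL, and the caller has to notice. Built with it,
the same call aborts at the faulty line.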

On Tue, Jun 17, 2014 at 1:43 AM, Thomas Rodgers <rodgert at twrodgers.com> wrote:
> If so, debug/checked builds only please.
>
>
>
> On Mon, Jun 16, 2014 at 5:11 PM, Pieter Hintjens <ph at imatix.com> wrote:
>>
>> Indeed... I've had this hit more than once, a zmq_setsockopt that
>> caused an error that wasn't being handled, with weird and expensive
>> results down the line.
>>
>> Kind of makes you appreciate assertions more. CZMQ does this -- if a
>> setsockopt fails for any reason except ETERM, it asserts. I might
>> propose such a patch to libzmq.
>>
>> On Mon, Jun 16, 2014 at 8:45 PM, Gerry Steele <gerry.steele at gmail.com>
>> wrote:
>> > Thanks, there was also an error in my error handling, which is why it
>> > was never flagged. I imagine it's the same in my app code. The uint64_t
>> > came from the CLI argument handling lib, which is why it was used over
>> > int. A lesson learned there.
>> >
>> >
>> >
>> >
>> > On 16 June 2014 19:13, Pieter Hintjens <ph at imatix.com> wrote:
>> >>
>> >> And indeed, this code prints "-1" as the return code:
>> >>
>> >>     void *context = zmq_ctx_new ();
>> >>     void *publisher = zmq_socket (context, ZMQ_PUB);
>> >>     uint64_t rhwm = 0;
>> >>     int rc = zmq_setsockopt (publisher, ZMQ_SNDHWM, &rhwm, sizeof (rhwm));
>> >>     printf ("RC=%d\n", rc);
>> >>
>> >> -Pieter
>> >>
>> >> On Mon, Jun 16, 2014 at 8:03 PM, Pieter Hintjens <ph at imatix.com> wrote:
>> >> > Hmm, it does check the size of the passed argument, and if that's
>> >> > wrong, returns an error (which you do check for).
>> >> >
>> >> > On Mon, Jun 16, 2014 at 7:36 PM, Gerry Steele
>> >> > <gerry.steele at gmail.com>
>> >> > wrote:
>> >> >> Hi Pieter, you have struck on something there.
>> >> >>
>> >> >> Converting it to int seems to yield the correct behaviour.
>> >> >>
>> >> >> I guess the way setsockopt works type coercion doesn't happen.
>> >> >>
>> >> >> Embarrassing! But at least we got to the bottom of it.
>> >> >>
>> >> >> I was able to send billions of events without incurring loss.
>> >> >> Apologies for taking everyone's time.
>> >> >>
>> >> >> Thanks all.
>> >> >>
>> >> >> g
>> >> >>
>> >> >>
>> >> >>
>> >> >> On 16 June 2014 18:22, Pieter Hintjens <ph at imatix.com> wrote:
>> >> >>>
>> >> >>> OK, just to double check, you're using ZeroMQ 4.0.x? In your test
>> >> >>> case (which I'm belatedly looking at), you use a uint64_t for the
>> >> >>> hwm values; it should be int. Probably not significant.
>> >> >>>
>> >> >>> On Mon, Jun 16, 2014 at 6:20 PM, Gerry Steele
>> >> >>> <gerry.steele at gmail.com>
>> >> >>> wrote:
>> >> >>> > In the parent email I have links to the minimal examples on
>> >> >>> > gist.github.com
>> >> >>> >
>> >> >>> > Happy to open an issue and commit them later on if that's what you
>> >> >>> > need.
>> >> >>> >
>> >> >>> > Thanks
>> >> >>> >
>> >> >>> > On 16 Jun 2014 14:43, "Pieter Hintjens" <ph at imatix.com> wrote:
>> >> >>> >>
>> >> >>> >> Gerry, can you provide a minimal test case that shows the
>> >> >>> >> behavior?
>> >> >>> >> Thanks.
>> >> >>> >>
>> >> >>> >> On Mon, Jun 16, 2014 at 12:49 PM, Gerry Steele
>> >> >>> >> <gerry.steele at gmail.com>
>> >> >>> >> wrote:
>> >> >>> >> > Thanks Pieter. I can't try this out till I get home but it is
>> >> >>> >> > looking like hwm overflows.
>> >> >>> >> >
>> >> >>> >> > If you run the utilities you notice the drops start happening
>> >> >>> >> > after precisely 1000 events in the first instance (which is the
>> >> >>> >> > default hwm).
>> >> >>> >> >
>> >> >>> >> > There was another largely ignored thread about this recently
>> >> >>> >> > mentioning the same problem.
>> >> >>> >> >
>> >> >>> >> > I also tried setting the hwm values to a number greater than the
>> >> >>> >> > number of events and it seemed to have no effect either.
>> >> >>> >> >
>> >> >>> >> > g
>> >> >>> >> >
>> >> >>> >> > On 16 Jun 2014 09:32, "Pieter Hintjens" <ph at imatix.com> wrote:
>> >> >>> >> >>
>> >> >>> >> >> On Mon, Jun 16, 2014 at 9:10 AM, Gerry Steele
>> >> >>> >> >> <gerry.steele at gmail.com>
>> >> >>> >> >> wrote:
>> >> >>> >> >>
>> >> >>> >> >> > Big chunks of messages go missing mid flow and then pick up
>> >> >>> >> >> > again. There is no literature that indicates that is expected
>> >> >>> >> >> > behaviour.
>> >> >>> >> >>
>> >> >>> >> >> Right. The two plausible causes for this are (a) HWM overflows,
>> >> >>> >> >> and (b) temporary network disconnects. You have excluded (a),
>> >> >>> >> >> though to be paranoid I'd probably add some temporary logging to
>> >> >>> >> >> libzmq's pub socket to shout out if/when it does hit the HWM. To
>> >> >>> >> >> detect (b) you could use the socket monitoring. The third
>> >> >>> >> >> possibility is that you're doing something wrong with
>> >> >>> >> >> subscriptions... though that seems unlikely.
>> >> >>> >> >>
>> >> >>> >> >> -Pieter
>> >> >>> >> >> _______________________________________________
>> >> >>> >> >> zeromq-dev mailing list
>> >> >>> >> >> zeromq-dev at lists.zeromq.org
>> >> >>> >> >> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>> >> >>> >> >
>> >> >>> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Gerry Steele
>> >> >>
>> >> >>
>> >
>> >
>> >
>> >
>> > --
>> > Gerry Steele
>> >
>> >
>
>
>
>
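
The root cause found in this thread was an option value of the wrong
type (uint64_t where int was expected), which only failed at run time.
One possible way to catch that at compile time is sketched below,
assuming a C11 compiler. The macros SKETCH_IS_INT and SKETCH_INT_OPTLEN
are hypothetical helpers, not part of libzmq.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Evaluates to 1 when the argument's type is int, 0 otherwise. */
#define SKETCH_IS_INT(x) _Generic ((x), int: 1, default: 0)

/* Yields sizeof (int) for an int variable. For any other type the
   array size becomes negative, so the build fails instead of the
   option call failing at run time. */
#define SKETCH_INT_OPTLEN(var) \
    (sizeof (char [SKETCH_IS_INT (var) ? (int) sizeof (var) : -1]))
```

With such a helper, a call along the lines of
zmq_setsockopt (publisher, ZMQ_SNDHWM, &hwm, SKETCH_INT_OPTLEN (hwm))
would compile only when hwm is declared as int.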


