[zeromq-dev] PUB/SUB unreliability

Pieter Hintjens ph at imatix.com
Tue Jun 17 00:11:01 CEST 2014


Indeed... This has hit me more than once: a zmq_setsockopt call failed,
the error wasn't handled, and the results down the line were weird and
expensive.
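
For readers following the thread below: the HWM was passed as a uint64_t while ZMQ_SNDHWM takes an int, and zmq_setsockopt rejected the mismatched size by returning -1. The following C sketch is a simplified, hypothetical model of that size check, not libzmq's source. `model_set_int_option` and `wide_hwm_is_rejected` are invented names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an int-valued socket option such as ZMQ_SNDHWM.
 * NOT libzmq's code: it only mirrors the behaviour discussed in the thread,
 * where a wrongly sized option value makes the call fail with -1. */
static int model_set_int_option (int *option, const void *optval,
                                 size_t optvallen)
{
    int value;
    assert (option != NULL);

    /* Reject values whose size is not exactly sizeof (int), e.g. a uint64_t. */
    if (optval == NULL || optvallen != sizeof (int)) {
        errno = EINVAL;
        return -1;              /* *option keeps its previous value */
    }
    memcpy (&value, optval, sizeof value);

    /* In this model a negative high-water mark is rejected as well. */
    if (value < 0) {
        errno = EINVAL;
        return -1;
    }
    *option = value;
    return 0;
}

/* Reproduces the thread's mistake: a 64-bit HWM passed to an int option.
 * Returns 1 if the model rejected the call. */
static int wide_hwm_is_rejected (int *sndhwm)
{
    uint64_t wide = 0;
    return model_set_int_option (sndhwm, &wide, sizeof wide) == -1;
}
```

Because the rejected call leaves the previous value in place, a model socket that starts at the default HWM of 1000 keeps that limit, which fits the drops after exactly 1000 events reported below. Passing an `int` together with `sizeof (int)` succeeds. Nothing signals the failure unless the -1 is checked.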

Kind of makes you appreciate assertions more. CZMQ does this -- if a
setsockopt fails for any reason except ETERM, it asserts. I might
propose such a patch to libzmq.
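
The CZMQ policy mentioned above can be sketched as a small helper. This is a hedged illustration, not CZMQ's actual code: `check_setsockopt_rc` is a hypothetical name, and the "context terminated" code is passed in as a parameter so the sketch compiles without zmq.h (in real code it would be libzmq's ETERM).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of a CZMQ-style policy (hypothetical helper, not CZMQ's API):
 * a failing setsockopt is treated as a programming error and aborts via
 * assert, except when the failure code is `terminated_errno` (ETERM in
 * libzmq), which can legitimately happen while the context shuts down.
 *
 * Capture errno in a separate statement *after* the call, because the
 * evaluation order of function arguments is unspecified in C:
 *
 *     int rc = zmq_setsockopt (pub, ZMQ_SNDHWM, &hwm, sizeof hwm);
 *     check_setsockopt_rc (rc, errno, ETERM);
 */
static int check_setsockopt_rc (int rc, int saved_errno, int terminated_errno)
{
    if (rc != 0 && saved_errno != terminated_errno)
        fprintf (stderr, "setsockopt failed: %s (errno %d)\n",
                 strerror (saved_errno), saved_errno);
    assert (rc == 0 || saved_errno == terminated_errno);
    return rc;
}
```

Two caveats: like any assert-based check, this one disappears in builds compiled with NDEBUG (call abort () instead if it must always run), and for ZeroMQ-specific error codes zmq_strerror gives a more useful message than strerror.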

On Mon, Jun 16, 2014 at 8:45 PM, Gerry Steele <gerry.steele at gmail.com> wrote:
> Thanks. There was also a bug in my error handling, which is why it was never
> flagged. I imagine it's the same in my app code. The uint64_t came from the
> CLI argument-handling library, which is why it was used instead of int. A
> lesson learned there.
>
>
>
>
> On 16 June 2014 19:13, Pieter Hintjens <ph at imatix.com> wrote:
>>
>> And indeed, this code prints "-1" as the return code:
>>
>>     void *context = zmq_ctx_new ();
>>     void *publisher = zmq_socket (context, ZMQ_PUB);
>>     uint64_t rhwm = 0;
>>     int rc = zmq_setsockopt (publisher, ZMQ_SNDHWM, &rhwm, sizeof (rhwm));
>>     printf ("RC=%d\n", rc);
>>
>> -Pieter
>>
>> On Mon, Jun 16, 2014 at 8:03 PM, Pieter Hintjens <ph at imatix.com> wrote:
>> > Hmm, it does check the size of the passed argument, and if that's
>> > wrong, returns an error (which you do check for).
>> >
>> > On Mon, Jun 16, 2014 at 7:36 PM, Gerry Steele <gerry.steele at gmail.com> wrote:
>> >> Hi Pieter, you have struck on something there.
>> >>
>> >> Converting it to int seems to yield the correct behaviour.
>> >>
>> >> I guess that because setsockopt takes an untyped pointer plus a size,
>> >> no type coercion happens.
>> >>
>> >> Embarrassing! But at least we got to the bottom of it.
>> >>
>> >> I was able to send billions of events without incurring loss. Apologies
>> >> for taking everyone's time.
>> >>
>> >> Thanks all.
>> >>
>> >> g
>> >>
>> >>
>> >>
>> >> On 16 June 2014 18:22, Pieter Hintjens <ph at imatix.com> wrote:
>> >>>
>> >>> OK, just to double-check, you're using ZeroMQ 4.0.x? In your test case
>> >>> (which I'm belatedly looking at), you use a uint64_t for the HWM
>> >>> values; it should be int. Probably not significant.
>> >>>
>> >>> On Mon, Jun 16, 2014 at 6:20 PM, Gerry Steele <gerry.steele at gmail.com> wrote:
>> >>> > In the parent email I have links to the minimal examples on
>> >>> > gist.github.com.
>> >>> >
>> >>> > Happy to open an issue and commit them later on if that's what you need.
>> >>> >
>> >>> > Thanks
>> >>> >
>> >>> > On 16 Jun 2014 14:43, "Pieter Hintjens" <ph at imatix.com> wrote:
>> >>> >>
>> >>> >> Gerry, can you provide a minimal test case that shows the behavior?
>> >>> >> Thanks.
>> >>> >>
>> >>> >> On Mon, Jun 16, 2014 at 12:49 PM, Gerry Steele <gerry.steele at gmail.com> wrote:
>> >>> >> > Thanks, Pieter. I can't try this out until I get home, but it is
>> >>> >> > looking like HWM overflows.
>> >>> >> >
>> >>> >> > If you run the utilities, you'll notice the drops start happening
>> >>> >> > after precisely 1000 events in the first instance (which is the
>> >>> >> > default HWM).
>> >>> >> >
>> >>> >> > There was another, largely ignored, thread recently mentioning the
>> >>> >> > same problem.
>> >>> >> >
>> >>> >> > I also tried setting the HWM values to a number greater than the
>> >>> >> > number of events, and that seemed to have no effect either.
>> >>> >> >
>> >>> >> > g
>> >>> >> >
>> >>> >> > On 16 Jun 2014 09:32, "Pieter Hintjens" <ph at imatix.com> wrote:
>> >>> >> >>
>> >>> >> >> On Mon, Jun 16, 2014 at 9:10 AM, Gerry Steele <gerry.steele at gmail.com> wrote:
>> >>> >> >>
>> >>> >> >> > Big chunks of messages go missing mid-flow and then pick up
>> >>> >> >> > again. There is no documentation indicating that this is
>> >>> >> >> > expected behaviour.
>> >>> >> >>
>> >>> >> >> Right. The two plausible causes for this are (a) HWM overflows
>> >>> >> >> and (b) temporary network disconnects. You have excluded (a),
>> >>> >> >> though to be paranoid I'd probably add some temporary logging to
>> >>> >> >> libzmq's PUB socket to shout out if and when it hits the HWM. To
>> >>> >> >> detect (b) you could use socket monitoring. A third possibility
>> >>> >> >> is that you're doing something wrong with subscriptions, though
>> >>> >> >> that seems unlikely.
>> >>> >> >>
>> >>> >> >> -Pieter
>> >>> >> >> _______________________________________________
>> >>> >> >> zeromq-dev mailing list
>> >>> >> >> zeromq-dev at lists.zeromq.org
>> >>> >> >> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>> >>> >> >
>> >>> >> >
>> >>
>> >> --
>> >> Gerry Steele
>> >>
>> >>
>
>
>
>
> --
> Gerry Steele
>
>


