[zeromq-dev] Is inproc PUB SUB subject to the slow joiner problem?

Bill Torpey wallstprog at gmail.com
Mon May 13 22:23:23 CEST 2019


Hi Chris:

Thanks for that — you learn something new every day!

I certainly can’t explain the results you’re seeing. My understanding has been that (a) subscription information is exchanged at the point where SUB connects to PUB, and (b) connects are synchronous with inproc sockets.  That is based on *lots* of testing, as well as discussions with some of the project maintainers.

I would encourage you to submit this as an issue in GitHub so it can get the proper attention.  If nothing else, that may trigger an update to the docs to make things more clear.

I suspect that your “trick” of calling recv on the XPUB is related to this well-known work-around: <https://gist.github.com/hintjens/7344533>.  Unfortunately, the workaround is itself unreliable: <https://github.com/zeromq/libzmq/issues/2267#issuecomment-455343688>.
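
For reference, here is a minimal sketch of the XPUB recv() trick with a bounded wait instead of a blocking recv(). The helper name wait_for_subscription and the timeout value are hypothetical; it only relies on the pyzmq Socket.poll() and Socket.recv() methods and on the XPUB convention that a subscribe frame is the byte 1 followed by the topic. It does not address the unreliability described in the issue linked above; it only avoids hanging forever if no subscription frame shows up.

```python
def wait_for_subscription(xpub, timeout_ms=1000):
    """Wait up to timeout_ms for a subscribe frame on an XPUB socket.

    Returns the subscribed topic as bytes, or None if no subscribe
    frame was read within the timeout. (Hypothetical helper.)
    """
    # poll() returns a non-zero event mask when a frame is ready to read.
    if not xpub.poll(timeout_ms):
        return None
    frame = xpub.recv()
    # XPUB delivers a subscription as b'\x01' + topic and an
    # unsubscription as b'\x00' + topic.
    if frame[:1] == b'\x01':
        return frame[1:]
    return None
```

In your second example this would replace the bare xpub.recv() call: check that wait_for_subscription(xpub) is not None before the second xpub.send(b'hello').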

Last but not least, a word of warning: just because some piece of code appears to work is no guarantee that it will work reliably.  I’ve run tests with ZeroMQ where the test only failed after thousands of iterations over several days. The asynchronous nature of messaging in general, and ZeroMQ in particular, means that “Heisenbugs” can be VERY intermittent.  Before you rely on a particular behavior, it is best to make sure that (a) the docs explicitly state that it will work, and/or (b) you have tested it thoroughly under stress/load; ideally both.
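
As a rough illustration of the kind of repeated testing meant here, the sketch below runs one inproc PUB/SUB round trip many times and records which iterations lost the message. The names stress and inproc_pubsub_trial, the endpoint name, and the 100 ms timeout are all assumptions made for the example, not anything from the ZeroMQ docs, and a clean run proves nothing beyond "no failure observed in N tries".

```python
def stress(trial, iterations=10_000):
    """Call trial() repeatedly; return the indices of iterations that failed."""
    failures = []
    for i in range(iterations):
        if not trial():
            failures.append(i)
    return failures


def inproc_pubsub_trial(timeout_ms=100):
    """One PUB/SUB round trip over inproc; True if the message arrived."""
    import zmq  # imported here so the harness above has no dependencies

    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    sub = ctx.socket(zmq.SUB)
    try:
        pub.bind('inproc://stress-test')
        sub.subscribe(b'')                 # subscribe before connecting
        sub.connect('inproc://stress-test')
        pub.send(b'hello')
        # Bounded wait so a dropped message counts as a failure
        # instead of blocking forever in recv().
        return bool(sub.poll(timeout_ms)) and sub.recv() == b'hello'
    finally:
        pub.close(linger=0)
        sub.close(linger=0)
        ctx.term()
```

Something like print(len(stress(inproc_pubsub_trial, iterations=100_000))) would then report how many iterations dropped the message; running it under load (other processes or threads keeping the machine busy) is more likely to expose timing-dependent failures.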

Good luck!

Bill


> On May 10, 2019, at 7:26 PM, Chris Billington <chrisjbillington at gmail.com> wrote:
> 
> Hi Bill,
> 
> Thanks for the insight.
> 
> It is both messages that get dropped, though I at least expected the first one would be. 
> 
> I've tried moving things around various ways, but even subscribing before connecting has the issue. The following code has the subscriber receiving no messages:
> 
> 
> import zmq
> 
> ctx = zmq.Context()
> pub = ctx.socket(zmq.PUB)
> sub = ctx.socket(zmq.SUB)
> 
> pub.bind('inproc://test')
> pub.send(b'hello')
> 
> sub.subscribe(b'')
> sub.connect('inproc://test')
> 
> pub.send(b'hello')
> print(sub.recv())
> 
> But if I remove the first pub.send(b'hello'), then the subscriber does receive the subsequent message.
> 
> And the following seems to work, swapping for an xpub and explicitly calling recv(), seemingly ensuring subscriptions have been processed:
> 
> import zmq
> 
> ctx = zmq.Context()
> xpub = ctx.socket(zmq.XPUB)
> sub = ctx.socket(zmq.SUB)
> 
> xpub.bind('inproc://test')
> xpub.send(b'hello')
> 
> sub.subscribe(b'')
> sub.connect('inproc://test')
> xpub.recv()
> 
> xpub.send(b'hello')
> print(sub.recv())
> 
> 
> The presence of a send() before any subscribers have connected seems to be what is making the difference. Without it, I don't see dropped initial messages after subscribers have connected, but with it, I do. And my application has multiple subscribers, so of course messages have been sent before some of the subscribers have connected. But for my application I need to be able to guarantee that all messages past a certain point are not dropped. The XPUB recv() trick seems to do it, and since I'm in a single thread so far, nothing more needs to be done. (In other, distributed code I then have the XPUB socket send a message to acknowledge the subscription.)
> 
> So I think I have my answer: subscriptions are not synchronous for inproc, I just maybe hit on a situation initially where it looked like they were.
> 
> -Chris
> 
> 
> 
> 
> 
> On Fri., 10 May 2019, 18:36 Bill Torpey, <wallstprog at gmail.com> wrote:
> Hi Chris:
> 
> One more thing: I took a closer look at your code, and I noticed that you're subscribing the SUB socket AFTER connecting.  This will cause all sorts of “late joiner” problems.  It is recommended (and I don’t have the reference handy) to subscribe BEFORE connecting. That way, the subscription information gets exchanged at connect time, and since the inproc connect is synchronous, you should be good to go.  (Keep in mind that for most protocols, meaning anything other than PGM, filtering is done at the publisher at the point where zmq_send is called.)
> 
> Try putting the subscribe before the connect and see if that helps.  I’ll see if I can find an “official” reference to this.
> 
> Regards,
> 
> Bill
> 
>> On May 10, 2019, at 5:19 PM, Bill Torpey <wallstprog at gmail.com> wrote:
>> 
>> Hi Chris:
>> 
>>> The difference seems to be that if the PUB socket sends a message prior to any SUB sockets connecting, then later subscriptions are subject to the slow joiner problem. 
>> 
>> If I understand you correctly, you’re saying that the SUB doesn’t receive the first message — but that’s to be expected, since it was sent before the sub connected.    I believe the SUB socket should receive the second message, however.
>> 
>> Is that what you’re seeing?
>> 
>> Bill
>> 
>>> On May 10, 2019, at 5:11 PM, Chris Billington <chrisjbillington at gmail.com> wrote:
>>> 
>>> Hi Bill,
>>> 
>>> Thanks for the response.
>>> 
>>> Connect may be synchronous, but after more experimentation, it looks like the processing of subscriptions is not, or at least not in some circumstances. The following blocks on recv():
>>> 
>>> import zmq
>>> 
>>> ctx = zmq.Context()
>>> pub = ctx.socket(zmq.PUB)
>>> sub = ctx.socket(zmq.SUB)
>>> 
>>> pub.bind('inproc://test')
>>> pub.send(b'hello')
>>> sub.connect('inproc://test')
>>> sub.subscribe(b'')
>>> pub.send(b'hello')
>>> print(sub.recv())
>>> 
>>> The difference seems to be that if the PUB socket sends a message prior to any SUB sockets connecting, then later subscriptions are subject to the slow joiner problem. For me since the two sockets are initially in the same thread, I can get around this by using an XPUB instead and calling recv() on it after the subscription was sent. This seems to ensure that the subscription has been processed. But if I have misunderstood anything, please let me know :)
>>> 
>>> -Chris
>>> 
>>> On Fri, May 10, 2019 at 2:05 PM Bill Torpey <wallstprog at gmail.com> wrote:
>>> Hi Chris:
>>> 
>>> With inproc transports the connect call is synchronous, as opposed to other protocols (like TCP) where the connect is asynchronous.  This was part of a discussion with Simon at <https://github.com/zeromq/libzmq/issues/2759#issuecomment-389185969>, but I have still not found this described elsewhere in the “official” docs.  (There is another reference here: <https://grokbase.com/t/zeromq/zeromq-dev/1343mv38cr/inproc%EF%BC%9A-message-dropped-after-zmq-dealer-connected>.)
>>> 
>>> Note also that the disconnect is NOT synchronous, which can lead to problems if you disconnect and then immediately try to connect again — if the socket has not finished disconnecting, the second connect will fail.
>>> 
>>> Regards,
>>> 
>>> Bill
>>> 
>>>> On May 10, 2019, at 1:46 PM, Chris Billington <chrisjbillington at gmail.com> wrote:
>>>> 
>>>> The pyzmq code below sends a message on a PUB socket to a SUB socket via inproc, without any welcome messages or other measures to get around the slow joiner problem, and does not appear to drop messages. However, if I change the endpoint to a TCP one, then it is subject to the slow joiner problem and the subscriber doesn't receive the initial message, as expected.
>>>> 
>>>> import zmq
>>>> 
>>>> ctx = zmq.Context()
>>>> pub = ctx.socket(zmq.PUB)
>>>> sub = ctx.socket(zmq.SUB)
>>>> 
>>>> pub.bind('inproc://test')
>>>> sub.subscribe(b'')
>>>> sub.connect('inproc://test')
>>>> pub.send(b'hello')
>>>> print(sub.recv())
>>>> 
>>>> 
>>>> Is inproc guaranteed to not be subject to the slow joiner problem? Or am I just getting lucky with not seeing messages dropped in my test? Since inproc does not use separate IO threads, it stands to reason that slow joining might not be an issue. If so, this would be great as it would allow me to use simpler code for inproc PUB SUB.
>>>> 
>>>> Regards,
>>>> 
>>>> Chris
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> zeromq-dev mailing list
>>>> zeromq-dev at lists.zeromq.org
>>>> https://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>> 
>> 
> 
