[zeromq-dev] PUB/SUB missed initial message even using monitor events (possibly just #2267, but not sure)

Jason Heeris jason.heeris at gmail.com
Wed Feb 8 02:54:24 CET 2023


On Wed, 8 Feb 2023 at 05:41, Bill Torpey <wallstprog at gmail.com> wrote:
> Btw, I’m assuming connection-oriented (e.g., TCP) transport here.  Semantics
> could be very different w/other mechanisms.

No, IPC (i.e. Unix sockets in the abstract namespace, e.g. "ipc://@zmq-test"; see
subscribe.rs:6). I actually didn't realise the semantics were so different
between IPC and TCP, so apologies for not mentioning it in the first place.

> On Feb 6, 2023, at 8:10 PM, Jason Heeris <jason.heeris at gmail.com> wrote:
> So, the initial connect should timeout — correct?

At the level my code is using the API, the connect call returns immediately and
the connection is eventually done asynchronously. I assume this is because I set
the connect timeout to 0 and then, as you say, zmq handles it asynchronously.

Based on the details you explained, it does actually sound like this is simply
#2267, partly obscured by leaning heavily on the asynchronous behaviour around
connect/bind.

> It looks like your code calls socket_monitor *after* the bind/connect calls —
> it’s better to start monitoring immediately after the create in order to see
> what is going on with the connect/bind calls.  I’m not a “rustacean” myself,
> but it looks like you’re missing some events given the way you sequence the
> calls to monitor.

Ah yes, you are right. I had doubled down on looking at the handshake event and
anything after it, and at nothing earlier.

> BTW, I know this doesn’t answer your question as to why this is happening, but
> a very helpful feature in zmq is the “welcome” msg — see here
> (https://web.archive.org/web/20160208000728/http://somdoron.com/2015/09/reliable-pubsub/)
> and here (https://github.com/somdoron/ReliablePubSub).   OZ uses this to know
> for sure when a sub is connected to a pub.  You might also find some of this
> info helpful:
> https://github.com/nyfix/OZ/blob/master/doc/Reconnects-Heartbeats.md.

I have been looking at OZ with great interest actually, it's a good protocol! In
my application I lean more towards the out-of-band snapshot, i.e. events have a
sequence number, and there's another channel for getting the full initial state,
ensuring there's no gap, etc. This works where the model is "have big app
config, publish tiny updates after validation".
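
To make the sequence-number idea concrete, here is a minimal sketch of the gap
check on the subscriber side. It is not my actual code: the names SeqTracker,
SeqCheck, from_snapshot and observe are hypothetical, and it assumes sequence
numbers are u64 values that increase by exactly one per event. It uses only the
Rust standard library and no ZeroMQ calls.

```rust
/// Result of comparing an incoming event's sequence number with the last one applied.
#[derive(Debug, PartialEq)]
enum SeqCheck {
    /// The event is the next one expected; apply it.
    InOrder,
    /// Already covered by the snapshot or an earlier event; drop it.
    Stale,
    /// One or more events were missed; a fresh snapshot is needed.
    Gap { missed: u64 },
}

/// Tracks the sequence number of the last applied state (hypothetical helper).
struct SeqTracker {
    last_applied: u64,
}

impl SeqTracker {
    /// Start from the sequence number carried by the out-of-band snapshot.
    fn from_snapshot(snapshot_seq: u64) -> Self {
        SeqTracker { last_applied: snapshot_seq }
    }

    /// Classify an incoming event; advance only when it is in order.
    fn observe(&mut self, event_seq: u64) -> SeqCheck {
        if event_seq <= self.last_applied {
            SeqCheck::Stale
        } else if event_seq == self.last_applied + 1 {
            self.last_applied = event_seq;
            SeqCheck::InOrder
        } else {
            SeqCheck::Gap { missed: event_seq - self.last_applied - 1 }
        }
    }
}

fn main() {
    // Assume the snapshot channel reported state as of sequence number 10.
    let mut tracker = SeqTracker::from_snapshot(10);
    for seq in [9, 11, 12, 15] {
        println!("event {} -> {:?}", seq, tracker.observe(seq));
    }
}
```

On a Gap the tracker deliberately does not advance, so the caller can re-request
the snapshot and restart from its sequence number.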

But note that this question is specifically in the context of tests. Although it
seems against the grain, for *some* integration tests I try to use non-ZMQ
out-of-band signalling, because I want to test e.g. what happens a long time
after that initial snapshot. So sort-of-reimplementing-but-not-quite a test
version of that would defeat the point. For these specific tests, I've
refactored them so that wherever pub/sub exchanges are involved, the pub is
started and bound first. (Again, this is just for a subset of tests. For others,
yes, a "real" protocol is good and useful.)
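
As a rough illustration of "pub is started and bound first", the ordering can
be enforced in a test with a plain std channel as the out-of-band signal. This
is only a sketch under assumptions: run_bind_before_connect is a hypothetical
name, and the bind and connect steps are stand-ins that just log a string, so
no real sockets are opened.

```rust
use std::sync::mpsc;
use std::thread;

/// Runs a stand-in publisher and subscriber and returns the order of the steps.
fn run_bind_before_connect() -> Vec<&'static str> {
    // Out-of-band "publisher is bound" signal, independent of ZeroMQ.
    let (ready_tx, ready_rx) = mpsc::channel::<()>();
    // Shared log so the test can check the ordering afterwards.
    let (log_tx, log_rx) = mpsc::channel::<&'static str>();

    let pub_log = log_tx.clone();
    let publisher = thread::spawn(move || {
        // Stand-in for creating the PUB socket and binding it.
        pub_log.send("pub bound").unwrap();
        // Tell the test thread that the bind step has completed.
        ready_tx.send(()).unwrap();
    });

    // Block until the publisher reports that it is bound.
    ready_rx.recv().unwrap();
    // Stand-in for creating the SUB socket, subscribing and connecting.
    log_tx.send("sub connected").unwrap();

    publisher.join().unwrap();
    // Drop the last sender so the iterator below terminates.
    drop(log_tx);
    log_rx.iter().collect()
}

fn main() {
    println!("{:?}", run_bind_before_connect());
}
```

This only orders bind before connect. It says nothing about when the
subscription has actually reached the publisher, which is the part a "welcome"
message or a snapshot channel would cover.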

Thanks so much for the detail, there are some hard-won insights here I can carry
over to my applications to improve them!

Cheers,
Jason

