[zeromq-dev] ZMQ reconnect/ephemeral ports

Luca Boccassi luca.boccassi at gmail.com
Tue Sep 19 15:13:28 CEST 2017


On Sun, 2017-09-17 at 12:29 -0400, Bill Torpey wrote:
> Luca:
> 
> I hear what you’re saying but … I think I’m talking about a different
> situation.
> 
> If I understand your explanation correctly, you’re saying that
> setting ZMQ_RECONNECT_IVL to -1 should prevent a disconnected
> endpoint from *ever* reconnecting, under any set of circumstances.  
> 
> I would read the doc (4.2.2) more like the following (with addition
> in *bold*):
> 
> > The ZMQ_RECONNECT_IVL option shall set the initial reconnection
> > interval for the specified socket. The reconnection interval is the
> > period ØMQ shall wait between attempts to *automatically* reconnect
> > disconnected peers when using connection-oriented transports. The
> > value -1 means no reconnection.
> 
> 
> What I’m questioning is the interaction between ZMQ_RECONNECT_IVL ==
> -1 and the behavior enforced by
> https://github.com/zeromq/libzmq/issues/788. (Also see here:
> https://www.mail-archive.com/zeromq-dev@lists.zeromq.org/msg21484.html).
> That commit is intended to
> prevent *duplicate* connections from the same endpoint, for certain
> socket types (e.g., pub/sub), where multiple connections (and their
> associated duplicate messages) don’t make sense.
> 
> One scenario I’m concerned about is the one where:
> 
> 1.	Endpoint connects to us
> 2.	Endpoint is disconnected for some reason
> 3.	Setting ZMQ_RECONNECT_IVL=-1 disables *automatic*
> reconnect, so as far as we’re concerned the endpoint is dead
> 4.	Subsequently the endpoint connects to us again (e.g.,
> following a restart)
> 5.	Because we still have a record of the endpoint, we will
> refuse the connection — even though the endpoint is dead from our
> point of view.  In this scenario that endpoint can NEVER reconnect.
> 
> So I get that setting ZMQ_RECONNECT_IVL should prevent us from
> reconnecting (automatically) to the disconnected endpoint, but I
> don’t see the benefit of preventing that endpoint from actively
> reconnecting at a later time.  In this case, we’ve essentially
> blacklisted that endpoint (forever), and I’m having trouble coming up
> with a scenario where that would be intended behavior.
> 
> Does this make sense?  Am I missing something here?
> 
> Also, to your point about adding a protocol layer on top of 0MQ — I
> would MUCH prefer to let 0MQ handle as much of the underlying
> connect/disconnect logic as possible.  I’m concerned about the
> potential for the protocol’s view of the connection state getting out
> of sync with 0MQ’s view (not to mention a bunch of additional work on
> the protocol layer, but more about synchronization).
> 
> Thanks for listening ...
> 
> Bill

I see. I guess there's a terminology confusion issue here - when I
wrote about connections and disconnections, I meant the automated ones
that happen in the background in the I/O thread. But I guess it makes
sense that a manual call to zmq_connect should work as expected.

A workaround for this behaviour would be for the application to
manually call zmq_disconnect before doing a connect to the same
endpoint.
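
A minimal sketch of that workaround with pyzmq (the Python binding used
elsewhere in this thread). The names connect_fresh and demo are
hypothetical. It assumes zmq_disconnect reports ENOENT when the endpoint
was not connected, as the libzmq documentation describes, and that pyzmq
raises an exception carrying that errno:

```python
import errno


def connect_fresh(sock, endpoint):
    """Drop any stale record of `endpoint`, then connect to it.

    `sock` is assumed to be a pyzmq socket; any object exposing
    disconnect() and connect() works.
    """
    try:
        sock.disconnect(endpoint)
    except Exception as exc:
        # pyzmq raises zmq.ZMQError here; ENOENT only means the endpoint
        # was not connected, which is harmless for our purpose.
        if getattr(exc, "errno", None) != errno.ENOENT:
            raise
    sock.connect(endpoint)


def demo():
    """Illustrative use with a real socket (requires pyzmq)."""
    import zmq  # imported here so the helper above has no hard dependency

    ctx = zmq.Context.instance()
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.RECONNECT_IVL, -1)  # no automatic reconnection
    connect_fresh(sub, "tcp://127.0.0.1:12345")  # first connect
    connect_fresh(sub, "tcp://127.0.0.1:12345")  # peer restarted: connect again
    sub.close(linger=0)
```

The helper does not check whether the old connection is really dead, so
the application still has to decide when calling it is appropriate.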

But it turns out that making libzmq do this automatically is not too
hard (unless I've made some silly mistake):

https://github.com/zeromq/libzmq/pull/2756

> > On Sep 17, 2017, at 6:39 AM, Luca Boccassi <luca.boccassi at gmail.com> wrote:
> > 
> > On Sat, 2017-09-16 at 14:34 -0400, Bill Torpey wrote:
> > > Hi Luca:
> > > 
> > > Just a gentle reminder to add an issue so this can be tracked (or let
> > > me know if you’d prefer that I do that).
> > > 
> > > Thanks!
> > > 
> > > Bill
> > 
> > Thinking about this a bit more, I think it's expected behaviour after
> > all. From the doc:
> > 
> > "The 'ZMQ_RECONNECT_IVL' option shall set the initial reconnection
> > interval for the specified 'socket'.  The reconnection interval is the
> > period 0MQ shall wait between attempts to reconnect disconnected peers
> > when using connection-oriented transports. The value -1 means no
> > reconnection."
> > 
> > So it is working as intended - if a peer goes away, it will never be
> > reconnected if that option is set.
> > 
> > And it makes sense - in the context of a TCP connection, a dead peer is
> > a dead peer. If for an application a dead peer might be resurrected
> > after X amount of time, there's no way to know that. It needs to be
> > handled by the application.
> > 
> > There are various tools you can use:
> > 
> > 1) ZMTP heartbeats - see ZMQ_HEARTBEAT* socket options
> > 2) socket monitoring events (including connects and disconnects) - see
> > zmq_socket_monitor documentation
> > 3) Enhance your protocol - call zmq_disconnect(endpoint) on your
> > sockets when a particular message is received, or heartbeats are
> > missed, or a disconnect event happens. This way when you later call
> > zmq_connect(endpoint) and it happens to match a previous, dead peer, it
> > will work as expected
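
Option 3 above could be sketched roughly as follows. This is only an
illustration under stated assumptions: the PeerTable class, its method
names and the 3-second timeout are all hypothetical, the "heartbeat" is
any application-level message that names the peer's endpoint, and the
socket is assumed to be a pyzmq socket (only connect() and disconnect()
are used):

```python
import time


class PeerTable:
    """Track peers by endpoint and disconnect the ones that go quiet."""

    def __init__(self, sock, timeout_s=3.0, clock=time.monotonic):
        self._sock = sock
        self._timeout = timeout_s
        self._clock = clock
        self._last_seen = {}  # endpoint -> time of last sign of life

    def on_alive(self, endpoint):
        """Call for every application-level message announcing `endpoint`."""
        if endpoint not in self._last_seen:
            self._sock.connect(endpoint)
        self._last_seen[endpoint] = self._clock()

    def on_goodbye(self, endpoint):
        """Call when a peer announces shutdown or is declared dead."""
        if self._last_seen.pop(endpoint, None) is not None:
            # Explicit disconnect, so a later connect() to the same
            # endpoint is treated as a new connection.
            self._sock.disconnect(endpoint)

    def expire(self):
        """Disconnect peers not heard from within the timeout."""
        now = self._clock()
        dead = [ep for ep, seen in self._last_seen.items()
                if now - seen > self._timeout]
        for ep in dead:
            self.on_goodbye(ep)
        return dead
```

The application would call expire() periodically from its own loop. The
trade-off is the one discussed in this thread: the table is a second view
of connection state that the application has to keep in sync itself.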
> > 
> > > > On Sep 2, 2017, at 1:21 PM, Luca Boccassi <luca.boccassi at gmail.com> wrote:
> > > > 
> > > > On Sat, 2017-09-02 at 12:02 -0400, Bill Torpey wrote:
> > > > > Thanks again, Luca!
> > > > > 
> > > > > For now, I’m going to go with disabling reconnect on the “data”
> > > > > sockets — that seems to be the best solution for my use case
> > > > > (connecting to endpoints that were returned by the peer binding to an
> > > > > unspecified (“wildcard”) port — e.g., "tcp://<interface>:*" in ZMQ).
> > > > > 
> > > > > This assumes that ZMQ will completely forget about the endpoint
> > > > > if/when it is disconnected, if it is set not to reconnect.  Otherwise
> > > > > I might run afoul of ZMQ’s silently ignoring connections to endpoints
> > > > > that it already knows about:
> > > > > https://github.com/zeromq/libzmq/issues/788 (e.g., in the case
> > > > > where another process later happens to be assigned the same ephemeral
> > > > > port).
> > > > > 
> > > > > I’ve done a quick scan of the libzmq code (v4.2.2) and it doesn’t
> > > > > appear that the endpoint is removed in the case of a (terminal)
> > > > > disconnect.  If you can confirm/deny this behavior, that would be
> > > > > helpful.  Failing that, I guess I’ll need to test this in the
> > > > > debugger — any hints on how best to do this would also be much
> > > > > appreciated.
> > > > > 
> > > > > Regards,
> > > > > 
> > > > > Bill
> > > > 
> > > > Yes it doesn't look like it removes the endpoint - I guess it's a
> > > > corner case that's missed. I'll open an issue.
> > > > 
> > > > BTW all these things are very quick and easy to try with Python on
> > > > Linux. Just install pyzmq, open a python3 terminal and:
> > > > 
> > > > import zmq
> > > > ctx = zmq.Context.instance()
> > > > # default options: req reconnects automatically
> > > > rep = ctx.socket(zmq.REP)
> > > > rep.bind("tcp://127.0.0.1:12345")
> > > > req = ctx.socket(zmq.REQ)
> > > > req.connect("tcp://127.0.0.1:12345")
> > > > req.send_string("hello")
> > > > rep.recv()
> > > > rep.send_string("hallo")
> > > > req.recv()
> > > > rep.unbind("tcp://127.0.0.1:12345")
> > > > rep.close()
> > > > # restart rep on the same endpoint: the exchange still works
> > > > rep = ctx.socket(zmq.REP)
> > > > rep.bind("tcp://127.0.0.1:12345")
> > > > req.send_string("hello")
> > > > rep.recv()
> > > > rep.send_string("hallo")
> > > > req.recv()
> > > > rep.unbind("tcp://127.0.0.1:12345")
> > > > rep.close()
> > > > req.close()
> > > > # same again, but with reconnection disabled on req
> > > > rep = ctx.socket(zmq.REP)
> > > > rep.bind("tcp://127.0.0.1:12345")
> > > > req = ctx.socket(zmq.REQ)
> > > > req.setsockopt(zmq.RECONNECT_IVL, -1)
> > > > req.connect("tcp://127.0.0.1:12345")
> > > > req.send_string("hello")
> > > > rep.recv()
> > > > rep.send_string("hallo")
> > > > req.recv()
> > > > rep.unbind("tcp://127.0.0.1:12345")
> > > > rep.close()
> > > > # restart rep once more
> > > > rep = ctx.socket(zmq.REP)
> > > > rep.bind("tcp://127.0.0.1:12345")
> > > > req.send_string("hello")
> > > > rep.recv()
> > > > 
> > > > This last one won't receive the message
> > > > 
> > > > > > On Sep 1, 2017, at 6:19 PM, Luca Boccassi <luca.boccassi at gmail.com> wrote:
> > > > > > 
> > > > > > On Fri, 2017-09-01 at 18:03 -0400, Bill Torpey wrote:
> > > > > > > Thanks Luca!  That was very helpful.
> > > > > > > 
> > > > > > > Although it leads to a couple of other questions:
> > > > > > > 
> > > > > > > - Can I assume that a ZMQ disconnect of a tcp endpoint would only
> > > > > > > occur if the underlying TCP socket is closed by the OS? Or are there
> > > > > > > conditions in which ZMQ will proactively disconnect the TCP socket
> > > > > > > and try to reconnect?
> > > > > > 
> > > > > > Normally that's the case - you can set up heartbeating with the
> > > > > > appropriate options and that will kill a connection if there's no
> > > > > > answer
> > > > > > 
> > > > > > > - I see that there is a sockopt (ZMQ_RECONNECT_IVL) that can be set
> > > > > > > to -1 to disable reconnection entirely.  In my case, the “data”
> > > > > > > socket pair will *always* connect to an ephemeral port, so I *never*
> > > > > > > want to reconnect.  Would this be a reasonable option in my case, do
> > > > > > > you think?
> > > > > > 
> > > > > > If that makes sense for your application, go for it - in these cases
> > > > > > the only way to be sure is to test it and see how it works
> > > > > > 
> > > > > > > - Would there be any interest in a patch that would disable
> > > > > > > reconnects (controlled by sockopt) for ephemeral ports only?  I’m
> > > > > > > guessing that reconnecting mostly makes sense with well-known ports,
> > > > > > > so something like this may be of general interest?
> > > > > > 
> > > > > > If by ephemeral port you mean anything over 1024, then actually in most
> > > > > > applications I've seen it's always useful to reconnect, and the
> > > > > > existing option should be enough for those cases where it's not desired
> > > > > > - we don't want to duplicate functionality
> > > > > > 
> > > > > > > Thanks again!
> > > > > > > 
> > > > > > > Bill 
> > > > > > > 
> > > > > > > > On Sep 1, 2017, at 5:30 PM, Luca Boccassi <luca.boccassi at gmail.com> wrote:
> > > > > > > > 
> > > > > > > > On Fri, 2017-09-01 at 16:59 -0400, Bill Torpey wrote:
> > > > > > > > > > I'm curious about how ZMQ handles re-connection.  I understand that
> > > > > > > > > > re-connection is supposed to happen "automagically" under the covers,
> > > > > > > > > > but that poses an interesting question.
> > > > > > > > > > 
> > > > > > > > > > To make a long story short, the application I'm working on uses
> > > > > > > > > > pub/sub sockets over TCP, and works as follows:
> > > > > > > > > > 
> > > > > > > > > > At startup:
> > > > > > > > > > 1.  connects to a proxy/broker at a well-known address, using a
> > > > > > > > > > pub/sub socket pair ("discovery");
> > > > > > > > > > 2.  subscribes to a well-known topic using the "discovery" sub
> > > > > > > > > > socket;
> > > > > > > > > > 3.  binds a different pub/sub socket pair ("data") and retrieves the
> > > > > > > > > > actual endpoints assigned;
> > > > > > > > > > 4.  publishes the "data" endpoints from step 3 on the "discovery" pub
> > > > > > > > > > socket.
> > > > > > > > > > 
> > > > > > > > > > When the application receives a message on the "discovery" sub
> > > > > > > > > > socket, it connects the "data" socket pair to the endpoints specified
> > > > > > > > > > in the "discovery" message.
> > > > > > > > > 
> > > > > > > > > > So far, this seems to be working relatively well, and allows the
> > > > > > > > > > high-volume, low-latency "data" messages to be sent/received directly
> > > > > > > > > > between peers, avoiding the extra hop caused by a proxy/broker
> > > > > > > > > > connection.  The discovery messages use the proxy/broker, but since
> > > > > > > > > > these are (very) low-volume the extra hop doesn't matter.  The use of
> > > > > > > > > > the proxy also eliminates the "slow joiner" problem that can happen
> > > > > > > > > > with other configurations.
> > > > > > > > > > 
> > > > > > > > > > My question is what happens when one of the "data" peer sockets
> > > > > > > > > > disconnects.  Since ZMQ (apparently) keeps trying to reconnect, what
> > > > > > > > > > would prevent another process from binding to the same ephemeral
> > > > > > > > > > port?
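
The port-reuse concern can be reproduced with plain TCP sockets, without
ZMQ. A small sketch using only the Python standard library; the function
name demo_port_reuse is hypothetical, and both binds happen in one
process here, whereas in the scenario above the second bind would come
from an unrelated process:

```python
import socket


def demo_port_reuse(host="127.0.0.1"):
    """Bind to an OS-assigned port, release it, then bind to it again."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind((host, 0))  # port 0 asks the OS to pick a free port
    first.listen(1)
    port = first.getsockname()[1]
    first.close()  # the original "data" peer goes away

    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    second.bind((host, port))  # a later listener claims the same port
    second.listen(1)
    reused_port = second.getsockname()[1]
    second.close()
    return port, reused_port


if __name__ == "__main__":
    print(demo_port_reuse())
```

Nothing at the TCP level ties a port number to the process that used it
before, so a peer that keeps retrying the old endpoint may end up talking
to whatever is listening there now.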
> > > > > > > > > 
> > > > > > > > > > - Can I assume that if the new application at that port is not a ZMQ
> > > > > > > > > > application, that the reconnect will (silently) fail, and continue to
> > > > > > > > > > be retried?
> > > > > > > > 
> > > > > > > > The ZMTP handshake would fail, so yes.
> > > > > > > > 
> > > > > > > > > > - What if the new application at that port *IS* a ZMQ
> > > > > > > > > > application?  Would the reconnect succeed?  And if so, what would
> > > > > > > > > > happen if it's a *DIFFERENT* ZMQ application, and the messages that
> > > > > > > > > > it's sending/receiving don't match what the original application
> > > > > > > > > > expects?
> > > > > > > > > 
> > > > > > > > > Depends on how you handle it in your application. If you have security
> > > > > > > > > concerns, then use CURVE with authentication so that only authorised
> > > > > > > > > peers can connect.
> > > > > > > > > 
> > > > > > > > > > It's reasonable for the application to publish a disconnect message
> > > > > > > > > > when it terminates normally, and the connected peers can disconnect
> > > > > > > > > > that endpoint.  But, applications don't always terminate normally
> > > > > > > > > > ;-)
> > > > > > > > > 
> > > > > > > > > That's a common pattern. But the application needs to handle unexpected
> > > > > > > > > data somewhat gracefully. What that means is entirely up to the
> > > > > > > > > application - as far as the library is concerned, if the handshake
> > > > > > > > > succeeds then it's all good (hence the use case for CURVE).
> > > > > > > > > 
> > > > > > > > > > Any guidance, hints or tips would be much appreciated -- thanks in
> > > > > > > > > > advance!
> > > > > > > > 
> > > > > > > > -- 
> > > > > > > > Kind regards,
> > > > > > > > Luca
> > > > > > > > Boccassi_______________________________________________
> > > > > > > > zeromq-dev mailing list
> > > > > > > > zeromq-dev at lists.zeromq.org <mailto:zeromq-dev at lists.ze
> > > > > > > > romq
> > > > > > > > .org
> > > > > > > > > <mailto:zeromq-dev at lists.zeromq.org <mailto:zeromq-de
> > > > > > > > > v at li
> > > > > > > > > sts.
> > > > > > > > 
> > > > > > > > zeromq.org>>
> > > > > > > > https://lists.zeromq.org/mailman/listinfo/zeromq-dev
> > > > > > > > <https://lists.zeromq.org/mailman/listinfo/zeromq-dev>
> > > > > > > > <https://lists.zeromq.org/mailman/listinfo/zeromq-dev
> > > > > > > > <https://lists.zeromq.org/mailman/listinfo/zeromq-dev>>
> > > > > > > 
> > > > > > 
> > > > > > -- 
> > > > > > Kind regards,
> > > > > > > Luca Boccassi
> > > > > 
> > > > 
> > > > -- 
> > > > Kind regards,
> > > > > Luca Boccassi
> > > 
> 

-- 
Kind regards,
Luca Boccassi