[zeromq-dev] ZMQ reconnect/ephemeral ports

Luca Boccassi luca.boccassi at gmail.com
Sun Sep 17 12:39:15 CEST 2017


On Sat, 2017-09-16 at 14:34 -0400, Bill Torpey wrote:
> Hi Luca:
> 
> Just a gentle reminder to add an issue so this can be tracked (or let
> me know if you’d prefer that I do that).
> 
> Thanks!
> 
> Bill

Thinking about this a bit more, I think it's expected behaviour after
all. From the doc:

"The 'ZMQ_RECONNECT_IVL' option shall set the initial reconnection
interval for the specified 'socket'.  The reconnection interval is the
period 0MQ shall wait between attempts to reconnect disconnected peers
when using connection-oriented transports. The value -1 means no
reconnection."

So it is working as intended - if a peer goes away, it will never be
reconnected if that option is set to -1.
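
As a quick sanity check of the option itself, something like the
following in a python3 terminal shows the default interval and the -1
setting. It is only a sketch: the DEALER socket type is an arbitrary
choice, and the 100 ms default is what the zmq_setsockopt documentation
lists for ZMQ_RECONNECT_IVL. The option is set before any connect() so
that it applies to the connections made afterwards.

```python
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.DEALER)  # arbitrary socket type, for illustration only

# Documented default: retry every 100 ms after a disconnect.
default_ivl = sock.getsockopt(zmq.RECONNECT_IVL)

# -1 disables reconnection; set it before calling connect().
sock.setsockopt(zmq.RECONNECT_IVL, -1)
current_ivl = sock.getsockopt(zmq.RECONNECT_IVL)

print("default:", default_ivl, "now:", current_ivl)

sock.close(linger=0)
ctx.term()
```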

And it makes sense - in the context of a TCP connection, a dead peer is
a dead peer. If for an application a dead peer might be resurrected
after X amount of time, there's no way to know that. It needs to be
handled by the application.

There are various tools you can use:

1) ZMTP heartbeats - see ZMQ_HEARTBEAT* socket options
2) socket monitoring events (including connects and disconnects) - see
zmq_socket_monitor documentation
3) Enhance your protocol - call zmq_disconnect(endpoint) on your
sockets when a particular message is received, or heartbeats are
missed, or a disconnect event happens. This way when you later call
zmq_connect(endpoint) and it happens to match a previous, dead peer, it
will work as expected
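
To make the three points above a bit more concrete, here is a rough
pyzmq sketch, not a definitive recipe. It turns on ZMTP heartbeats,
watches the connecting socket's monitor for a disconnect event, and then
calls disconnect() on the endpoint so that a later connect() to the same
address starts from scratch. The helper names (wait_for_event,
drop_dead_peer, demo), the PUSH/PULL socket types, the timeouts and the
heartbeat values are all made up for illustration. The ENOENT handling
is an assumption on my side: depending on the libzmq version, the
endpoint may already be gone by the time disconnect() is called.

```python
import errno
import time

import zmq
from zmq.utils.monitor import recv_monitor_message


def wait_for_event(monitor, wanted, timeout_ms=5000):
    """Return the first monitor event equal to `wanted`, or None on timeout."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        remaining = int((deadline - time.monotonic()) * 1000)
        if remaining <= 0 or not monitor.poll(remaining):
            return None
        evt = recv_monitor_message(monitor)
        if evt["event"] == wanted:
            return evt


def drop_dead_peer(sock, monitor, endpoint):
    """Wait for a disconnect event, then make `sock` forget `endpoint`."""
    if wait_for_event(monitor, zmq.EVENT_DISCONNECTED) is None:
        return False
    try:
        sock.disconnect(endpoint)
    except zmq.ZMQError as exc:
        # Assumption: the library may already have removed the endpoint.
        if exc.errno != errno.ENOENT:
            raise
    return True


def demo():
    ctx = zmq.Context()
    pull = ctx.socket(zmq.PULL)
    port = pull.bind_to_random_port("tcp://127.0.0.1")
    endpoint = f"tcp://127.0.0.1:{port}"

    push = ctx.socket(zmq.PUSH)
    push.setsockopt(zmq.LINGER, 0)
    push.setsockopt(zmq.RECONNECT_IVL, -1)        # never reconnect by itself
    push.setsockopt(zmq.HEARTBEAT_IVL, 1000)      # hypothetical values, in ms
    push.setsockopt(zmq.HEARTBEAT_TIMEOUT, 3000)
    monitor = push.get_monitor_socket(
        zmq.EVENT_CONNECTED | zmq.EVENT_DISCONNECTED
    )
    push.connect(endpoint)
    connected = wait_for_event(monitor, zmq.EVENT_CONNECTED) is not None

    pull.close(linger=0)                          # the peer goes away
    dropped = drop_dead_peer(push, monitor, endpoint)
    # A later push.connect(endpoint) would now be treated as a new connection.

    push.disable_monitor()
    monitor.close(linger=0)
    push.close()
    ctx.term()
    return connected, dropped
```

Calling demo() should return a pair of booleans: whether the connect
event was seen, and whether the dead endpoint was dropped after the
disconnect event. In a real application the disconnect would more likely
be driven from the same poll loop that handles your data sockets.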

> > On Sep 2, 2017, at 1:21 PM, Luca Boccassi <luca.boccassi at gmail.com>
> > wrote:
> > 
> > On Sat, 2017-09-02 at 12:02 -0400, Bill Torpey wrote:
> > > Thanks again, Luca!
> > > 
> > > For now, I’m going to go with disabling reconnect on the “data”
> > > sockets — that seems to be the best solution for my use case
> > > (connecting to endpoints that were returned by the peer binding
> > > to an
> > > unspecified (“wildcard”) port — e.g., "tcp://<interface>:*" in
> > > ZMQ).
> > > 
> > > This assumes that ZMQ will completely forget about the endpoint
> > > if/when it is disconnected, if it is set not to
> > > reconnect.  Otherwise
> > > I might run afoul of ZMQ’s silently ignoring connections to
> > > endpoints that it already knows about:
> > > https://github.com/zeromq/libzmq/issues/788 (e.g., in the case
> > > where another process later happens to be assigned the same
> > > ephemeral port).
> > > 
> > > I’ve done a quick scan of the libzmq code (v4.2.2) and it doesn’t
> > > appear that the endpoint is removed in the case of a (terminal)
> > > disconnect.  If you can confirm/deny this behavior, that would be
> > > helpful.  Failing that, I guess I’ll need to test this in the
> > > debugger — any hints on how best to do this would also be much
> > > appreciated.
> > > 
> > > Regards,
> > > 
> > > Bill
> > 
> > Yes it doesn't look like it removes the endpoint - I guess it's a
> > corner case that's missed. I'll open an issue.
> > 
> > BTW all these things are very quick and easy to try with Python on
> > Linux. Just install pyzmq, open a python3 terminal and:
> > 
> > import zmq
> > ctx = zmq.Context.instance()
> > rep = ctx.socket(zmq.REP)
> > rep.bind("tcp://127.0.0.1:12345")
> > req = ctx.socket(zmq.REQ)
> > req.connect("tcp://127.0.0.1:12345")
> > req.send_string("hello")
> > rep.recv()
> > rep.send_string("hallo")
> > req.recv()
> > rep.unbind("tcp://127.0.0.1:12345")
> > rep.close()
> > rep = ctx.socket(zmq.REP)
> > rep.bind("tcp://127.0.0.1:12345")
> > req.send_string("hello")
> > rep.recv()
> > rep.send_string("hallo")
> > req.recv()
> > rep.unbind("tcp://127.0.0.1:12345")
> > rep.close()
> > req.close()
> > rep = ctx.socket(zmq.REP)
> > rep.bind("tcp://127.0.0.1:12345")
> > req = ctx.socket(zmq.REQ)
> > req.setsockopt(zmq.RECONNECT_IVL, -1)
> > req.connect("tcp://127.0.0.1:12345")
> > req.send_string("hello")
> > rep.recv()
> > rep.send_string("hallo")
> > req.recv()
> > rep.unbind("tcp://127.0.0.1:12345")
> > rep.close()
> > rep = ctx.socket(zmq.REP)
> > rep.bind("tcp://127.0.0.1:12345")
> > req.send_string("hello")
> > rep.recv()
> > 
> > This last one won't receive the message
> > 
> > > > On Sep 1, 2017, at 6:19 PM, Luca Boccassi <luca.boccassi at gmail.com>
> > > > wrote:
> > > > 
> > > > On Fri, 2017-09-01 at 18:03 -0400, Bill Torpey wrote:
> > > > > Thanks Luca!  That was very helpful.
> > > > > 
> > > > > Although it leads to a couple of other questions:
> > > > > 
> > > > > - Can I assume that a ZMQ disconnect of a tcp endpoint would
> > > > > only
> > > > > occur if the underlying TCP socket is closed by the OS? Or
> > > > > are
> > > > > there
> > > > > conditions in which ZMQ will proactively disconnect the TCP
> > > > > socket
> > > > > and try to reconnect?
> > > > 
> > > > Normally that's the case - you can set up heartbeating with the
> > > > appropriate options and that will kill a connection if there's
> > > > no
> > > > answer
> > > > 
> > > > > - I see that there is a sockopt (ZMQ_RECONNECT_IVL) that can
> > > > > be
> > > > > set
> > > > > to -1 to disable reconnection entirely.  In my case, the
> > > > > “data”
> > > > > socket pair will *always* connect to an ephemeral port, so I
> > > > > *never*
> > > > > want to reconnect.  Would this be a reasonable option in my
> > > > > case,
> > > > > do
> > > > > you think?
> > > > 
> > > > If that makes sense for your application, go for it - in these
> > > > cases
> > > > the only way to be sure is to test it and see how it works
> > > > 
> > > > > - Would there be any interest in a patch that would disable
> > > > > reconnects (controlled by sockopt) for ephemeral ports
> > > > > only?  I’m
> > > > > guessing that reconnecting mostly makes sense with well-known
> > > > > ports,
> > > > > so something like this may be of general interest?
> > > > 
> > > > If by ephemeral port you mean anything over 1024, then actually
> > > > in
> > > > most
> > > > applications I've seen it's always useful to reconnect, and the
> > > > existing option should be enough for those cases where it's not
> > > > desired
> > > > - we don't want to duplicate functionality
> > > > 
> > > > > Thanks again!
> > > > > 
> > > > > Bill 
> > > > > 
> > > > > > > On Sep 1, 2017, at 5:30 PM, Luca Boccassi <luca.boccassi at gmail.com>
> > > > > > > wrote:
> > > > > > 
> > > > > > On Fri, 2017-09-01 at 16:59 -0400, Bill Torpey wrote:
> > > > > > > I'm curious about how ZMQ handles re-connection.  I
> > > > > > > understand
> > > > > > > that
> > > > > > > re-connection is supposed to happen "automagically" under
> > > > > > > the
> > > > > > > covers,
> > > > > > > but that poses an interesting question.
> > > > > > > 
> > > > > > > To make a long story short, the application I'm working
> > > > > > > on
> > > > > > > uses
> > > > > > > > pub/sub sockets over TCP, and works as follows:
> > > > > > > 
> > > > > > > At startup:
> > > > > > > 1.  connects to a proxy/broker at a well-known address,
> > > > > > > using
> > > > > > > a
> > > > > > > pub/sub socket pair ("discovery");
> > > > > > > 2.  subscribes to a well-known topic using the
> > > > > > > "discovery"
> > > > > > > sub
> > > > > > > socket;
> > > > > > > 3.  binds a different pub/sub socket pair ("data") and
> > > > > > > retrieves
> > > > > > > the
> > > > > > > actual endpoints assigned;
> > > > > > > 4.  publishes the "data" endpoints from step 3 on the
> > > > > > > "discovery"
> > > > > > > pub
> > > > > > > socket; 
> > > > > > > 
> > > > > > > When the application receives a message on the
> > > > > > > "discovery"
> > > > > > > sub
> > > > > > > socket, it connects the "data" socket pair to the
> > > > > > > endpoints
> > > > > > > specified
> > > > > > > in the "discovery" message.
> > > > > > > 
> > > > > > > So far, this seems to be working relatively well, and
> > > > > > > allows
> > > > > > > the
> > > > > > > high-volume, low-latency "data" messages to be
> > > > > > > sent/received
> > > > > > > directly
> > > > > > > between peers, avoiding the extra hop caused by a
> > > > > > > proxy/broker
> > > > > > > connection.  The discovery messages use the proxy/broker,
> > > > > > > but
> > > > > > > since
> > > > > > > these are (very) low-volume the extra hop doesn't
> > > > > > > matter.  The
> > > > > > > use of
> > > > > > > the proxy also eliminates the "slow joiner" problem that
> > > > > > > can
> > > > > > > happen
> > > > > > > with other configurations.
> > > > > > > 
> > > > > > > My question is what happens when one of the "data" peer
> > > > > > > sockets
> > > > > > > disconnects.  Since ZMQ (apparently) keeps trying to
> > > > > > > reconnect,
> > > > > > > what
> > > > > > > would prevent another process from binding to the same
> > > > > > > ephemeral
> > > > > > > port?  
> > > > > > > 
> > > > > > > - Can I assume that if the new application at that port
> > > > > > > is
> > > > > > > not a
> > > > > > > ZMQ
> > > > > > > application, that the reconnect will (silently) fail, and
> > > > > > > continue to
> > > > > > > be retried?
> > > > > > 
> > > > > > The ZMTP handshake would fail, so yes.
> > > > > > 
> > > > > > > - What if the new application at that port *IS* a ZMQ
> > > > > > > application?  Would the reconnect succeed?  And if so,
> > > > > > > what
> > > > > > > would
> > > > > > > happen if it's a *DIFFERENT* ZMQ application, and the
> > > > > > > messages
> > > > > > > that
> > > > > > > it's sending/receiving don't match what the original
> > > > > > > application
> > > > > > > expects?
> > > > > > 
> > > > > > Depends on how you handle it in your application. If you
> > > > > > have
> > > > > > security
> > > > > > concerns, then use CURVE with authentication so that only
> > > > > > authorised
> > > > > > peers can connect.
> > > > > > 
> > > > > > > It's reasonable for the application to publish a
> > > > > > > disconnect
> > > > > > > message
> > > > > > > when it terminates normally, and the connected peers can
> > > > > > > disconnect
> > > > > > > that endpoint.  But, applications don't always terminate
> > > > > > > normally
> > > > > > > ;-)
> > > > > > 
> > > > > > That's a common pattern. But the application needs to
> > > > > > handle
> > > > > > unexpected
> > > > > > data somewhat gracefully. What that means is entirely up to
> > > > > > the
> > > > > > application - as far as the library is concerned, if the
> > > > > > handshake
> > > > > > succeeds then it's all good (hence the use case for CURVE).
> > > > > > 
> > > > > > > Any guidance, hints or tips would be much appreciated --
> > > > > > > thanks
> > > > > > > in
> > > > > > > advance!
> > > > > > 
> > > > > > -- 
> > > > > > Kind regards,
> > > > > > > Luca Boccassi
> > > > > > > _______________________________________________
> > > > > > > zeromq-dev mailing list
> > > > > > > zeromq-dev at lists.zeromq.org
> > > > > > > https://lists.zeromq.org/mailman/listinfo/zeromq-dev
> > > > > 
> > > > 
> > > > -- 
> > > > Kind regards,
> > > > Luca Boccassi
> > > 
> > 
> > -- 
> > Kind regards,
> > Luca Boccassi
> 