[zeromq-dev] ZMQ reconnect/ephemeral ports

Harald Achitz harald.achitz at gmail.com
Sun Oct 1 19:37:44 CEST 2017


Hi Luca

I have added some comments to the PR since I am unsure about the resource
management of the term_endpoint::endpoint heap-allocated string.

It would be nice if you could have a look at them, thanks!

Regards,
Harald

2017-09-26 23:23 GMT+02:00 Bill Torpey <wallstprog at gmail.com>:

> Hi Luca:
>
> Sorry for not getting back sooner, but thanks again for listening, and the
> PR looks good to me!
>
> Best Regards,
>
> Bill Torpey
>
> On Sep 19, 2017, at 9:13 AM, Luca Boccassi <luca.boccassi at gmail.com>
> wrote:
>
> On Sun, 2017-09-17 at 12:29 -0400, Bill Torpey wrote:
>
> Luca:
>
> I hear what you’re saying but … I think I’m talking about a different
> situation.
>
> If I understand your explanation correctly, you’re saying that
> setting ZMQ_RECONNECT_IVL to -1 should prevent a disconnected
> endpoint from *ever* reconnecting, under any set of circumstances.
>
> I would read the doc (4.2.2) more like the following (with the addition
> in *bold*):
>
> The ZMQ_RECONNECT_IVL option shall set the initial reconnection
> interval for the specified socket. The reconnection interval is the
> period ØMQ shall wait between attempts to *automatically* reconnect
> disconnected peers when using connection-oriented transports. The
> value -1 means no reconnection.
>
>
>
> What I’m questioning is the interaction between ZMQ_RECONNECT_IVL == -1
> and the behavior enforced by https://github.com/zeromq/libzmq/issues/788
> (also see https://www.mail-archive.com/zeromq-dev@lists.zeromq.org/msg21484.html).
> That commit is intended to prevent *duplicate* connections to the same
> endpoint, for certain socket types (e.g., pub/sub), where multiple
> connections (and their associated duplicate messages) don’t make sense.
>
> One scenario I’m concerned about is the one where:
>
> 1. Endpoint connects to us
> 2. Endpoint is disconnected for some reason
> 3. Setting ZMQ_RECONNECT_IVL=-1 disables *automatic*
> reconnect, so as far as we’re concerned the endpoint is dead
> 4. Subsequently the endpoint connects to us again (e.g.,
> following a restart)
> 5. Because we still have a record of the endpoint, we will
> refuse the connection — even though the endpoint is dead from our
> point of view.  In this scenario that endpoint can NEVER reconnect.
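To make the scenario concrete, here is a toy model in Python (the language of the pyzmq session later in this thread), seen from the side that calls zmq_connect. `EndpointRegistry` and `scenario` are hypothetical names: this is not libzmq's implementation, only a sketch of the duplicate-connect check from issue 788 and of what forgetting the endpoint on a terminal disconnect would change.

```python
class EndpointRegistry:
    """Toy model (hypothetical, not libzmq code) of the endpoints a socket
    has called connect() on, with the duplicate check from issue 788."""

    def __init__(self, forget_on_terminal_disconnect: bool) -> None:
        self.known = set()
        self.forget_on_terminal_disconnect = forget_on_terminal_disconnect

    def connect(self, endpoint: str) -> bool:
        """Return True if a connection is started, False if silently ignored."""
        if endpoint in self.known:
            return False  # duplicate endpoint: ignored, as for pub/sub
        self.known.add(endpoint)
        return True

    def peer_lost(self, endpoint: str, reconnect_ivl: int) -> None:
        """Model the TCP connection to `endpoint` dropping."""
        if reconnect_ivl == -1 and self.forget_on_terminal_disconnect:
            # No automatic reconnect will ever happen, so drop the record.
            self.known.discard(endpoint)
        # Otherwise the record is kept: with reconnect_ivl >= 0 the I/O thread
        # keeps retrying; with -1 and no fix, the record is simply stale.


def scenario(forget: bool) -> bool:
    """Steps 1-5 above; returns whether the final connect() is honoured."""
    reg = EndpointRegistry(forget_on_terminal_disconnect=forget)
    ep = "tcp://10.0.0.5:40001"  # hypothetical peer endpoint
    reg.connect(ep)              # step 1
    reg.peer_lost(ep, -1)        # steps 2-3: peer gone, no auto-reconnect
    return reg.connect(ep)       # steps 4-5: peer restarted, connect again


print(scenario(forget=False))  # False: the endpoint is effectively blacklisted
print(scenario(forget=True))   # True: connecting works again
```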
>
> So I get that setting ZMQ_RECONNECT_IVL to -1 should prevent us from
> reconnecting (automatically) to the disconnected endpoint, but I
> don’t see the benefit of preventing that endpoint from actively
> reconnecting at a later time. In this case, we’ve essentially
> blacklisted that endpoint (forever), and I’m having trouble coming up
> with a scenario where that would be intended behavior.
>
> Does this make sense?  Am I missing something here?
>
> Also, to your point about adding a protocol layer on top of 0MQ, I
> would MUCH prefer to let 0MQ handle as much of the underlying
> connect/disconnect logic as possible. I’m concerned about the
> potential for the protocol’s view of the connection state getting out
> of sync with 0MQ’s view (not to mention a bunch of additional work in
> the protocol layer, though synchronization is the bigger worry).
>
> Thanks for listening ...
>
> Bill
>
>
> I see. I guess there's a terminology confusion issue here - when I
> wrote about connections and disconnections, I meant the automated ones
> that happen in the background in the I/O thread. But I guess it makes
> sense that a manual call to zmq_connect should work as expected.
>
> A workaround for this behaviour would be for the application to
> manually call zmq_disconnect before doing a connect to the same
> endpoint.
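A minimal pyzmq sketch of that workaround. `connect_fresh` is a hypothetical helper name, and it assumes that zmq_disconnect reports an endpoint the socket does not know with ENOENT, which pyzmq raises as a `zmq.ZMQError`; in this helper that case just means there is nothing stale to drop.

```python
import errno

import zmq


def connect_fresh(sock: zmq.Socket, endpoint: str) -> None:
    """Connect to `endpoint`, first dropping any stale record of it.

    zmq_disconnect() fails with ENOENT if the endpoint is unknown to the
    socket; that is harmless here, so only other errors are re-raised.
    """
    try:
        sock.disconnect(endpoint)
    except zmq.ZMQError as exc:
        if exc.errno != errno.ENOENT:
            raise
    sock.connect(endpoint)


if __name__ == "__main__":
    ctx = zmq.Context.instance()
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.RECONNECT_IVL, -1)
    # Nothing has to listen on the (arbitrary) port for connect() to succeed.
    connect_fresh(sub, "tcp://127.0.0.1:5999")  # unknown endpoint: ENOENT ignored
    connect_fresh(sub, "tcp://127.0.0.1:5999")  # known endpoint: disconnect, then connect
    sub.close(linger=0)
```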
>
> But it turns out fixing it to automatically do it is not too hard
> (unless I've made some silly mistake):
>
> https://github.com/zeromq/libzmq/pull/2756
>
> On Sep 17, 2017, at 6:39 AM, Luca Boccassi <luca.boccassi at gmail.com>
> wrote:
>
>
> On Sat, 2017-09-16 at 14:34 -0400, Bill Torpey wrote:
>
> Hi Luca:
>
> Just a gentle reminder to add an issue so this can be tracked (or let
> me know if you’d prefer that I do that).
>
> Thanks!
>
> Bill
>
>
> Thinking about this a bit more, I think it's expected behaviour after
> all. From the doc:
>
> "The 'ZMQ_RECONNECT_IVL' option shall set the initial reconnection
> interval for the specified 'socket'. The reconnection interval is the
> period 0MQ shall wait between attempts to reconnect disconnected peers
> when using connection-oriented transports. The value -1 means no
> reconnection."
>
> So it is working as intended - if a peer goes away, it will never be
> reconnected if that option is set.
>
> And it makes sense - in the context of a TCP connection, a dead peer is
> a dead peer. If for an application a dead peer might be resurrected
> after X amount of time, there's no way to know that. It needs to be
> handled by the application.
>
> There are various tools you can use:
>
> 1) ZMTP heartbeats - see ZMQ_HEARTBEAT* socket options
> 2) socket monitoring events (including connects and disconnects) - see
> zmq_socket_monitor documentation
> 3) Enhance your protocol - call zmq_disconnect(endpoint) on your
> sockets when a particular message is received, or heartbeats are
> missed, or a disconnect event happens. This way when you later call
> zmq_connect(endpoint) and it happens to match a previous, dead peer, it
> will work as expected
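Option 3 can be sketched in plain Python. `PeerLiveness` is a hypothetical helper, not part of ZeroMQ, and it assumes an application-level protocol in which every peer includes its own data endpoint in each heartbeat or data message, since a SUB socket does not say which connection a message arrived on. Each endpoint reported as expired would then be passed to zmq_disconnect (`sock.disconnect(endpoint)` in pyzmq).

```python
import time
from typing import Callable, Dict, List


class PeerLiveness:
    """Application-level liveness tracking (hypothetical helper, not ZeroMQ API).

    Call seen() whenever a message or heartbeat names a peer's endpoint;
    expired() returns, and forgets, endpoints silent for longer than
    `timeout` seconds so the caller can zmq_disconnect them.
    """

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen: Dict[str, float] = {}

    def seen(self, endpoint: str) -> None:
        self.last_seen[endpoint] = self.clock()

    def expired(self) -> List[str]:
        now = self.clock()
        dead = [ep for ep, t in self.last_seen.items() if now - t > self.timeout]
        for ep in dead:
            del self.last_seen[ep]
        return dead
```

In a receive loop this might look like `for ep in liveness.expired(): data_sub.disconnect(ep)`, so that a later zmq_connect to the same endpoint is honoured.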
>
> On Sep 2, 2017, at 1:21 PM, Luca Boccassi <luca.boccassi at gmail.com>
> wrote:
>
> On Sat, 2017-09-02 at 12:02 -0400, Bill Torpey wrote:
>
> Thanks again, Luca!
>
> For now, I’m going to go with disabling reconnect on the “data” sockets;
> that seems to be the best solution for my use case (connecting to
> endpoints that were returned by the peer binding to an unspecified
> (“wildcard”) port, e.g., "tcp://<interface>:*" in ZMQ).
>
> This assumes that ZMQ will completely forget about the endpoint if/when
> it is disconnected, if it is set not to reconnect. Otherwise I might run
> afoul of ZMQ’s silently ignoring connections to endpoints that it
> already knows about: https://github.com/zeromq/libzmq/issues/788 (e.g.,
> in the case where another process later happens to be assigned the same
> ephemeral port).
>
> I’ve done a quick scan of the libzmq code (v4.2.2) and it doesn’t appear
> that the endpoint is removed in the case of a (terminal) disconnect. If
> you can confirm/deny this behavior, that would be helpful. Failing that,
> I guess I’ll need to test this in the debugger; any hints on how best to
> do this would also be much appreciated.
>
> Regards,
>
> Bill
>
>
> Yes, it doesn't look like it removes the endpoint - I guess it's a
> corner case that was missed. I'll open an issue.
>
> BTW all these things are very quick and easy to try with Python on
> Linux. Just install pyzmq, open a python3 terminal and:
>
> import zmq
> ctx = zmq.Context.instance()
> # 1) Default RECONNECT_IVL: REQ reconnects on its own once REP is rebound
> rep = ctx.socket(zmq.REP)
> rep.bind("tcp://127.0.0.1:12345")
> req = ctx.socket(zmq.REQ)
> req.connect("tcp://127.0.0.1:12345")
> req.send_string("hello")
> rep.recv()
> rep.send_string("hallo")
> req.recv()
> # Replace the REP socket with a fresh one on the same endpoint
> rep.unbind("tcp://127.0.0.1:12345")
> rep.close()
> rep = ctx.socket(zmq.REP)
> rep.bind("tcp://127.0.0.1:12345")
> req.send_string("hello")
> rep.recv()            # delivered once req has reconnected automatically
> rep.send_string("hallo")
> req.recv()
> rep.unbind("tcp://127.0.0.1:12345")
> rep.close()
> req.close()
> # 2) RECONNECT_IVL = -1: REQ never reconnects once the peer goes away
> rep = ctx.socket(zmq.REP)
> rep.bind("tcp://127.0.0.1:12345")
> req = ctx.socket(zmq.REQ)
> req.setsockopt(zmq.RECONNECT_IVL, -1)   # must be set before connect()
> req.connect("tcp://127.0.0.1:12345")
> req.send_string("hello")
> rep.recv()
> rep.send_string("hallo")
> req.recv()
> rep.unbind("tcp://127.0.0.1:12345")
> rep.close()
> rep = ctx.socket(zmq.REP)
> rep.bind("tcp://127.0.0.1:12345")
> req.send_string("hello")
> rep.recv()            # blocks: the message is never delivered
>
> This last one won't receive the message: the final rep.recv() blocks forever.
>
> On Sep 1, 2017, at 6:19 PM, Luca Boccassi <luca.boccassi at gmail.com>
> wrote:
>
> On Fri, 2017-09-01 at 18:03 -0400, Bill Torpey wrote:
>
> Thanks Luca!  That was very helpful.
>
> Although it leads to a couple of other questions:
>
> - Can I assume that a ZMQ disconnect of a tcp endpoint would only
> occur if the underlying TCP socket is closed by the OS? Or are there
> conditions in which ZMQ will proactively disconnect the TCP socket
> and try to reconnect?
>
>
> Normally that's the case - you can set up heartbeating with the
> appropriate options and that will kill a connection if there's no
> answer.
>
> - I see that there is a sockopt (ZMQ_RECONNECT_IVL) that can be set
> to -1 to disable reconnection entirely. In my case, the “data” socket
> pair will *always* connect to an ephemeral port, so I *never* want to
> reconnect. Would this be a reasonable option in my case, do you think?
>
>
> If that makes sense for your application, go for it - in these cases
> the only way to be sure is to test it and see how it works.
> - Would there be any interest in a patch that would disable reconnects
> (controlled by sockopt) for ephemeral ports only? I’m guessing that
> reconnecting mostly makes sense with well-known ports, so something
> like this may be of general interest?
>
>
> If by ephemeral port you mean anything over 1024, then actually in most
> applications I've seen it's always useful to reconnect, and the existing
> option should be enough for those cases where it's not desired - we
> don't want to duplicate functionality.
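Which ports count as "ephemeral" depends on the host. A purely heuristic sketch, assuming the usual ranges: the IANA dynamic/private range 49152-65535, or the default `net.ipv4.ip_local_port_range` of 32768-60999 on recent Linux kernels. The function names are hypothetical, and an application that knows which endpoints came from wildcard binds can decide more reliably than any port-number test.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Inclusive port ranges; which one applies depends on the host's OS settings.
IANA_DYNAMIC: Tuple[int, int] = (49152, 65535)
LINUX_DEFAULT: Tuple[int, int] = (32768, 60999)


def endpoint_port(endpoint: str) -> int:
    """Return the port of a ZeroMQ TCP endpoint such as 'tcp://10.0.0.5:40001'."""
    parts = urlsplit(endpoint)
    if parts.scheme != "tcp" or parts.port is None:
        raise ValueError(f"not a tcp endpoint with a numeric port: {endpoint!r}")
    return parts.port


def looks_ephemeral(endpoint: str,
                    port_range: Tuple[int, int] = LINUX_DEFAULT) -> bool:
    """Heuristic: True if the endpoint's port lies in `port_range`."""
    low, high = port_range
    return low <= endpoint_port(endpoint) <= high
```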
>
> Thanks again!
>
> Bill
>
> On Sep 1, 2017, at 5:30 PM, Luca Boccassi <luca.boccassi at gmail.com>
> wrote:
>
> On Fri, 2017-09-01 at 16:59 -0400, Bill Torpey wrote:
>
> I'm curious about how ZMQ handles re-connection. I understand that
> re-connection is supposed to happen "automagically" under the covers,
> but that poses an interesting question.
>
> To make a long story short, the application I'm working on uses
> pub/sub sockets over TCP, and works as follows:
>
> At startup:
> 1.  connects to a proxy/broker at a well-known address, using a
> pub/sub socket pair ("discovery");
> 2.  subscribes to a well-known topic using the "discovery" sub socket;
> 3.  binds a different pub/sub socket pair ("data") and retrieves the
> actual endpoints assigned;
> 4.  publishes the "data" endpoints from step 3 on the "discovery" pub
> socket;
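Steps 3 and 4 hinge on reading back the port the OS assigned. A pyzmq sketch, assuming the data socket binds to a specific interface address (the helper name `bind_data_pub` and the default 127.0.0.1 are illustrative): ZMQ_LAST_ENDPOINT returns the concrete endpoint, which is the string that would be published on the "discovery" socket.

```python
from typing import Tuple

import zmq


def bind_data_pub(ctx: zmq.Context,
                  interface: str = "127.0.0.1") -> Tuple[zmq.Socket, str]:
    """Bind a PUB socket to an OS-assigned (wildcard) port, as in step 3.

    Returns the socket and the concrete endpoint actually assigned.
    """
    pub = ctx.socket(zmq.PUB)
    pub.bind(f"tcp://{interface}:*")
    # e.g. "tcp://127.0.0.1:41234"; this is what step 4 would publish
    endpoint = pub.getsockopt_string(zmq.LAST_ENDPOINT)
    return pub, endpoint
```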
>
> When the application receives a message on the "discovery" sub
> socket, it connects the "data" socket pair to the endpoints specified
> in the "discovery" message.
>
> So far, this seems to be working relatively well, and allows the
> high-volume, low-latency "data" messages to be sent/received directly
> between peers, avoiding the extra hop caused by a proxy/broker
> connection. The discovery messages use the proxy/broker, but since
> these are (very) low-volume the extra hop doesn't matter. The use of
> the proxy also eliminates the "slow joiner" problem that can happen
> with other configurations.
>
> My question is what happens when one of the "data" peer sockets
> disconnects. Since ZMQ (apparently) keeps trying to reconnect, what
> would prevent another process from binding to the same ephemeral port?
>
> - Can I assume that if the new application at that port is not a ZMQ
> application, that the reconnect will (silently) fail, and continue to
> be retried?
>
>
> The ZMTP handshake would fail, so yes.
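The handshake fails because every ZMTP 3.x connection starts with a fixed 64-byte greeting (defined in the ZMTP 3.0 specification, 23/ZMTP), which a non-ZMQ service will not send. A simplified Python check of that layout, for illustration only: `parse_zmtp3_greeting` is a hypothetical name, and libzmq's real handshake also falls back to older ZMTP revisions and then runs the security mechanism.

```python
from typing import Optional, Tuple

GREETING_LEN = 64  # 10 signature + 2 version + 20 mechanism + 1 as-server + 31 filler


def parse_zmtp3_greeting(data: bytes) -> Optional[Tuple[int, int, str]]:
    """Return (major, minor, mechanism) for a ZMTP 3.x greeting, else None."""
    if len(data) < GREETING_LEN:
        return None
    if data[0] != 0xFF or data[9] != 0x7F:
        return None  # signature mismatch, e.g. an HTTP or SSH server answered
    major, minor = data[10], data[11]
    if major != 3:
        return None  # not ZMTP 3.x
    mechanism = data[12:32].rstrip(b"\x00").decode("ascii", errors="replace")
    return major, minor, mechanism


# A greeting as a NULL-mechanism peer would send it (padding bytes zero here)
greeting = (b"\xff" + bytes(8) + b"\x7f" + b"\x03\x00"
            + b"NULL".ljust(20, b"\x00") + b"\x00" + bytes(31))
ssh_banner = b"SSH-2.0-OpenSSH_7.4\r\n".ljust(GREETING_LEN, b"\x00")
print(parse_zmtp3_greeting(greeting))    # (3, 0, 'NULL')
print(parse_zmtp3_greeting(ssh_banner))  # None
```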
>
> - What if the new application at that port *IS* a ZMQ application?
> Would the reconnect succeed? And if so, what would happen if it's a
> *DIFFERENT* ZMQ application, and the messages that it's
> sending/receiving don't match what the original application expects?
>
>
> Depends on how you handle it in your application. If you have security
> concerns, then use CURVE with authentication so that only authorised
> peers can connect.
>
> It's reasonable for the application to publish a disconnect message
> when it terminates normally, and the connected peers can disconnect
> that endpoint. But, applications don't always terminate normally ;-)
>
>
> That's a common pattern. But the application needs to handle
> unexpected data somewhat gracefully. What that means is entirely up to
> the application - as far as the library is concerned, if the handshake
> succeeds then it's all good (hence the use case for CURVE).
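A sketch of that pattern under an assumed, hypothetical discovery format: one UTF-8 frame, "HELLO <endpoint>" when a peer starts and "BYE <endpoint>" when it exits cleanly. `DiscoveryHandler` is an illustrative name; it works with any object that has connect() and disconnect() methods, such as a pyzmq socket. Malformed messages are counted and dropped instead of being trusted, and a repeated HELLO disconnects first so that a peer restarted on the same endpoint is not silently ignored.

```python
from typing import Protocol, Set


class Connectable(Protocol):
    """Anything with pyzmq-style connect()/disconnect() methods."""

    def connect(self, endpoint: str) -> None: ...

    def disconnect(self, endpoint: str) -> None: ...


class DiscoveryHandler:
    """Applies hypothetical "HELLO <ep>" / "BYE <ep>" messages to a data socket."""

    def __init__(self, data_sock: Connectable) -> None:
        self.sock = data_sock
        self.connected: Set[str] = set()
        self.rejected = 0  # count of malformed or unexpected messages

    def handle(self, msg: bytes) -> None:
        try:
            verb, endpoint = msg.decode("utf-8").split()
        except ValueError:  # bad UTF-8 (UnicodeDecodeError) or wrong field count
            self.rejected += 1
            return
        if not endpoint.startswith("tcp://"):
            self.rejected += 1
        elif verb == "HELLO":
            if endpoint in self.connected:
                # Peer restarted on the same endpoint: drop the stale record first.
                self.sock.disconnect(endpoint)
            self.sock.connect(endpoint)
            self.connected.add(endpoint)
        elif verb == "BYE":
            if endpoint in self.connected:
                self.sock.disconnect(endpoint)
                self.connected.discard(endpoint)
        else:
            self.rejected += 1
```

With pyzmq, this would be `handler = DiscoveryHandler(data_sub)` and then `handler.handle(discovery_sub.recv())` in the receive loop, assuming each discovery message arrives as a single frame in exactly this format (any topic prefix removed first).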
>
> Any guidance, hints or tips would be much appreciated -- thanks in
> advance!
>
>
> --
> Kind regards,
> Luca Boccassi
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> https://lists.zeromq.org/mailman/listinfo/zeromq-dev