[zeromq-dev] ZMQ reconnect/ephemeral ports

Bill Torpey wallstprog at gmail.com
Tue Sep 26 23:23:34 CEST 2017


Hi Luca:

Sorry for not getting back sooner, but thanks again for listening, and the PR looks good to me!

Best Regards,

Bill Torpey

> On Sep 19, 2017, at 9:13 AM, Luca Boccassi <luca.boccassi at gmail.com> wrote:
> 
> On Sun, 2017-09-17 at 12:29 -0400, Bill Torpey wrote:
>> Luca:
>> 
>> I hear what you’re saying but … I think I’m talking about a different
>> situation.
>> 
>> If I understand your explanation correctly, you’re saying that
>> setting ZMQ_RECONNECT_IVL to -1 should prevent a disconnected
>> endpoint from *ever* reconnecting, under any set of circumstances.  
>> 
>> I would read the doc (4.2.2) more like the following (with addition
>> in *bold*):
>> 
>>> The ZMQ_RECONNECT_IVL option shall set the initial reconnection
>>> interval for the specified socket. The reconnection interval is the
>>> period ØMQ shall wait between attempts to *automatically* reconnect
>>> disconnected peers when using connection-oriented transports. The
>>> value -1 means no reconnection.
>> 
>> 
>> What I’m questioning is the interaction between ZMQ_RECONNECT_IVL == -1
>> and the behavior enforced by
>> https://github.com/zeromq/libzmq/issues/788. (Also see here:
>> https://www.mail-archive.com/zeromq-dev at lists.zeromq.org/msg21484.html.)
>> The fix for that issue is intended to prevent *duplicate* connections
>> from the same endpoint, for certain socket types (e.g., pub/sub), where
>> multiple connections (and their associated duplicate messages) don’t
>> make sense.
>> 
>> One scenario I’m concerned about is the one where:
>> 
>> 1.	Endpoint connects to us
>> 2.	Endpoint is disconnected for some reason
>> 3.	Setting ZMQ_RECONNECT_IVL=-1 disables *automatic*
>> reconnect, so as far as we’re concerned the endpoint is dead
>> 4.	Subsequently the endpoint connects to us again (e.g.,
>> following a restart)
>> 5.	Because we still have a record of the endpoint, we will
>> refuse the connection — even though the endpoint is dead from our
>> point of view.  In this scenario that endpoint can NEVER reconnect.
>> 
>> So I get that setting ZMQ_RECONNECT_IVL to -1 should prevent us from
>> reconnecting (automatically) to the disconnected endpoint, but I
>> don’t see the benefit of preventing that endpoint from actively
>> reconnecting at a later time.  In this case, we’ve essentially
>> blacklisted that endpoint (forever), and I’m having trouble coming up
>> with a scenario where that would be intended behavior.
>> 
>> Does this make sense?  Am I missing something here?
>> 
>> Also, to your point about adding a protocol layer on top of 0MQ: I
>> would MUCH prefer to let 0MQ handle as much of the underlying
>> connect/disconnect logic as possible.  I’m concerned about the
>> potential for the protocol’s view of the connection state getting out
>> of sync with 0MQ’s view.  That would mean a bunch of additional work in
>> the protocol layer, but synchronization is the bigger worry.
>> 
>> Thanks for listening ...
>> 
>> Bill
> 
> I see. I guess there's a terminology confusion issue here - when I
> wrote about connections and disconnections, I meant the automated ones
> that happen in the background in the I/O thread. But I guess it makes
> sense that a manual call to zmq_connect should work as expected.
> 
> A workaround for this behaviour would be for the application to
> manually call zmq_disconnect before doing a connect to the same
> endpoint.
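
A minimal pyzmq sketch of that workaround. The helper name `connect_fresh` is hypothetical; it relies on `Socket.disconnect()` raising `zmq.ZMQError` with errno ENOENT when the socket has no record of the endpoint, which is how zmq_disconnect reports an unknown endpoint.

```python
import errno

import zmq


def connect_fresh(sock: zmq.Socket, endpoint: str) -> None:
    """Connect to `endpoint`, first dropping any stale record of it.

    Hypothetical helper: zmq_disconnect fails with ENOENT when the socket
    never connected to `endpoint` (or already forgot it); that case is fine.
    """
    try:
        sock.disconnect(endpoint)
    except zmq.ZMQError as exc:
        if exc.errno != errno.ENOENT:
            raise
    sock.connect(endpoint)


ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://127.0.0.1:*")                        # OS-assigned port
endpoint = pub.getsockopt_string(zmq.LAST_ENDPOINT)

sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.RECONNECT_IVL, -1)
sub.setsockopt(zmq.SUBSCRIBE, b"")
connect_fresh(sub, endpoint)   # nothing to forget yet: a plain connect
connect_fresh(sub, endpoint)   # forgets the old record, then connects again
```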
> 
> But it turns out fixing it to automatically do it is not too hard
> (unless I've made some silly mistake):
> 
> https://github.com/zeromq/libzmq/pull/2756
> 
>>> On Sep 17, 2017, at 6:39 AM, Luca Boccassi <luca.boccassi at gmail.com> wrote:
>>> 
>>> On Sat, 2017-09-16 at 14:34 -0400, Bill Torpey wrote:
>>>> Hi Luca:
>>>> 
>>>> Just a gentle reminder to add an issue so this can be tracked (or let
>>>> me know if you’d prefer that I do that).
>>>> 
>>>> Thanks!
>>>> 
>>>> Bill
>>> 
>>> Thinking about this a bit more, I think it's expected behaviour after
>>> all. From the doc:
>>>
>>> "The 'ZMQ_RECONNECT_IVL' option shall set the initial reconnection
>>> interval for the specified 'socket'.  The reconnection interval is the
>>> period 0MQ shall wait between attempts to reconnect disconnected peers
>>> when using connection-oriented transports. The value -1 means no
>>> reconnection."
>>>
>>> So it is working as intended - if a peer goes away, it will never be
>>> reconnected if that option is set.
>>>
>>> And it makes sense - in the context of a TCP connection, a dead peer is
>>> a dead peer. If for an application a dead peer might be resurrected
>>> after X amount of time, there's no way to know that. It needs to be
>>> handled by the application.
>>>
>>> There are various tools you can use:
>>>
>>> 1) ZMTP heartbeats - see ZMQ_HEARTBEAT* socket options
>>> 2) socket monitoring events (including connects and disconnects) - see
>>>    zmq_socket_monitor documentation
>>> 3) Enhance your protocol - call zmq_disconnect(endpoint) on your
>>>    sockets when a particular message is received, or heartbeats are
>>>    missed, or a disconnect event happens. This way when you later call
>>>    zmq_connect(endpoint) and it happens to match a previous, dead peer,
>>>    it will work as expected
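
A minimal pyzmq sketch of tools 2) and 3) together: a monitor socket watches a "data" SUB socket, and when the peer disappears the application calls disconnect() itself, so that a later connect() to the same endpoint is not ignored. It assumes pyzmq's `Socket.get_monitor_socket()` and `zmq.utils.monitor.recv_monitor_message()`; the helpers `wait_for` and `forget` are hypothetical. `forget` tolerates ENOENT because a libzmq that already drops the endpoint after a terminal disconnect would report it as unknown.

```python
import errno

import zmq
from zmq.utils.monitor import recv_monitor_message

ctx = zmq.Context()

pub = ctx.socket(zmq.PUB)                 # stands in for the remote peer
pub.bind("tcp://127.0.0.1:*")
endpoint = pub.getsockopt_string(zmq.LAST_ENDPOINT)

sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.RECONNECT_IVL, -1)     # no automatic reconnection
monitor = sub.get_monitor_socket(zmq.EVENT_CONNECTED | zmq.EVENT_DISCONNECTED)
sub.connect(endpoint)


def wait_for(event: int, timeout_ms: int = 5000) -> dict:
    """Read monitor events until `event` shows up; return that event."""
    while monitor.poll(timeout_ms):
        msg = recv_monitor_message(monitor)
        if msg["event"] == event:
            return msg
    raise TimeoutError(f"monitor event {event} not seen")


def forget(sock: zmq.Socket, ep: str) -> None:
    """Hypothetical helper: zmq_disconnect, ignoring 'endpoint unknown'."""
    try:
        sock.disconnect(ep)
    except zmq.ZMQError as exc:
        if exc.errno != errno.ENOENT:
            raise


wait_for(zmq.EVENT_CONNECTED)
pub.close(linger=0)                       # the peer goes away
lost = wait_for(zmq.EVENT_DISCONNECTED)
forget(sub, endpoint)                     # a later connect(endpoint) now works
```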
>>> 
>>>>> On Sep 2, 2017, at 1:21 PM, Luca Boccassi <luca.boccassi at gmail.com> wrote:
>>>>> 
>>>>> On Sat, 2017-09-02 at 12:02 -0400, Bill Torpey wrote:
>>>>>> Thanks again, Luca!
>>>>>> 
>>>>>> For now, I’m going to go with disabling reconnect on the “data”
>>>>>> sockets, which seems to be the best solution for my use case
>>>>>> (connecting to endpoints that were returned by the peer binding to an
>>>>>> unspecified (“wildcard”) port, e.g., "tcp://<interface>:*" in ZMQ).
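
A minimal pyzmq sketch of that setup: the peer binds its "data" PUB socket to a wildcard port and reads the concrete endpoint back with `LAST_ENDPOINT`, and the receiving side connects a SUB socket with `RECONNECT_IVL` set to -1. The helper `open_data_sub` is hypothetical, and the loopback address stands in for a real interface. The option is set before `connect()` because socket options are taken into account when the connection is created.

```python
import zmq


def open_data_sub(ctx: zmq.Context, endpoint: str) -> zmq.Socket:
    """Hypothetical helper: a "data" SUB socket that never auto-reconnects."""
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.RECONNECT_IVL, -1)   # -1 means no reconnection
    sub.setsockopt(zmq.SUBSCRIBE, b"")      # receive every message
    sub.connect(endpoint)                   # options above apply to this connect
    return sub


ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://127.0.0.1:*")                          # wildcard port
endpoint = pub.getsockopt_string(zmq.LAST_ENDPOINT)    # tcp://127.0.0.1:<port>
sub = open_data_sub(ctx, endpoint)
```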
>>>>>> 
>>>>>> This assumes that ZMQ will completely forget about the endpoint
>>>>>> if/when it is disconnected, if it is set not to reconnect.  Otherwise
>>>>>> I might run afoul of ZMQ’s silently ignoring connections to endpoints
>>>>>> that it already knows about:
>>>>>> https://github.com/zeromq/libzmq/issues/788 (e.g., in the case where
>>>>>> another process later happens to be assigned the same ephemeral
>>>>>> port).
>>>>>>
>>>>>> I’ve done a quick scan of the libzmq code (v4.2.2) and it doesn’t
>>>>>> appear that the endpoint is removed in the case of a (terminal)
>>>>>> disconnect.  If you can confirm/deny this behavior, that would be
>>>>>> helpful.  Failing that, I guess I’ll need to test this in the
>>>>>> debugger; any hints on how best to do this would also be much
>>>>>> appreciated.
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>> Bill
>>>>> 
>>>>> Yes, it doesn't look like it removes the endpoint - I guess it's a
>>>>> corner case that was missed. I'll open an issue.
>>>>>
>>>>> BTW all these things are very quick and easy to try with Python on
>>>>> Linux. Just install pyzmq, open a python3 terminal and:
>>>>> 
>>>>> import zmq
>>>>> ctx = zmq.Context.instance()
>>>>>
>>>>> # 1) Default reconnect: the REQ socket reconnects to a REP socket that
>>>>> #    is re-bound on the same port, so the second round-trip works.
>>>>> rep = ctx.socket(zmq.REP)
>>>>> rep.bind("tcp://127.0.0.1:12345")
>>>>> req = ctx.socket(zmq.REQ)
>>>>> req.connect("tcp://127.0.0.1:12345")
>>>>> req.send_string("hello")
>>>>> rep.recv()
>>>>> rep.send_string("hallo")
>>>>> req.recv()
>>>>> rep.unbind("tcp://127.0.0.1:12345")
>>>>> rep.close()
>>>>> rep = ctx.socket(zmq.REP)
>>>>> rep.bind("tcp://127.0.0.1:12345")
>>>>> req.send_string("hello")
>>>>> rep.recv()
>>>>> rep.send_string("hallo")
>>>>> req.recv()
>>>>> rep.unbind("tcp://127.0.0.1:12345")
>>>>> rep.close()
>>>>> req.close()
>>>>>
>>>>> # 2) Same sequence, but RECONNECT_IVL = -1 is set on the REQ socket
>>>>> #    before connect(), which disables automatic reconnection.
>>>>> rep = ctx.socket(zmq.REP)
>>>>> rep.bind("tcp://127.0.0.1:12345")
>>>>> req = ctx.socket(zmq.REQ)
>>>>> req.setsockopt(zmq.RECONNECT_IVL, -1)
>>>>> req.connect("tcp://127.0.0.1:12345")
>>>>> req.send_string("hello")
>>>>> rep.recv()
>>>>> rep.send_string("hallo")
>>>>> req.recv()
>>>>> rep.unbind("tcp://127.0.0.1:12345")
>>>>> rep.close()
>>>>> rep = ctx.socket(zmq.REP)
>>>>> rep.bind("tcp://127.0.0.1:12345")
>>>>> req.send_string("hello")
>>>>> rep.recv()  # never gets the message: the REQ socket does not reconnect
>>>>> 
>>>>> This last one won't receive the message
>>>>> 
>>>>>>> On Sep 1, 2017, at 6:19 PM, Luca Boccassi <luca.boccassi at gmail.com> wrote:
>>>>>>> 
>>>>>>> On Fri, 2017-09-01 at 18:03 -0400, Bill Torpey wrote:
>>>>>>>> Thanks Luca!  That was very helpful.
>>>>>>>> 
>>>>>>>> Although it leads to a couple of other questions:
>>>>>>>> 
>>>>>>>> - Can I assume that a ZMQ disconnect of a tcp endpoint would only
>>>>>>>> occur if the underlying TCP socket is closed by the OS?  Or are
>>>>>>>> there conditions in which ZMQ will proactively disconnect the TCP
>>>>>>>> socket and try to reconnect?
>>>>>>> 
>>>>>>> Normally that's the case - you can set up heartbeating with the
>>>>>>> appropriate options and that will kill a connection if there's no
>>>>>>> answer
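
A minimal pyzmq sketch of the heartbeat options (ZMTP 3.1, libzmq 4.2 or later). The numbers are placeholders, not recommendations: with them, a PING is sent every second, the connection is dropped if no traffic comes back within three seconds of a PING, and the remote side is asked to time the connection out after ten seconds of silence. Whether a heartbeat-triggered drop is followed by a reconnect still depends on `RECONNECT_IVL`. The endpoint is hypothetical.

```python
import zmq

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
# Send a ZMTP PING on the connection every 1000 ms ...
sub.setsockopt(zmq.HEARTBEAT_IVL, 1000)
# ... and close it if no traffic at all arrives within 3000 ms of a PING.
sub.setsockopt(zmq.HEARTBEAT_TIMEOUT, 3000)
# Ask the remote side to time us out after 10000 ms without traffic.
sub.setsockopt(zmq.HEARTBEAT_TTL, 10000)
# Like RECONNECT_IVL, these apply to connections created after they are set.
sub.connect("tcp://127.0.0.1:5556")  # hypothetical endpoint; connect is async
```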
>>>>>>> 
>>>>>>>> - I see that there is a sockopt (ZMQ_RECONNECT_IVL) that can be set
>>>>>>>> to -1 to disable reconnection entirely.  In my case, the “data”
>>>>>>>> socket pair will *always* connect to an ephemeral port, so I
>>>>>>>> *never* want to reconnect.  Would this be a reasonable option in my
>>>>>>>> case, do you think?
>>>>>>>
>>>>>>> If that makes sense for your application, go for it - in these cases
>>>>>>> the only way to be sure is to test it and see how it works
>>>>>>> 
>>>>>>>> - Would there be any interest in a patch that would disable
>>>>>>>> reconnects (controlled by sockopt) for ephemeral ports only?  I’m
>>>>>>>> guessing that reconnecting mostly makes sense with well-known
>>>>>>>> ports, so something like this may be of general interest?
>>>>>>>
>>>>>>> If by ephemeral port you mean anything over 1024, then actually in
>>>>>>> most applications I've seen it's always useful to reconnect, and the
>>>>>>> existing option should be enough for those cases where it's not
>>>>>>> desired - we don't want to duplicate functionality
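
For reference, an ephemeral port in the sense used in this thread is one the operating system picks when a socket binds to port 0, which, as far as I know, is what a ZMQ wildcard bind such as "tcp://<interface>:*" relies on. It comes from an OS-configured range rather than simply anything above 1024 (on Linux the range is set in /proc/sys/net/ipv4/ip_local_port_range). A minimal standard-library sketch that asks the OS for one; once the socket is closed, the same number can later be handed to another process, which is exactly the situation the original question worries about.

```python
import socket


def os_assigned_port(host: str = "127.0.0.1") -> int:
    """Bind a TCP socket to port 0 and return the port the OS picked."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))          # port 0: let the OS choose
        return s.getsockname()[1]  # socket (and port) are released on exit


port = os_assigned_port()
print(f"OS picked port {port}")
```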
>>>>>>> 
>>>>>>>> Thanks again!
>>>>>>>> 
>>>>>>>> Bill 
>>>>>>>> 
>>>>>>>>> On Sep 1, 2017, at 5:30 PM, Luca Boccassi <luca.boccassi at gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> On Fri, 2017-09-01 at 16:59 -0400, Bill Torpey wrote:
>>>>>>>>>> I'm curious about how ZMQ handles re-connection.  I understand
>>>>>>>>>> that re-connection is supposed to happen "automagically" under
>>>>>>>>>> the covers, but that poses an interesting question.
>>>>>>>>>>
>>>>>>>>>> To make a long story short, the application I'm working on uses
>>>>>>>>>> pub/sub sockets over TCP, and works as follows:
>>>>>>>>>>
>>>>>>>>>> At startup:
>>>>>>>>>> 1.  connects to a proxy/broker at a well-known address, using a
>>>>>>>>>>     pub/sub socket pair ("discovery");
>>>>>>>>>> 2.  subscribes to a well-known topic using the "discovery" sub
>>>>>>>>>>     socket;
>>>>>>>>>> 3.  binds a different pub/sub socket pair ("data") and retrieves
>>>>>>>>>>     the actual endpoints assigned;
>>>>>>>>>> 4.  publishes the "data" endpoints from step 3 on the "discovery"
>>>>>>>>>>     pub socket;
>>>>>>>>>>
>>>>>>>>>> When the application receives a message on the "discovery" sub
>>>>>>>>>> socket, it connects the "data" socket pair to the endpoints
>>>>>>>>>> specified in the "discovery" message.
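
Steps 3 and 4, plus the receiving side, can be sketched in pyzmq as follows. The `"DATA <endpoint>"` announcement format is a made-up example, and the discovery hop through the broker is left out (the string is simply passed along). One detail worth noting: binding to a concrete interface matters here, because binding to `tcp://*:*` makes `LAST_ENDPOINT` report the wildcard address (e.g. `0.0.0.0`), which a remote peer cannot connect to.

```python
import zmq

ctx = zmq.Context()

# Step 3: bind the "data" PUB socket to an OS-assigned port on a concrete
# interface (loopback here) and read back the endpoint actually assigned.
data_pub = ctx.socket(zmq.PUB)
data_pub.bind("tcp://127.0.0.1:*")
data_endpoint = data_pub.getsockopt_string(zmq.LAST_ENDPOINT)

# Step 4: the announcement that would be published on the "discovery"
# socket. The "DATA <endpoint>" format is hypothetical.
announcement = f"DATA {data_endpoint}"

# Receiving side: parse the announcement and connect the "data" SUB socket.
kind, _, endpoint = announcement.partition(" ")
data_sub = ctx.socket(zmq.SUB)
data_sub.setsockopt(zmq.SUBSCRIBE, b"")
if kind == "DATA":
    data_sub.connect(endpoint)
```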
>>>>>>>>>> 
>>>>>>>>>> So far, this seems to be working relatively well, and allows the
>>>>>>>>>> high-volume, low-latency "data" messages to be sent/received
>>>>>>>>>> directly between peers, avoiding the extra hop caused by a
>>>>>>>>>> proxy/broker connection.  The discovery messages use the
>>>>>>>>>> proxy/broker, but since these are (very) low-volume the extra hop
>>>>>>>>>> doesn't matter.  The use of the proxy also eliminates the "slow
>>>>>>>>>> joiner" problem that can happen with other configurations.
>>>>>>>>>>
>>>>>>>>>> My question is what happens when one of the "data" peer sockets
>>>>>>>>>> disconnects.  Since ZMQ (apparently) keeps trying to reconnect,
>>>>>>>>>> what would prevent another process from binding to the same
>>>>>>>>>> ephemeral port?
>>>>>>>>>>
>>>>>>>>>> - Can I assume that if the new application at that port is not a
>>>>>>>>>> ZMQ application, that the reconnect will (silently) fail, and
>>>>>>>>>> continue to be retried?
>>>>>>>>> 
>>>>>>>>> The ZMTP handshake would fail, so yes.
>>>>>>>>> 
>>>>>>>>>> - What if the new application at that port *IS* a ZMQ
>>>>>>>>>> application?  Would the reconnect succeed?  And if so, what would
>>>>>>>>>> happen if it's a *DIFFERENT* ZMQ application, and the messages
>>>>>>>>>> that it's sending/receiving don't match what the original
>>>>>>>>>> application expects?
>>>>>>>>> 
>>>>>>>>> Depends on how you handle it in your application. If you have
>>>>>>>>> security concerns, then use CURVE with authentication so that only
>>>>>>>>> authorised peers can connect.
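
A minimal pyzmq sketch of CURVE with a pinned server key. It covers the "different ZMQ application on the same port" case from the connecting side: an impostor bound to that port does not hold the server's secret key, so the handshake fails and no messages flow. The other direction (only letting authorised clients connect) additionally needs a ZAP authentication handler, such as the one in pyzmq's `zmq.auth` module, which is not shown here. It assumes libzmq was built with CURVE support, and the client is assumed to obtain the server's public key out of band.

```python
import zmq

if not zmq.has("curve"):
    raise RuntimeError("this libzmq was built without CURVE support")

ctx = zmq.Context()
server_public, server_secret = zmq.curve_keypair()
client_public, client_secret = zmq.curve_keypair()

# Server ("data" publisher): only its own secret key is required.
server = ctx.socket(zmq.PUB)
server.setsockopt(zmq.CURVE_SERVER, 1)
server.setsockopt(zmq.CURVE_SECRETKEY, server_secret)
server.bind("tcp://127.0.0.1:*")
endpoint = server.getsockopt_string(zmq.LAST_ENDPOINT)

# Client: pins the expected server public key plus its own key pair.
client = ctx.socket(zmq.SUB)
client.setsockopt(zmq.CURVE_SERVERKEY, server_public)
client.setsockopt(zmq.CURVE_PUBLICKEY, client_public)
client.setsockopt(zmq.CURVE_SECRETKEY, client_secret)
client.setsockopt(zmq.SUBSCRIBE, b"")
client.connect(endpoint)
```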
>>>>>>>>> 
>>>>>>>>>> It's reasonable for the application to publish a disconnect
>>>>>>>>>> message when it terminates normally, and the connected peers can
>>>>>>>>>> disconnect that endpoint.  But, applications don't always
>>>>>>>>>> terminate normally ;-)
>>>>>>>>> 
>>>>>>>>> That's a common pattern. But the application needs to handle
>>>>>>>>> unexpected data somewhat gracefully. What that means is entirely up
>>>>>>>>> to the application - as far as the library is concerned, if the
>>>>>>>>> handshake succeeds then it's all good (hence the use case for
>>>>>>>>> CURVE).
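
A minimal sketch of that pattern on the receiving side, in plain Python. The `HELLO`/`BYE` wire format and the `handle_discovery` name are hypothetical; in a real application the two callbacks would call `connect()` and `disconnect()` on the "data" SUB socket. Anything that does not parse is dropped instead of raising, which is one reading of handling unexpected data gracefully. A peer that crashes never sends `BYE`, so this complements heartbeats and socket monitoring rather than replacing them.

```python
from typing import Callable


def handle_discovery(msg: bytes,
                     on_hello: Callable[[str], None],
                     on_bye: Callable[[str], None]) -> bool:
    """Dispatch one discovery message; return False if it was ignored.

    Hypothetical format: b"HELLO <endpoint>" when a peer starts and
    b"BYE <endpoint>" when it shuts down cleanly.
    """
    try:
        text = msg.decode("utf-8")
    except UnicodeDecodeError:
        return False                       # not text: ignore, don't crash
    verb, _, endpoint = text.partition(" ")
    if not endpoint.startswith("tcp://"):
        return False                       # missing or unexpected endpoint
    if verb == "HELLO":
        on_hello(endpoint)
    elif verb == "BYE":
        on_bye(endpoint)
    else:
        return False                       # unknown verb
    return True


connected = set()
handle_discovery(b"HELLO tcp://10.0.0.5:40001", connected.add, connected.discard)
handle_discovery(b"\xff\xfe garbage", connected.add, connected.discard)
handle_discovery(b"BYE tcp://10.0.0.5:40001", connected.add, connected.discard)
```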
>>>>>>>>> 
>>>>>>>>>> Any guidance, hints or tips would be much appreciated -- thanks in
>>>>>>>>>> advance!
>>>>>>>>> 
>>>>>>>>> -- 
>>>>>>>>> Kind regards,
>>>>>>>>> Luca Boccassi
>>>>>>>>> _______________________________________________
>>>>>>>>> zeromq-dev mailing list
>>>>>>>>> zeromq-dev at lists.zeromq.org
>>>>>>>>> https://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Kind regards,
>>>>>>> Luca Boccassi
>>>>>> 
>>>>> 
>>>>> -- 
>>>>> Kind regards,
>>>>> Luca Boccassi
>>>> 
>> 
> 
> -- 
> Kind regards,
> Luca Boccassi

