[zeromq-dev] ZMQ reconnect/ephemeral ports

Bill Torpey wallstprog at gmail.com
Sun Sep 17 18:29:09 CEST 2017


Luca:

I hear what you’re saying but … I think I’m talking about a different situation.

If I understand your explanation correctly, you’re saying that setting ZMQ_RECONNECT_IVL to -1 should prevent a disconnected endpoint from *ever* reconnecting, under any set of circumstances.  

I would read the doc (4.2.2) more like the following (with addition in *bold*):

> The ZMQ_RECONNECT_IVL option shall set the initial reconnection interval for the specified socket. The reconnection interval is the period ØMQ shall wait between attempts to *automatically* reconnect disconnected peers when using connection-oriented transports. The value -1 means no reconnection.


What I’m questioning is the interaction between ZMQ_RECONNECT_IVL == -1 and the behavior enforced by https://github.com/zeromq/libzmq/issues/788 (also see https://www.mail-archive.com/zeromq-dev@lists.zeromq.org/msg21484.html).  That change is intended to prevent *duplicate* connections from the same endpoint for certain socket types (e.g., pub/sub), where multiple connections (and their associated duplicate messages) don’t make sense.

One scenario I’m concerned about is the one where:

1.	Endpoint connects to us
2.	Endpoint is disconnected for some reason
3.	Setting ZMQ_RECONNECT_IVL=-1 disables *automatic* reconnect, so as far as we’re concerned the endpoint is dead
4.	Subsequently the endpoint connects to us again (e.g., following a restart)
5.	Because we still have a record of the endpoint, we will refuse the connection — even though the endpoint is dead from our point of view.  In this scenario that endpoint can NEVER reconnect.

So I get that setting ZMQ_RECONNECT_IVL to -1 should prevent us from reconnecting (automatically) to the disconnected endpoint, but I don’t see the benefit of preventing that endpoint from actively reconnecting at a later time.  In this case, we’ve essentially blacklisted that endpoint (forever), and I’m having trouble coming up with a scenario where that would be intended behavior.
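
To make the scenario concrete, here is a toy Python model of the bookkeeping as I understand it. It is only a sketch of my reading of the behavior, not libzmq's actual code: the ToySocket class, its endpoints dictionary and the forget_on_terminal_disconnect flag are all made up for illustration.

```python
class ToySocket:
    """Toy model of per-socket endpoint bookkeeping (hypothetical, not libzmq code)."""

    def __init__(self, reconnect_ivl=100, forget_on_terminal_disconnect=False):
        self.reconnect_ivl = reconnect_ivl
        self.forget_on_terminal_disconnect = forget_on_terminal_disconnect
        self.endpoints = {}  # endpoint -> "connected" | "reconnecting" | "dead"

    def connect(self, endpoint):
        # Duplicate suppression in the spirit of issue #788:
        # an endpoint that is already on record is silently ignored.
        if endpoint in self.endpoints:
            return False
        self.endpoints[endpoint] = "connected"
        return True

    def peer_lost(self, endpoint):
        if self.reconnect_ivl != -1:
            # Automatic reconnect is enabled, so the record stays live.
            self.endpoints[endpoint] = "reconnecting"
        elif self.forget_on_terminal_disconnect:
            # What I would expect: a terminal disconnect drops the record.
            del self.endpoints[endpoint]
        else:
            # What I think happens today: dead, but still on record.
            self.endpoints[endpoint] = "dead"


ep = "tcp://10.0.0.5:40123"  # made-up endpoint on an ephemeral port

sock = ToySocket(reconnect_ivl=-1)
first = sock.connect(ep)    # steps 1-2: connection is established
sock.peer_lost(ep)          # step 3: no automatic reconnect
second = sock.connect(ep)   # steps 4-5: ignored as a "duplicate"

fixed = ToySocket(reconnect_ivl=-1, forget_on_terminal_disconnect=True)
fixed.connect(ep)
fixed.peer_lost(ep)
retry_after_forget = fixed.connect(ep)

print(first, second, retry_after_forget)  # True False True
```

The second ToySocket shows the behavior I would have expected: once the record is dropped on a terminal disconnect, a later connection for the same endpoint goes through.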

Does this make sense?  Am I missing something here?

Also, to your point about adding a protocol layer on top of 0MQ: I would MUCH prefer to let 0MQ handle as much of the underlying connect/disconnect logic as possible.  I’m concerned about the protocol’s view of the connection state getting out of sync with 0MQ’s view.  It would also mean a good deal of additional work in the protocol layer, but synchronization is the bigger worry.
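
For reference, the zmq_disconnect workaround from your list (option 3 in the quoted message below) would look roughly like this with pyzmq, as far as I can tell. This is a sketch under assumptions: fresh_connect is a name I made up, the endpoint is the one from your earlier Python session, and the only library behavior it relies on is zmq_disconnect reporting ENOENT when the socket has no record of the endpoint. I have not checked it against every socket type.

```python
import errno

import zmq


def fresh_connect(sock, endpoint):
    """Drop any existing record of `endpoint`, then connect to it again.

    Hypothetical helper, not part of pyzmq.
    """
    try:
        sock.disconnect(endpoint)
    except zmq.ZMQError as exc:
        # ENOENT: the socket has no record of this endpoint, nothing to drop.
        if exc.errno != errno.ENOENT:
            raise
    sock.connect(endpoint)


ctx = zmq.Context.instance()
sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.RECONNECT_IVL, -1)  # no automatic reconnect
sub.setsockopt(zmq.LINGER, 0)          # do not block on close

endpoint = "tcp://127.0.0.1:12345"
fresh_connect(sub, endpoint)  # first time the endpoint is announced
fresh_connect(sub, endpoint)  # same endpoint announced again after a restart
sub.close()
```

The catch is that the application has to decide when the second call is appropriate, which is exactly the connection state I would rather not track myself.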

Thanks for listening ...

Bill

> On Sep 17, 2017, at 6:39 AM, Luca Boccassi <luca.boccassi at gmail.com> wrote:
> 
> On Sat, 2017-09-16 at 14:34 -0400, Bill Torpey wrote:
>> Hi Luca:
>> 
>> Just a gentle reminder to add an issue so this can be tracked (or let
>> me know if you’d prefer that I do that).
>> 
>> Thanks!
>> 
>> Bill
> 
> Thinking about this a bit more, I think it's expected behaviour after
> all. From the doc:
> 
> "The 'ZMQ_RECONNECT_IVL' option shall set the initial reconnection
> interval for the specified 'socket'.  The reconnection interval is the
> period 0MQ shall wait between attempts to reconnect disconnected peers
> when using connection-oriented transports. The value -1 means no
> reconnection."
> 
> So it is working as intended - if a peer goes away, it will never be
> reconnected if that option is set.
> 
> And it makes sense - in the context of a TCP connection, a dead peer is
> a dead peer. If for an application a dead peer might be resurrected
> after X amount of time, there's no way to know that. It needs to be
> handled by the application.
> 
> There are various tools you can use:
> 
> 1) ZMTP heartbeats - see ZMQ_HEARTBEAT* socket options
> 2) socket monitoring events (including connects and disconnects) - see
> zmq_socket_monitor documentation
> 3) Enhance your protocol - call zmq_disconnect(endpoint) on your
> sockets when a particular message is received, or heartbeats are
> missed, or a disconnect event happens. This way when you later call
> zmq_connect(endpoint) and it happens to match a previous, dead peer, it
> will work as expected
> 
>>> On Sep 2, 2017, at 1:21 PM, Luca Boccassi <luca.boccassi at gmail.com>
>>> wrote:
>>> 
>>> On Sat, 2017-09-02 at 12:02 -0400, Bill Torpey wrote:
>>>> Thanks again, Luca!
>>>> 
>>>> For now, I’m going to go with disabling reconnect on the “data”
>>>> sockets; that seems to be the best solution for my use case
>>>> (connecting to endpoints that were returned by the peer binding to
>>>> an unspecified (“wildcard”) port, e.g., "tcp://<interface>:*" in
>>>> ZMQ).
>>>> 
>>>> This assumes that ZMQ will completely forget about the endpoint
>>>> if/when it is disconnected, if it is set not to reconnect.
>>>> Otherwise I might run afoul of ZMQ’s silently ignoring connections
>>>> to endpoints that it already knows about:
>>>> https://github.com/zeromq/libzmq/issues/788 (e.g., in the case
>>>> where another process later happens to be assigned the same
>>>> ephemeral port).
>>>> 
>>>> I’ve done a quick scan of the libzmq code (v4.2.2) and it doesn’t
>>>> appear that the endpoint is removed in the case of a (terminal)
>>>> disconnect.  If you can confirm/deny this behavior, that would be
>>>> helpful.  Failing that, I guess I’ll need to test this in the
>>>> debugger — any hints on how best to do this would also be much
>>>> appreciated.
>>>> 
>>>> Regards,
>>>> 
>>>> Bill
>>> 
>>> Yes it doesn't look like it removes the endpoint - I guess it's a
>>> corner case that's missed. I'll open an issue.
>>> 
>>> BTW all these things are very quick and easy to try with Python on
>>> Linux. Just install pyzmq, open a python3 terminal and:
>>> 
>>> import zmq
>>> ctx = zmq.Context.instance()
>>> rep = ctx.socket(zmq.REP)
>>> rep.bind("tcp://127.0.0.1:12345")
>>> req = ctx.socket(zmq.REQ)
>>> req.connect("tcp://127.0.0.1:12345")
>>> req.send_string("hello")
>>> rep.recv()
>>> rep.send_string("hallo")
>>> req.recv()
>>> rep.unbind("tcp://127.0.0.1:12345")
>>> rep.close()
>>> rep = ctx.socket(zmq.REP)
>>> rep.bind("tcp://127.0.0.1:12345")
>>> req.send_string("hello")
>>> rep.recv()
>>> rep.send_string("hallo")
>>> req.recv()
>>> rep.unbind("tcp://127.0.0.1:12345")
>>> rep.close()
>>> req.close()
>>> rep = ctx.socket(zmq.REP)
>>> rep.bind("tcp://127.0.0.1:12345")
>>> req = ctx.socket(zmq.REQ)
>>> # disable automatic reconnect before connecting
>>> req.setsockopt(zmq.RECONNECT_IVL, -1)
>>> req.connect("tcp://127.0.0.1:12345")
>>> req.send_string("hello")
>>> rep.recv()
>>> rep.send_string("hallo")
>>> req.recv()
>>> rep.unbind("tcp://127.0.0.1:12345")
>>> rep.close()
>>> rep = ctx.socket(zmq.REP)
>>> rep.bind("tcp://127.0.0.1:12345")
>>> req.send_string("hello")
>>> rep.recv()
>>> 
>>> This last one won't receive the message
>>> 
>>>>> On Sep 1, 2017, at 6:19 PM, Luca Boccassi <luca.boccassi at gmail.com>
>>>>> wrote:
>>>>> 
>>>>> On Fri, 2017-09-01 at 18:03 -0400, Bill Torpey wrote:
>>>>>> Thanks Luca!  That was very helpful.
>>>>>> 
>>>>>> Although it leads to a couple of other questions:
>>>>>> 
>>>>>> - Can I assume that a ZMQ disconnect of a tcp endpoint would only
>>>>>> occur if the underlying TCP socket is closed by the OS? Or are
>>>>>> there conditions in which ZMQ will proactively disconnect the TCP
>>>>>> socket and try to reconnect?
>>>>> 
>>>>> Normally that's the case - you can set up heartbeating with the
>>>>> appropriate options and that will kill a connection if there's no
>>>>> answer
>>>>> 
>>>>>> - I see that there is a sockopt (ZMQ_RECONNECT_IVL) that can be set
>>>>>> to -1 to disable reconnection entirely.  In my case, the “data”
>>>>>> socket pair will *always* connect to an ephemeral port, so I
>>>>>> *never* want to reconnect.  Would this be a reasonable option in
>>>>>> my case, do you think?
>>>>> 
>>>>> If that makes sense for your application, go for it - in these
>>>>> cases the only way to be sure is to test it and see how it works
>>>>> 
>>>>>> - Would there be any interest in a patch that would disable
>>>>>> reconnects (controlled by sockopt) for ephemeral ports only?  I’m
>>>>>> guessing that reconnecting mostly makes sense with well-known
>>>>>> ports, so something like this may be of general interest?
>>>>> 
>>>>> If by ephemeral port you mean anything over 1024, then actually in
>>>>> most applications I've seen it's always useful to reconnect, and
>>>>> the existing option should be enough for those cases where it's
>>>>> not desired - we don't want to duplicate functionality
>>>>> 
>>>>>> Thanks again!
>>>>>> 
>>>>>> Bill 
>>>>>> 
>>>>>>> On Sep 1, 2017, at 5:30 PM, Luca Boccassi <luca.boccassi at gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> On Fri, 2017-09-01 at 16:59 -0400, Bill Torpey wrote:
>>>>>>>> I'm curious about how ZMQ handles re-connection.  I understand
>>>>>>>> that re-connection is supposed to happen "automagically" under
>>>>>>>> the covers, but that poses an interesting question.
>>>>>>>> 
>>>>>>>> To make a long story short, the application I'm working on uses
>>>>>>>> pub/sub sockets over TCP, and works as follows:
>>>>>>>> 
>>>>>>>> At startup:
>>>>>>>> 1.  connects to a proxy/broker at a well-known address, using a
>>>>>>>> pub/sub socket pair ("discovery");
>>>>>>>> 2.  subscribes to a well-known topic using the "discovery" sub
>>>>>>>> socket;
>>>>>>>> 3.  binds a different pub/sub socket pair ("data") and retrieves
>>>>>>>> the actual endpoints assigned;
>>>>>>>> 4.  publishes the "data" endpoints from step 3 on the
>>>>>>>> "discovery" pub socket;
>>>>>>>> 
>>>>>>>> When the application receives a message on the "discovery" sub
>>>>>>>> socket, it connects the "data" socket pair to the endpoints
>>>>>>>> specified in the "discovery" message.
>>>>>>>> 
>>>>>>>> So far, this seems to be working relatively well, and allows the
>>>>>>>> high-volume, low-latency "data" messages to be sent/received
>>>>>>>> directly between peers, avoiding the extra hop caused by a
>>>>>>>> proxy/broker connection.  The discovery messages use the
>>>>>>>> proxy/broker, but since these are (very) low-volume the extra
>>>>>>>> hop doesn't matter.  The use of the proxy also eliminates the
>>>>>>>> "slow joiner" problem that can happen with other configurations.
>>>>>>>> 
>>>>>>>> My question is what happens when one of the "data" peer sockets
>>>>>>>> disconnects.  Since ZMQ (apparently) keeps trying to reconnect,
>>>>>>>> what would prevent another process from binding to the same
>>>>>>>> ephemeral port?
>>>>>>>> 
>>>>>>>> - Can I assume that if the new application at that port is not a
>>>>>>>> ZMQ application, that the reconnect will (silently) fail, and
>>>>>>>> continue to be retried?
>>>>>>> 
>>>>>>> The ZMTP handshake would fail, so yes.
>>>>>>> 
>>>>>>>> - What if the new application at that port *IS* a ZMQ
>>>>>>>> application?  Would the reconnect succeed?  And if so, what
>>>>>>>> would happen if it's a *DIFFERENT* ZMQ application, and the
>>>>>>>> messages that it's sending/receiving don't match what the
>>>>>>>> original application expects?
>>>>>>> 
>>>>>>> Depends on how you handle it in your application. If you have
>>>>>>> security concerns, then use CURVE with authentication so that
>>>>>>> only authorised peers can connect.
>>>>>>> 
>>>>>>>> It's reasonable for the application to publish a disconnect
>>>>>>>> message when it terminates normally, and the connected peers can
>>>>>>>> disconnect that endpoint.  But, applications don't always
>>>>>>>> terminate normally ;-)
>>>>>>> 
>>>>>>> That's a common pattern. But the application needs to handle
>>>>>>> unexpected data somewhat gracefully. What that means is entirely
>>>>>>> up to the application - as far as the library is concerned, if
>>>>>>> the handshake succeeds then it's all good (hence the use case for
>>>>>>> CURVE).
>>>>>>> 
>>>>>>>> Any guidance, hints or tips would be much appreciated -- thanks
>>>>>>>> in advance!
>>>>>>> 
>>>>>>> -- 
>>>>>>> Kind regards,
>>>>>>> Luca Boccassi
>>>>>>> _______________________________________________
>>>>>>> zeromq-dev mailing list
>>>>>>> zeromq-dev at lists.zeromq.org
>>>>>>> https://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>>> 
>>>>> -- 
>>>>> Kind regards,
>>>>> Luca Boccassi
>>> 
>>> -- 
>>> Kind regards,
>>> Luca Boccassi