[zeromq-dev] ZMQ reconnect/ephemeral ports
Bill Torpey
wallstprog at gmail.com
Sun Sep 17 18:29:09 CEST 2017
Luca:
I hear what you’re saying but … I think I’m talking about a different situation.
If I understand your explanation correctly, you’re saying that setting ZMQ_RECONNECT_IVL to -1 should prevent a disconnected endpoint from *ever* reconnecting, under any set of circumstances.
I would read the doc (4.2.2) more like the following (with addition in *bold*):
> The ZMQ_RECONNECT_IVL option shall set the initial reconnection interval for the specified socket. The reconnection interval is the period ØMQ shall wait between attempts to *automatically* reconnect disconnected peers when using connection-oriented transports. The value -1 means no reconnection.
What I’m questioning is the interaction between ZMQ_RECONNECT_IVL == -1 and the behavior enforced by https://github.com/zeromq/libzmq/issues/788 (also see https://www.mail-archive.com/zeromq-dev@lists.zeromq.org/msg21484.html). The change made for that issue is intended to prevent *duplicate* connections from the same endpoint for certain socket types (e.g., pub/sub), where multiple connections (and their associated duplicate messages) don’t make sense.
One scenario I’m concerned about is the one where:
1. Endpoint connects to us
2. Endpoint is disconnected for some reason
3. Setting ZMQ_RECONNECT_IVL=-1 disables *automatic* reconnect, so as far as we’re concerned the endpoint is dead
4. Subsequently the endpoint connects to us again (e.g., following a restart)
5. Because we still have a record of the endpoint, we will refuse the connection — even though the endpoint is dead from our point of view. In this scenario that endpoint can NEVER reconnect.
So I get that setting ZMQ_RECONNECT_IVL to -1 should prevent us from (automatically) reconnecting to the disconnected endpoint, but I don’t see the benefit of preventing that endpoint from actively reconnecting at a later time. In this case we’ve essentially blacklisted that endpoint forever, and I’m having trouble coming up with a scenario where that would be the intended behavior.
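To make the scenario above concrete, here is a toy model in plain Python of how the two behaviors combine on the connecting socket, which is the side where Luca's REQ/REP session quoted below shows the problem. It is hypothetical: `ToyConnectingSocket` and its "up"/"down"/"retrying" states are invented for illustration and are not libzmq internals; the model only mirrors the behavior reported in this thread for libzmq 4.2.2 (a repeated connect() to a recorded endpoint is ignored, and with reconnect disabled a dead endpoint stays recorded until disconnect() is called). Later libzmq releases may behave differently.

```python
class ToyConnectingSocket:
    """Hypothetical, simplified model of a connecting socket's endpoint table.

    NOT libzmq's implementation; it only mirrors the observable behavior
    discussed in this thread (issue 788 plus ZMQ_RECONNECT_IVL = -1).
    """

    def __init__(self, reconnect_ivl=100):
        self.reconnect_ivl = reconnect_ivl
        self.endpoints = {}  # endpoint string -> "up", "down" or "retrying"

    def connect(self, endpoint):
        # Issue 788: a second connect() to an endpoint that is already
        # recorded is silently ignored for single-connect socket types.
        if endpoint in self.endpoints:
            return "ignored"
        self.endpoints[endpoint] = "up"
        return "connected"

    def peer_lost(self, endpoint):
        # With reconnect disabled the record stays, but nothing retries it.
        self.endpoints[endpoint] = "down" if self.reconnect_ivl == -1 else "retrying"

    def disconnect(self, endpoint):
        # An explicit disconnect() drops the record, so connect() works again.
        del self.endpoints[endpoint]


sock = ToyConnectingSocket(reconnect_ivl=-1)
ep = "tcp://127.0.0.1:54321"  # hypothetical ephemeral-port endpoint
print(sock.connect(ep))  # connected
sock.peer_lost(ep)       # peer dies; no automatic reconnect
print(sock.connect(ep))  # ignored: the stale record blocks the new connection
sock.disconnect(ep)
print(sock.connect(ep))  # connected
```

In this model the "blacklisting" Bill describes is simply the stale "down" record; only an explicit disconnect() clears it.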
Does this make sense? Am I missing something here?
Also, regarding your point about adding a protocol layer on top of 0MQ: I would MUCH prefer to let 0MQ handle as much of the underlying connect/disconnect logic as possible. I’m concerned about the protocol’s view of the connection state getting out of sync with 0MQ’s view (not to mention the additional work in the protocol layer, although synchronization is my main concern).
Thanks for listening ...
Bill
> On Sep 17, 2017, at 6:39 AM, Luca Boccassi <luca.boccassi at gmail.com> wrote:
>
> On Sat, 2017-09-16 at 14:34 -0400, Bill Torpey wrote:
>> Hi Luca:
>>
>> Just a gentle reminder to add an issue so this can be tracked (or let
>> me know if you’d prefer that I do that).
>>
>> Thanks!
>>
>> Bill
>
> Thinking about this a bit more, I think it's expected behaviour after
> all. From the doc:
>
> "The 'ZMQ_RECONNECT_IVL' option shall set the initial reconnection
> interval for the specified 'socket'. The reconnection interval is the
> period 0MQ shall wait between attempts to reconnect disconnected peers
> when using connection-oriented transports. The value -1 means no
> reconnection."
>
> So it is working as intended - if a peer goes away, it will never be
> reconnected if that option is set.
>
> And it makes sense - in the context of a TCP connection, a dead peer is
> a dead peer. If for an application a dead peer might be resurrected
> after X amount of time, there's no way to know that. It needs to be
> handled by the application.
>
> There are various tools you can use:
>
> 1) ZMTP heartbeats - see ZMQ_HEARTBEAT* socket options
> 2) socket monitoring events (including connects and disconnects) - see
> zmq_socket_monitor documentation
> 3) Enhance your protocol - call zmq_disconnect(endpoint) on your
> sockets when a particular message is received, or heartbeats are
> missed, or a disconnect event happens. This way when you later call
> zmq_connect(endpoint) and it happens to match a previous, dead peer, it
> will work as expected
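A minimal pyzmq sketch of Luca's options 2 and 3 combined, assuming pyzmq on a libzmq 4.x build: the connecting socket is monitored for `EVENT_DISCONNECTED`, and every endpoint reported as dead is explicitly `disconnect()`ed, so that a later `connect()` to the same address (for example, a new process that was assigned the same ephemeral port) is not silently ignored. The helper names `forget_dead_peers` and `bind_retry`, the timeouts and the loopback addresses are illustrative choices, not part of the thread. The `ENOENT` handling is an assumption meant to tolerate libzmq versions that already drop the endpoint on their own.

```python
import errno
import time

import zmq
from zmq.utils.monitor import recv_monitor_message


def forget_dead_peers(sock, mon, connected, timeout_ms=2000):
    """Wait up to timeout_ms for a disconnect event, then drain queued events.

    Every endpoint in `connected` (the strings this application passed to
    connect()) that the monitor reports as disconnected is disconnect()ed,
    so libzmq no longer treats it as a known endpoint.
    """
    forgotten = []
    while mon.poll(timeout_ms):
        event = recv_monitor_message(mon)
        endpoint = event["endpoint"].decode()
        if event["event"] == zmq.EVENT_DISCONNECTED and endpoint in connected:
            try:
                sock.disconnect(endpoint)
            except zmq.ZMQError as exc:
                # Assumption: ENOENT means libzmq already forgot the endpoint.
                if exc.errno != errno.ENOENT:
                    raise
            connected.discard(endpoint)
            forgotten.append(endpoint)
        timeout_ms = 100  # after the first event, only drain what is queued
    return forgotten


def bind_retry(sock, endpoint, attempts=50):
    """Re-bind a just-released port, retrying while the old listener closes."""
    for _ in range(attempts - 1):
        try:
            sock.bind(endpoint)
            return
        except zmq.ZMQError as exc:
            if exc.errno != errno.EADDRINUSE:
                raise
            time.sleep(0.05)
    sock.bind(endpoint)


def demo():
    ctx = zmq.Context()
    try:
        rep = ctx.socket(zmq.REP)
        rep.bind("tcp://127.0.0.1:*")
        endpoint = rep.getsockopt_string(zmq.LAST_ENDPOINT)

        req = ctx.socket(zmq.REQ)
        req.setsockopt(zmq.RECONNECT_IVL, -1)  # never reconnect automatically
        req.setsockopt(zmq.RCVTIMEO, 2000)
        mon = req.get_monitor_socket(zmq.EVENT_DISCONNECTED)
        connected = {endpoint}
        req.connect(endpoint)

        req.send_string("hello")
        rep.recv()
        rep.send_string("hallo")
        req.recv()

        rep.close(linger=0)  # the peer "dies"
        forgotten = forget_dead_peers(req, mon, connected)

        # A new process binds the same port; connect() to it must not be ignored.
        rep = ctx.socket(zmq.REP)
        rep.setsockopt(zmq.RCVTIMEO, 2000)
        bind_retry(rep, endpoint)
        req.connect(endpoint)
        connected.add(endpoint)
        req.send_string("hello")
        rep.recv()
        rep.send_string("hallo")
        return endpoint, forgotten, req.recv_string()
    finally:
        ctx.destroy(linger=0)


if __name__ == "__main__":
    print(demo())
```

The application keeps its own set of connected endpoints so that it only disconnects addresses it actually connected to; this is the bookkeeping Luca's option 3 implies.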
>
>>> On Sep 2, 2017, at 1:21 PM, Luca Boccassi <luca.boccassi at gmail.com>
>>> wrote:
>>>
>>> On Sat, 2017-09-02 at 12:02 -0400, Bill Torpey wrote:
>>>> Thanks again, Luca!
>>>>
>>>> For now, I’m going to go with disabling reconnect on the “data”
>>>> sockets; that seems to be the best solution for my use case
>>>> (connecting to endpoints that were returned by the peer binding to an
>>>> unspecified (“wildcard”) port, e.g., "tcp://<interface>:*" in ZMQ).
>>>>
>>>> This assumes that ZMQ will completely forget about the endpoint
>>>> if/when it is disconnected, if it is set not to reconnect. Otherwise
>>>> I might run afoul of ZMQ’s silently ignoring connections to endpoints
>>>> that it already knows about:
>>>> https://github.com/zeromq/libzmq/issues/788 (e.g., in the case where
>>>> another process later happens to be assigned the same ephemeral
>>>> port).
>>>>
>>>> I’ve done a quick scan of the libzmq code (v4.2.2) and it doesn’t
>>>> appear that the endpoint is removed in the case of a (terminal)
>>>> disconnect. If you can confirm/deny this behavior, that would be
>>>> helpful. Failing that, I guess I’ll need to test this in the
>>>> debugger; any hints on how best to do this would also be much
>>>> appreciated.
>>>>
>>>> Regards,
>>>>
>>>> Bill
>>>
>>> Yes it doesn't look like it removes the endpoint - I guess it's a
>>> corner case that's missed. I'll open an issue.
>>>
>>> BTW all these things are very quick and easy to try with Python on
>>> Linux. Just install pyzmq, open a python3 terminal and:
>>>
>>> import zmq
>>> ctx = zmq.Context.instance()
>>> rep = ctx.socket(zmq.REP)
>>> rep.bind("tcp://127.0.0.1:12345")
>>> req = ctx.socket(zmq.REQ)
>>> req.connect("tcp://127.0.0.1:12345")
>>> req.send_string("hello")
>>> rep.recv()
>>> rep.send_string("hallo")
>>> req.recv()
>>> rep.unbind("tcp://127.0.0.1:12345")
>>> rep.close()
>>> rep = ctx.socket(zmq.REP)
>>> rep.bind("tcp://127.0.0.1:12345")
>>> req.send_string("hello")
>>> rep.recv()
>>> rep.send_string("hallo")
>>> req.recv()
>>> rep.unbind("tcp://127.0.0.1:12345")
>>> rep.close()
>>> req.close()
>>> rep = ctx.socket(zmq.REP)
>>> rep.bind("tcp://127.0.0.1:12345")
>>> req = ctx.socket(zmq.REQ)
>>> req.setsockopt(zmq.RECONNECT_IVL, -1)
>>> req.connect("tcp://127.0.0.1:12345")
>>> req.send_string("hello")
>>> rep.recv()
>>> rep.send_string("hallo")
>>> req.recv()
>>> rep.unbind("tcp://127.0.0.1:12345")
>>> rep.close()
>>> rep = ctx.socket(zmq.REP)
>>> rep.bind("tcp://127.0.0.1:12345")
>>> req.send_string("hello")
>>> rep.recv()
>>>
>>> This last one won't receive the message
>>>
>>>>> On Sep 1, 2017, at 6:19 PM, Luca Boccassi <luca.boccassi at gmail.com>
>>>>> wrote:
>>>>>
>>>>> On Fri, 2017-09-01 at 18:03 -0400, Bill Torpey wrote:
>>>>>> Thanks Luca! That was very helpful.
>>>>>>
>>>>>> Although it leads to a couple of other questions:
>>>>>>
>>>>>> - Can I assume that a ZMQ disconnect of a tcp endpoint would only
>>>>>> occur if the underlying TCP socket is closed by the OS? Or are there
>>>>>> conditions in which ZMQ will proactively disconnect the TCP socket
>>>>>> and try to reconnect?
>>>>>
>>>>> Normally that's the case - you can set up heartbeating with the
>>>>> appropriate options and that will kill a connection if there's no
>>>>> answer.
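For reference, a minimal pyzmq sketch of the heartbeat options Luca mentions, assuming libzmq 4.2 or later (the release that introduced the `ZMQ_HEARTBEAT_*` options). The helper name `make_heartbeating_sub` and the interval/timeout values are arbitrary illustrations: with them, a peer that stops answering ZMTP PINGs should be dropped after roughly 3 seconds, which then surfaces as an ordinary disconnect (followed by a reconnect attempt, unless ZMQ_RECONNECT_IVL is -1).

```python
import zmq


def make_heartbeating_sub(ctx, ivl_ms=1000, timeout_ms=3000, ttl_ms=3000):
    """Create a SUB socket that sends ZMTP PINGs every ivl_ms and times out
    the connection if nothing arrives within timeout_ms of a PING."""
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.HEARTBEAT_IVL, ivl_ms)          # send a PING this often
    sub.setsockopt(zmq.HEARTBEAT_TIMEOUT, timeout_ms)  # give up after this long
    sub.setsockopt(zmq.HEARTBEAT_TTL, ttl_ms)          # peer should time us out too
    return sub


if __name__ == "__main__":
    ctx = zmq.Context()
    sub = make_heartbeating_sub(ctx)
    print(sub.getsockopt(zmq.HEARTBEAT_IVL))
    sub.close(linger=0)
    ctx.term()
```

Options like these must be set before connect() or bind() to take effect on the resulting connections.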
>>>>>
>>>>>> - I see that there is a sockopt (ZMQ_RECONNECT_IVL) that can be set
>>>>>> to -1 to disable reconnection entirely. In my case, the “data”
>>>>>> socket pair will *always* connect to an ephemeral port, so I *never*
>>>>>> want to reconnect. Would this be a reasonable option in my case, do
>>>>>> you think?
>>>>>
>>>>> If that makes sense for your application, go for it - in these cases
>>>>> the only way to be sure is to test it and see how it works
>>>>>
>>>>>> - Would there be any interest in a patch that would disable
>>>>>> reconnects (controlled by sockopt) for ephemeral ports only? I’m
>>>>>> guessing that reconnecting mostly makes sense with well-known ports,
>>>>>> so something like this may be of general interest?
>>>>>
>>>>> If by ephemeral port you mean anything over 1024, then actually in
>>>>> most applications I've seen it's always useful to reconnect, and the
>>>>> existing option should be enough for those cases where it's not
>>>>> desired - we don't want to duplicate functionality
>>>>>
>>>>>> Thanks again!
>>>>>>
>>>>>> Bill
>>>>>>
>>>>>>> On Sep 1, 2017, at 5:30 PM, Luca Boccassi <luca.boccassi at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> On Fri, 2017-09-01 at 16:59 -0400, Bill Torpey wrote:
>>>>>>>> I'm curious about how ZMQ handles re-connection. I understand that
>>>>>>>> re-connection is supposed to happen "automagically" under the
>>>>>>>> covers, but that poses an interesting question.
>>>>>>>>
>>>>>>>> To make a long story short, the application I'm working on uses
>>>>>>>> pub/sub sockets over TCP, and works as follows:
>>>>>>>>
>>>>>>>> At startup:
>>>>>>>> 1. connects to a proxy/broker at a well-known address, using a pub/sub socket pair ("discovery");
>>>>>>>> 2. subscribes to a well-known topic using the "discovery" sub socket;
>>>>>>>> 3. binds a different pub/sub socket pair ("data") and retrieves the actual endpoints assigned;
>>>>>>>> 4. publishes the "data" endpoints from step 3 on the "discovery" pub socket;
>>>>>>>>
>>>>>>>> When the application receives a message on the "discovery" sub
>>>>>>>> socket, it connects the "data" socket pair to the endpoints specified
>>>>>>>> in the "discovery" message.
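The "data" half of the startup sequence above (step 3, plus the connect performed when a discovery message arrives) might look roughly like this in pyzmq. This is a sketch under assumptions: the discovery/broker exchange is left out and replaced by passing the endpoint string directly, loopback stands in for `<interface>`, and the SUB side disables automatic reconnect as Bill later decides to do. `open_data_pub`, `connect_data_sub` and `first_delivery` are hypothetical helper names.

```python
import zmq


def open_data_pub(ctx, interface="127.0.0.1"):
    """Step 3: bind to a wildcard (ephemeral) port and report the real endpoint."""
    pub = ctx.socket(zmq.PUB)
    pub.bind(f"tcp://{interface}:*")
    return pub, pub.getsockopt_string(zmq.LAST_ENDPOINT)


def connect_data_sub(ctx, endpoint, topic=b""):
    """On a discovery message: connect directly to the advertised endpoint."""
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.RECONNECT_IVL, -1)  # an ephemeral port is never retried
    sub.setsockopt(zmq.SUBSCRIBE, topic)
    sub.connect(endpoint)
    return sub


def first_delivery(pub, sub, payload, attempts=50):
    """Publish until the subscription has propagated (PUB drops messages
    sent before that), then return the first message received, or None."""
    for _ in range(attempts):
        pub.send(payload)
        if sub.poll(100):
            return sub.recv()
    return None


if __name__ == "__main__":
    ctx = zmq.Context()
    pub, endpoint = open_data_pub(ctx)
    print("advertise on discovery:", endpoint)  # e.g. tcp://127.0.0.1:<port>
    sub = connect_data_sub(ctx, endpoint)
    print(first_delivery(pub, sub, b"tick"))
    ctx.destroy(linger=0)
```

The retry loop in `first_delivery` is only there because a direct PUB/SUB connection still has the usual subscription-propagation delay; in Bill's design the broker-based discovery step avoids relying on it.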
>>>>>>>>
>>>>>>>> So far, this seems to be working relatively well, and allows the
>>>>>>>> high-volume, low-latency "data" messages to be sent/received
>>>>>>>> directly between peers, avoiding the extra hop caused by a
>>>>>>>> proxy/broker connection. The discovery messages use the
>>>>>>>> proxy/broker, but since these are (very) low-volume the extra hop
>>>>>>>> doesn't matter. The use of the proxy also eliminates the "slow
>>>>>>>> joiner" problem that can happen with other configurations.
>>>>>>>>
>>>>>>>> My question is what happens when one of the "data" peer sockets
>>>>>>>> disconnects. Since ZMQ (apparently) keeps trying to reconnect, what
>>>>>>>> would prevent another process from binding to the same ephemeral
>>>>>>>> port?
>>>>>>>>
>>>>>>>> - Can I assume that if the new application at that port is not a
>>>>>>>> ZMQ application, the reconnect will (silently) fail, and continue
>>>>>>>> to be retried?
>>>>>>>
>>>>>>> The ZMTP handshake would fail, so yes.
>>>>>>>
>>>>>>>> - What if the new application at that port *IS* a ZMQ application?
>>>>>>>> Would the reconnect succeed? And if so, what would happen if it's a
>>>>>>>> *DIFFERENT* ZMQ application, and the messages that it's
>>>>>>>> sending/receiving don't match what the original application
>>>>>>>> expects?
>>>>>>>
>>>>>>> Depends on how you handle it in your application. If you have
>>>>>>> security concerns, then use CURVE with authentication so that only
>>>>>>> authorised peers can connect.
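A minimal pyzmq sketch of why CURVE also helps in the "different ZMQ application on a reused ephemeral port" case, assuming a libzmq build with CURVE support (`zmq.has("curve")`): the client pins the server's public key, so a process that binds the same port with different keys fails the handshake and never sees the request. `curve_roundtrip` is a hypothetical helper. This only shows the client authenticating the server; restricting which *clients* may connect additionally needs a ZAP handler (for example one of the authenticators in pyzmq's `zmq.auth` module), which is omitted here.

```python
import zmq


def curve_roundtrip(trusted_server_public, server_keys, timeout_ms=1000):
    """Send one request to a CURVE server that uses `server_keys`
    (a (public, secret) pair) while the client trusts `trusted_server_public`.
    Returns the reply, or None if no reply arrives (e.g. handshake failed)."""
    ctx = zmq.Context()
    try:
        server = ctx.socket(zmq.REP)
        server.curve_publickey, server.curve_secretkey = server_keys
        server.curve_server = True  # this side plays the CURVE server role
        server.bind("tcp://127.0.0.1:*")
        endpoint = server.getsockopt_string(zmq.LAST_ENDPOINT)

        client = ctx.socket(zmq.REQ)
        client.curve_publickey, client.curve_secretkey = zmq.curve_keypair()
        client.curve_serverkey = trusted_server_public  # pin the expected server
        client.connect(endpoint)
        client.send(b"hello")

        if not server.poll(timeout_ms):
            return None  # handshake never completed, request not delivered
        server.recv()
        server.send(b"hallo")
        return client.recv() if client.poll(timeout_ms) else None
    finally:
        ctx.destroy(linger=0)


if __name__ == "__main__" and zmq.has("curve"):
    real = zmq.curve_keypair()
    impostor = zmq.curve_keypair()
    print(curve_roundtrip(real[0], real))      # the expected server answers
    print(curve_roundtrip(real[0], impostor))  # impostor keys: no reply
```

Because the server's public key would be advertised through the discovery channel alongside the endpoint, a stale endpoint that now belongs to another process no longer authenticates, which addresses the mismatched-application case at the handshake rather than in application logic.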
>>>>>>>
>>>>>>>> It's reasonable for the application to publish a disconnect message
>>>>>>>> when it terminates normally, and the connected peers can disconnect
>>>>>>>> that endpoint. But, applications don't always terminate normally ;-)
>>>>>>>
>>>>>>> That's a common pattern. But the application needs to handle
>>>>>>> unexpected data somewhat gracefully. What that means is entirely up
>>>>>>> to the application - as far as the library is concerned, if the
>>>>>>> handshake succeeds then it's all good (hence the use case for CURVE).
>>>>>>>
>>>>>>>> Any guidance, hints or tips would be much appreciated -- thanks in
>>>>>>>> advance!
>>>>>>>
>>>>>>> --
>>>>>>> Kind regards,
>>>>>>> Luca Boccassi
>>>>>>> _______________________________________________
>>>>>>> zeromq-dev mailing list
>>>>>>> zeromq-dev at lists.zeromq.org
>>>>>>> https://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>>>>
>>>>>
>>>>> --
>>>>> Kind regards,
>>>>> Luca Boccassi
>>>>
>>>
>>> --
>>> Kind regards,
>>> Luca Boccassi
>>