[zeromq-dev] zeromq protocol_error handling
Bill Torpey
wallstprog at gmail.com
Thu May 20 17:52:18 CEST 2021
Sorry — meant to get back to you sooner, but it’s been a crazy week.
You don’t say what version you’re running, but there have been some changes in that area not that long ago — check these out and see if they help:
https://github.com/zeromq/libzmq/pull/3831
https://github.com/zeromq/libzmq/pull/3960
https://github.com/zeromq/libzmq/pull/4053
Good luck.
Bill
> On May 20, 2021, at 10:26 AM, James Harvey <jamesdillonharvey at gmail.com> wrote:
>
> Hi,
>
> I will try to simplify my previous long email.
>
> If a stream gets into a protocol error state (e.g. a TCP SUB socket connecting to a REQ socket):
>
> Should the information (that the connection is terminated) be passed back to the parent socket, so that if connect() is called again it attempts to connect rather than being a no-op?
>
> OR
>
> Should we add a protocol error event to the socket monitor, so the calling process can handle it by calling disconnect/connect?
>
> Just want some clarification so I work on the correct code.
>
> Thanks
>
> James
>
> On Thu, May 13, 2021 at 4:48 PM James Harvey <jamesdillonharvey at gmail.com> wrote:
> Hi,
>
> I have a rare/random bug that causes my ZMQ_SUB socket to fail for a certain endpoint, with no way to track it or be notified. Yes, it happens because a SUB connects to a REQ socket, but once you start using zeromq for lots of transient systems in a large company, this kind of thing will happen occasionally.
>
> The process happens like this:
>
> - ZMQ_PUB binds on 1.2.3.4:44444 (ephemeral)
> - ZMQ_SUB connects to 1.2.3.4:44444 (data flows)
> - ZMQ_PUB goes down
> - Unrelated process (ZMQ_REQ) comes up and grabs the same 1.2.3.4:44444 as its ephemeral port
> - ZMQ_SUB has not yet been told to disconnect, so it reconnects to the ZMQ_REQ
> - A protocol error happens and the connection is terminated in the session/engine
> - Now a good ZMQ_PUB comes up and binds on 1.2.3.4:44444
> - ZMQ_SUB gets a new instruction to connect()
> - connect() returns immediately as a no-op
> - The socket_base thinks it still has a valid endpoint, and SUB only connects once to each endpoint
> - At this point there are no errors and no data flowing
>
> My question is, should the protocol_error in the session propagate up to remove the endpoint from the socket?
>
> If yes, I can look at adding that; if no, do you have any suggestions?
>
> Thanks for your time
>
> James
>
> Some links to the code:
>
> If the socket is SUB and the endpoint is already present, don't connect:
> https://github.com/zeromq/libzmq/blob/master/src/socket_base.cpp#L901
>
> Terminate with no reconnect on protocol_error:
> https://github.com/zeromq/libzmq/blob/master/src/session_base.cpp#L486
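
For anyone following the thread, the sequence James describes can be replayed with a small toy model. This is only a sketch of the bookkeeping as described above, not libzmq code: the class ToySubSocket, its methods, and the tcp:// endpoint string are hypothetical names made up for illustration. It assumes that a SUB socket keeps a table of endpoints, that connect() does nothing when the endpoint is already in the table, and that a protocol error ends the session without removing the entry.

```python
# Toy model only: this is NOT libzmq code. It mimics the bookkeeping described
# in the thread, where a SUB socket remembers each endpoint it connected to.


class ToySubSocket:
    """Hypothetical stand-in for a SUB socket's endpoint table."""

    def __init__(self):
        # endpoint string -> True while the session is alive, False once dead
        self.endpoints = {}

    def connect(self, endpoint):
        # As described in the thread: a SUB socket connects at most once per
        # endpoint, so a repeated call is a silent no-op.
        if endpoint in self.endpoints:
            return "noop"
        self.endpoints[endpoint] = True
        return "connected"

    def protocol_error(self, endpoint):
        # The session is torn down, but the endpoint entry stays in the table.
        self.endpoints[endpoint] = False

    def disconnect(self, endpoint):
        # Forget the endpoint so a later connect() starts a fresh session.
        self.endpoints.pop(endpoint, None)

    def is_receiving(self, endpoint):
        return self.endpoints.get(endpoint, False)


def replay_scenario():
    """Replay the steps from the thread and record what each call does."""
    endpoint = "tcp://1.2.3.4:44444"
    sub = ToySubSocket()
    log = []
    log.append(sub.connect(endpoint))       # initial connect to the PUB
    sub.protocol_error(endpoint)            # SUB reconnected to a REQ peer
    log.append(sub.connect(endpoint))       # new PUB is up, connect() again
    log.append(sub.is_receiving(endpoint))  # no error, but no data either
    sub.disconnect(endpoint)                # drop the stale entry first
    log.append(sub.connect(endpoint))       # now a real connect happens
    log.append(sub.is_receiving(endpoint))
    return log


if __name__ == "__main__":
    print(replay_scenario())
```

In this model the second connect() is a no-op and no data flows until the stale entry is dropped with disconnect(). With a real socket the corresponding calls would be zmq_disconnect() followed by zmq_connect(), which is the disconnect/connect handling James mentions; whether the application can learn that it needs to do this is the open question in the thread.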