[zeromq-dev] zeromq protocol_error handling

Bill Torpey wallstprog at gmail.com
Thu May 20 17:52:18 CEST 2021


Sorry — meant to get back to you sooner, but it’s been a crazy week.

You don’t say which libzmq version you’re running, but there have been some fairly recent changes in that area. Check these out and see if they help:

https://github.com/zeromq/libzmq/pull/3831

https://github.com/zeromq/libzmq/pull/3960

https://github.com/zeromq/libzmq/pull/4053

Good luck.

Bill


> On May 20, 2021, at 10:26 AM, James Harvey <jamesdillonharvey at gmail.com> wrote:
> 
> Hi,
> 
> I will try and simplify my previous long email.
> 
> If a stream gets into a protocol error state (e.g. a TCP SUB connecting to a REQ):
> 
> Should the information (that the connection is terminated) be passed back to the parent socket somehow, so that if connect() is called again it attempts to connect rather than being a no-op?
> 
> OR
> 
> Should we add a protocol error event to the socket monitor so the calling process can handle it by calling disconnect/connect?
> 
> Just want some clarification so I work on the correct code.
> 
> Thanks
> 
> James
> 
> On Thu, May 13, 2021 at 4:48 PM James Harvey <jamesdillonharvey at gmail.com> wrote:
> Hi,
> 
> I have a rare, random bug that causes my ZMQ_SUB socket to fail for a certain endpoint, with no way to track it or get notified. Yes, it's because a SUB connects to a REQ socket, but once you start to use zeromq for lots of transient systems in a large company, this kind of thing will happen occasionally.
> 
> The process happens like this:
> 
>   - ZMQ_PUB binds on 1.2.3.4:44444 (ephemeral)
>   - ZMQ_SUB connects to 1.2.3.4:44444 (data flows)
>   - ZMQ_PUB goes down
>   - Unrelated process (ZMQ_REQ) comes up and grabs the same 1.2.3.4:44444 as its ephemeral
>   - ZMQ_SUB has not yet been told to disconnect so it reconnects to the ZMQ_REQ
>   - protocol error happens and the connection is terminated in the session/engine
>   - Now a good ZMQ_PUB comes up and binds on 1.2.3.4:44444
>   - ZMQ_SUB gets new instruction to connect()
>   - connect() just returns as a no-op.
>     - The socket_base thinks it still has a valid endpoint and SUB only connects once to each endpoint.
>   - At this point there are no errors and no data flowing.
> 
> My question is, should the protocol_error in the session propagate up to remove the endpoint from the socket?
> 
> If yes I can look at adding that, if no do you have any suggestions?
> 
> Thanks for your time
> 
> James
> 
> Some links to the code:
> 
> If the socket is SUB and the endpoint is already present, don't connect:
> https://github.com/zeromq/libzmq/blob/master/src/socket_base.cpp#L901
> 
> Terminate with no reconnect on protocol_error:
> https://github.com/zeromq/libzmq/blob/master/src/session_base.cpp#L486
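
The sequence described in the quoted message can be replayed with a small toy model. This is only a sketch: ToySubSocket and replay_failure are hypothetical names invented for illustration, and nothing here is libzmq code or the libzmq API. It assumes, as the message states, that a SUB socket connects only once per endpoint and that a protocol error ends the session without removing the endpoint record. The last two steps show the disconnect/connect recovery proposed in the message.

```python
# Toy model only: these names are hypothetical and are NOT libzmq code.
# They mimic the endpoint bookkeeping described in the quoted message.


class ToySubSocket:
    """Tracks endpoints the way the message describes for a SUB socket."""

    def __init__(self):
        # endpoint -> True while the underlying session is still alive
        self.endpoints = {}

    def connect(self, endpoint):
        # A SUB socket connects only once per endpoint: if the endpoint is
        # already recorded, the call returns without doing anything.
        if endpoint in self.endpoints:
            return "no-op"
        self.endpoints[endpoint] = True
        return "connected"

    def protocol_error(self, endpoint):
        # The session terminates without reconnecting, but the endpoint
        # record held by the socket is left in place.
        if endpoint in self.endpoints:
            self.endpoints[endpoint] = False

    def disconnect(self, endpoint):
        # Dropping the record is what lets a later connect() take effect.
        self.endpoints.pop(endpoint, None)

    def is_receiving(self, endpoint):
        return self.endpoints.get(endpoint, False)


def replay_failure():
    endpoint = "tcp://1.2.3.4:44444"
    sub = ToySubSocket()
    log = []
    log.append(sub.connect(endpoint))       # PUB is up, data flows
    sub.protocol_error(endpoint)            # peer turned out to be a REQ
    log.append(sub.connect(endpoint))       # good PUB is back, app reconnects
    log.append(sub.is_receiving(endpoint))  # no data, and no error reported
    sub.disconnect(endpoint)                # proposed recovery: drop the record
    log.append(sub.connect(endpoint))
    log.append(sub.is_receiving(endpoint))
    return log


if __name__ == "__main__":
    print(replay_failure())
```

In this model the second connect() is the silent no-op from the report, and only an explicit disconnect() before connect() restores the flow. Without a notification such as a monitor event, the caller has no way to know that the disconnect is needed.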
