[zeromq-dev] zeromq protocol_error handling

James Harvey jamesdillonharvey at gmail.com
Thu May 20 16:26:01 CEST 2021


Hi,

I will try to simplify my previous long email.

If a stream gets into a protocol error state (e.g. a TCP SUB connecting to
a REQ):

Should the information (that the connection is terminated) be passed back
to the parent socket, so that if connect() is called again it attempts to
connect rather than being a no-op?

OR

Should we add a protocol error event to the socket monitor so the calling
process can handle it by calling disconnect/connect?
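
To make the two options concrete, here is a small toy model in C++. It is
only a sketch: ToySubSocket, Policy, on_protocol_error and
final_connect_attempted are made-up names for illustration, not libzmq
classes or APIs, and the real bookkeeping in socket_base/session_base is
more involved. It replays the sequence from my earlier mail and reports
whether the final connect() starts a new connection attempt.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>

// Toy model only: none of these types exist in libzmq.
enum class Policy {
    KeepEndpoint,    // assumed current behaviour: endpoint stays registered
    RemoveEndpoint,  // option 1: protocol error removes the endpoint
    NotifyMonitor    // option 2: protocol error is reported to the application
};

struct ToySubSocket {
    Policy policy = Policy::KeepEndpoint;
    std::set<std::string> endpoints;  // stands in for the socket's endpoint table
    std::set<std::string> live;       // endpoints that still have a working session
    // Hypothetical monitor hook, called with the failing endpoint.
    std::function<void(const std::string &)> on_protocol_error;

    // Returns true if a new connection attempt was started.
    bool connect(const std::string &ep) {
        if (endpoints.count(ep))
            return false;  // SUB connects once per endpoint: no-op
        endpoints.insert(ep);
        live.insert(ep);
        return true;
    }

    void disconnect(const std::string &ep) {
        endpoints.erase(ep);
        live.erase(ep);
    }

    // The session hit a protocol error (e.g. the peer is a REQ) and
    // terminates without reconnecting.
    void protocol_error(const std::string &ep) {
        live.erase(ep);
        if (policy == Policy::RemoveEndpoint)
            endpoints.erase(ep);
        else if (policy == Policy::NotifyMonitor && on_protocol_error)
            on_protocol_error(ep);
    }
};

// Replays the reported sequence; returns true if the final connect()
// actually starts a new connection attempt.
bool final_connect_attempted(Policy policy) {
    const std::string ep = "tcp://1.2.3.4:44444";
    ToySubSocket sub;
    sub.policy = policy;
    if (policy == Policy::NotifyMonitor) {
        // The application reacts to the event by disconnecting, so a
        // later connect() is no longer a no-op.
        sub.on_protocol_error = [&sub](const std::string &e) { sub.disconnect(e); };
    }
    const bool first = sub.connect(ep);  // SUB connects to the original PUB
    assert(first);
    (void) first;
    sub.protocol_error(ep);  // PUB gone, REQ took the port, handshake fails
    return sub.connect(ep);  // good PUB is back, application calls connect() again
}
```

In this model, Policy::KeepEndpoint (how I understand the current
behaviour) leaves the final connect() as a no-op, while either of the two
proposed options lets it go through.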

Just want some clarification so I work on the correct code.

Thanks

James

On Thu, May 13, 2021 at 4:48 PM James Harvey <jamesdillonharvey at gmail.com>
wrote:

> Hi,
>
> I have a rare/random bug that causes my ZMQ_SUB socket to fail for a
> certain endpoint with no way to track it or be notified.  Yes, it happens
> because a SUB connects to a REQ socket, but once you start to use zeromq
> for lots of transient systems in a large company this kind of thing will
> happen occasionally.
>
> The process happens like this:
>
>   - ZMQ_PUB binds on 1.2.3.4:44444 (ephemeral)
>   - ZMQ_SUB connects to 1.2.3.4:44444 (data flows)
>   - ZMQ_PUB goes down
>   - Unrelated process (ZMQ_REQ) comes up and grabs the same 1.2.3.4:44444
> as its ephemeral
>   - ZMQ_SUB has not yet been told to disconnect so it reconnects to the
> ZMQ_REQ
>   - protocol error happens and the connection is terminated in the
> session/engine
>   - Now a good ZMQ_PUB comes up and binds on 1.2.3.4:44444
>   - ZMQ_SUB gets new instruction to connect()
>   - connect() just returns as a no-op.
>     - The socket_base thinks it still has a valid endpoint, and SUB only
> connects once to each endpoint.
>   - At this point there are no errors and no data flowing.
>
> My question is, should the protocol_error in the session propagate up to
> remove the endpoint from the socket?
>
> If yes I can look at adding that, if no do you have any suggestions?
>
> Thanks for your time
>
> James
>
> Some links to the code:
>
> If the socket is SUB and the endpoint is present, don't connect:
> https://github.com/zeromq/libzmq/blob/master/src/socket_base.cpp#L901
>
> Terminate with no reconnect on protocol_error:
> https://github.com/zeromq/libzmq/blob/master/src/session_base.cpp#L486
>