[zeromq-dev] zeromq protocol_error handling

James Harvey jamesdillonharvey at gmail.com
Thu May 13 17:48:51 CEST 2021


I have a rare, random bug that causes my ZMQ_SUB socket to stop working for
a certain endpoint, with no way to detect it or be notified.  Yes, it
happens because a SUB connects to a REQ socket, but once you start to use
zeromq for lots of transient systems in a large company, this kind of thing
will happen.

The process happens like this:

  - ZMQ_PUB binds on (ephemeral)
  - ZMQ_SUB connects to (data flows)
  - ZMQ_PUB goes down
  - Unrelated process (ZMQ_REQ) comes up and grabs the same
as its ephemeral
  - ZMQ_SUB has not yet been told to disconnect so it reconnects to the
  - protocol error happens and the connection is terminated in the
  - Now a good ZMQ_PUB comes up and binds on
  - ZMQ_SUB gets new instruction to connect()
  - connect() returns but is a no-op.
    - The socket_base thinks it still has a valid endpoint, and a SUB only
connects once to each endpoint.
  - At this point there are no errors and no data flowing.
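
To make the sequence above concrete, here is a minimal toy model of the
bookkeeping as I understand it. It is not libzmq code: ToySubSocket, its
two sets, on_protocol_error() and the "tcp://host:port" placeholder are all
made up for illustration, and the only behaviour assumed is what is
described in the steps (SUB connects once per endpoint; a protocol error
ends the session but leaves the endpoint record in place).

```python
class ToySubSocket:
    """Toy model of the endpoint bookkeeping (hypothetical, not libzmq)."""

    def __init__(self):
        self.endpoints = set()      # endpoints the socket believes it has
        self.live_sessions = set()  # endpoints whose session is still running

    def connect(self, endpoint):
        # Assumed behaviour: a SUB connects only once per endpoint, so a
        # repeated connect() to a known endpoint silently does nothing.
        if endpoint in self.endpoints:
            return "noop"
        self.endpoints.add(endpoint)
        self.live_sessions.add(endpoint)
        return "connected"

    def on_protocol_error(self, endpoint):
        # The session terminates with no reconnect, but the socket-level
        # endpoint record is left untouched.
        self.live_sessions.discard(endpoint)


endpoint = "tcp://host:port"  # placeholder, not a real address

sub = ToySubSocket()
first = sub.connect(endpoint)    # original PUB: data flows
sub.on_protocol_error(endpoint)  # reconnect hit the unrelated REQ
second = sub.connect(endpoint)   # new PUB is up, told to connect again
data_can_flow = endpoint in sub.live_sessions

print(first, second, data_can_flow)
```

In this model the second connect() reports a no-op while no session
exists, which matches the "no errors and no data flowing" state.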

My question is, should the protocol_error in the session propagate up to
remove the endpoint from the socket?

If yes, I can look at adding that; if not, do you have any suggestions?
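
As a rough sketch of the two directions I can think of, the toy model
below compares (a) the proposed change, where a protocol error also removes
the endpoint record, with (b) an application-side workaround that
disconnects before connecting again. All names here (ToySubSocket,
forget_on_protocol_error, reconnect_after_error, the placeholder endpoint)
are hypothetical and only model the behaviour described in this mail;
whether a real disconnect clears the stale record in this state is an
assumption I have not verified.

```python
class ToySubSocket:
    """Toy model only; none of these names are libzmq internals."""

    def __init__(self, forget_on_protocol_error=False):
        self.forget_on_protocol_error = forget_on_protocol_error
        self.endpoints = set()      # endpoints the socket believes it has
        self.live_sessions = set()  # endpoints whose session is still running

    def connect(self, endpoint):
        if endpoint in self.endpoints:
            return "noop"  # assumed: SUB connects only once per endpoint
        self.endpoints.add(endpoint)
        self.live_sessions.add(endpoint)
        return "connected"

    def disconnect(self, endpoint):
        # Assumed: an explicit disconnect drops the endpoint record.
        self.endpoints.discard(endpoint)
        self.live_sessions.discard(endpoint)

    def on_protocol_error(self, endpoint):
        self.live_sessions.discard(endpoint)
        if self.forget_on_protocol_error:
            # Option (a): propagate the error up and forget the endpoint.
            self.endpoints.discard(endpoint)


def reconnect_after_error(sock, endpoint, disconnect_first=False):
    """Connect, hit a protocol error, then try to connect again."""
    sock.connect(endpoint)
    sock.on_protocol_error(endpoint)
    if disconnect_first:
        sock.disconnect(endpoint)  # option (b): application-side workaround
    return sock.connect(endpoint)


ENDPOINT = "tcp://host:port"  # placeholder, not a real address

current = reconnect_after_error(ToySubSocket(), ENDPOINT)
propagated = reconnect_after_error(
    ToySubSocket(forget_on_protocol_error=True), ENDPOINT)
workaround = reconnect_after_error(
    ToySubSocket(), ENDPOINT, disconnect_first=True)

print(current, propagated, workaround)
```

Option (b) needs no library change but relies on every caller remembering
to disconnect first, which is why I would prefer (a) if it is acceptable.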

Thanks for your time


Some links to the code:

If the socket is SUB and the endpoint is already present, don't connect.

Terminate with no reconnect on protocol_error.