[zeromq-dev] [REP-REQ] Sockets timeouts and connection reset

DUMAS, Victor Victor.DUMAS at stago.com
Fri Apr 6 17:44:47 CEST 2018


Hello,

Context:
I have 2 peers, let’s call them:
  - PeerA under Linux using cppzmq version 4.1.3
  - PeerB under Windows using NetMQ version 4.0.0
I need a mailbox between the 2 of them with 3 reliability guarantees:

  *   Every message is delivered once and only once
  *   Every message sent by one peer is received by the other in the same order it was sent.
  *   The integrity of each message is preserved.
Considering those constraints, I chose to develop a “mailbox” using 1 REP and 1 REQ socket on each peer (2 sockets per peer). Each socket is used in a separate thread.
Each time a message is sent, the receiver replies with an ackMessage to confirm delivery. This, together with the REP/REQ state machine pattern, also guarantees order. Finally, integrity is the only guarantee I expect ZMQ itself to provide.
To avoid blocking forever, each socket has send and receive timeouts. When those timeouts are reached, the sockets are destroyed and recreated.
To be clear, I should mention that this “mailbox” runs in an environment with lots of threads on both machines (if that makes any difference).
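For illustration, the ack-and-retry logic described above can be modelled without any ZMQ calls. This is only a simplified sketch, not the actual mailbox code: the `seq` field, `Receiver`, and `send_with_retry` are hypothetical names, and the lost ack is simulated by a callback rather than by a real socket timeout. It shows why a resend after a timeout needs a duplicate check on the receiving side if each message is to be delivered once and only once.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical wire message. The sequence number is an assumption of this
// sketch; it is not something ZeroMQ adds for you.
struct Message {
    std::uint64_t seq;
    std::string payload;
};

// Receiver side: acknowledges new messages and retransmissions, but delivers
// each sequence number to the application only once.
class Receiver {
public:
    // Returns true if the message should be acknowledged.
    bool on_message(const Message& m) {
        if (m.seq > next_expected_) {
            return false;  // gap in the sequence: do not acknowledge
        }
        if (m.seq == next_expected_) {
            delivered_.push_back(m.payload);  // first time seen: deliver
            ++next_expected_;
        }
        // m.seq < next_expected_ is a retransmission: ack again, no redelivery.
        return true;
    }

    const std::vector<std::string>& delivered() const { return delivered_; }

private:
    std::uint64_t next_expected_ = 0;
    std::vector<std::string> delivered_;
};

// Sender side: stop-and-wait with a bounded number of attempts.
// ack_lost(attempt) simulates the receive timeout on the sender: when it
// returns true, the ack never arrived and the same message is sent again.
bool send_with_retry(Receiver& rx, const Message& m, int max_attempts,
                     const std::function<bool(int)>& ack_lost) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        const bool acked = rx.on_message(m);
        if (acked && !ack_lost(attempt)) {
            return true;  // ack received in time
        }
        // Timeout: the real code would destroy and recreate the socket here
        // before resending the same sequence number.
    }
    return false;
}
```

If the first ack is lost, `send_with_retry(rx, Message{0, "hello"}, 3, ...)` hands the message to the receiver twice, yet `rx.delivered()` holds it only once. Without the sequence check, the retry would deliver it twice.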
Problem:
It works half the time. Every 30 seconds or so, PeerA is sending and PeerB is waiting, yet nothing gets transmitted, so both sockets are destroyed and recreated. I have been trying to debug it using Wireshark, but apart from the messages that are transmitted, I do not quite understand the inner workings of ZMQ or the traffic it exchanges in between.
Here is a Wireshark dump of the main packet summary: https://pastebin.com/xRgWRWQ3
You can clearly see it disconnecting at 10:03:20, which would be expected if no packet was received (the timeout in this example is 8 seconds), but the attempts to reconnect fail several times until 10:03:35, when it finally reconnects and sends a few more messages before failing again.
My questions would be:

  *   Can the version difference between cppzmq and NetMQ cause this kind of problem?
  *   Why does this problem occur even though one socket is listening and the other is sending?
  *   Why is the problem present only when sending from PeerA to PeerB (Linux -> Windows) and not the other way around? (I have disabled the firewall on both Windows and Linux.)
--
Victor Dumas






