[zeromq-dev] Pipeline Reliability

Luca Boccassi luca.boccassi at gmail.com
Tue Jul 3 17:25:32 CEST 2018


On Tue, 2018-07-03 at 16:08 +0200, Charles Bouillaguet wrote:
> Dear zeromq'ers,
> 
> I'm facing a reliability problem that I couldn't solve by myself so far.
> 
> I have two machines running two asymmetric programs. Machine A creates
> a PULL socket and binds it. Machine B creates a PUSH socket and connects
> it (to the PULL socket of machine A) using the TCP transport. Machine B
> then sends messages like crazy (about 500/s). Basically, B is a low-cost
> device equipped with sensors and A is a server that just stores the
> data.
> 
> This works like a charm... until the inevitable happens: some network
> event occurs, and the messages can no longer be transmitted from
> machine B to machine A.
> 
> With a blocking send, once the high water mark is reached the process
> on machine B gets stuck in zmq_send(), and the whole pipeline grinds to
> a halt.
> 
> To avoid this, I tried the "Lazy Pirate Pattern", using something like:
> 
>    if (-1 == zmq_send(socket, msg, size, ZMQ_DONTWAIT)) {
>       if (errno == EAGAIN) {
>          zmq_close(socket);
>          socket = zmq_socket(context, ZMQ_PUSH);
>          zmq_connect(socket, address);
>       }
>    }
> 
> I don't care if I lose some messages. What I don't want is for the
> pipeline to stop forever.
> 
> At first, this seems to work as intended. When the network is down, the
> program actually closes and re-creates the socket, and the call to
> zmq_connect() succeeds... but the messages are still not sent. The
> process on machine B ends up in a loop where it fills the ZMQ buffers,
> destroys the socket, re-creates it, reconnects, rinse, repeat. I
> observed this loop for several hours.
> 
> Simply stopping and restarting the UNIX process solved the problem
> (i.e. messages were transmitted normally, instantaneously).
> 
> Is there something I am doing wrong? What are my options to avoid
> this problem?
> [I can consider moving away from ZMQ to nanomsg or nng].
> 
> Thanks,

1) Don't close/reopen the socket; ZMQ_DONTWAIT by itself is enough to
   avoid blocking.
2) Enable the heartbeat options (ZMQ_HEARTBEAT_IVL, ZMQ_HEARTBEAT_TIMEOUT,
   ZMQ_HEARTBEAT_TTL) for faster automatic disconnects and reconnects.

-- 
Kind regards,
Luca Boccassi