[zeromq-dev] Pipeline Reliability

Charles Bouillaguet charles.bouillaguet at gmail.com
Tue Jul 3 16:08:05 CEST 2018


Dear zeromq'ers,

I'm facing a reliability problem that I couldn't solve by myself so far.

I have two machines with two asymmetric programs running. Machine A creates a
PULL socket and binds it. Machine B creates a PUSH socket and connects it (to
the PULL socket of machine A), using the TCP transport. Machine B then sends
messages like crazy (about 500/s). Basically, B is a low-cost device equipped
with sensors and A is a server that just stores the data.

This works like a charm... until the inevitable happens: some network event
occurs, and the messages cannot be transmitted from machine B to machine A.

With a blocking send, the process on machine B then gets stuck in zmq_send(),
once the high water mark is reached, and the whole pipeline grinds to a halt.

To avoid this, I tried the "Lazy Pirate Pattern". I use something like:

   if (-1 == zmq_send(socket, msg, size, ZMQ_DONTWAIT)) {
      if (errno == EAGAIN) {
         zmq_close(socket);
         socket = zmq_socket(context, ZMQ_PUSH);
         zmq_connect(socket, address);
      }
   }

I don't care if I lose some messages. What I don't want is for the pipeline to
stop forever.

At first, this seems to work as intended. When the network is down, the program
actually closes and re-creates the socket; the call to zmq_connect()
succeeds... but the messages are still not sent, and the process on machine B
ends up in a loop where it fills the ZMQ buffers, destroys the socket,
re-creates it, re-connects, rinse, repeat. I observed the loop for several hours.

Just stopping the UNIX process and re-starting it solved the problem
(i.e. messages get transmitted normally, instantaneously).

Is there something I am doing wrong? What are my options to avoid this problem?
[I can consider moving away from ZMQ to nanomsg or nng].

Thanks,
-- 
Charles BOUILLAGUET
Université de Lille - Sciences et Technologies
charles.bouillaguet at univ-lille1.fr | www.univ-lille1.fr
Laboratoire CRIStAL - Bât M3 - Bureau 332 - 59655 Villeneuve d'Ascq
Tél. +33 (0)3 28 77 85 84
homepage:  http://cristal.univ-lille.fr/~bouillag/

