[zeromq-dev] Pipeline Reliability

Charles Bouillaguet charles.bouillaguet at gmail.com
Tue Jul 3 20:02:14 CEST 2018


On Tue, Jul 03, 2018 at 10:35:31AM -0500, Mark Botner wrote:
> I wonder if the default setting for ZMQ_LINGER is causing the zmq_close()
> to block since there are unsent messages?  From the ref. guide:
> 
> The default setting of *ZMQ_LINGER* does not discard unsent messages; this
> behaviour may cause the application to block when calling *zmq_ctx_term()*.

Indeed, but in my case, zmq_close() does NOT block. zmq_connect() does NOT block
either. It's just that messages do not arrive, and eventually zmq_send() blocks
(or fails in non-blocking mode).

Charles

> On Tue, Jul 3, 2018 at 9:08 AM, Charles Bouillaguet <charles.bouillaguet at gmail.com> wrote:
> 
> > Dear zeromq'ers,
> >
> > I'm facing a reliability problem that I couldn't solve by myself so far.
> >
> > I have two machines with two asymmetric programs running. Machine A creates a
> > PULL socket and binds it. Machine B creates a PUSH socket and connects it
> > (to the PULL socket of machine A), using the TCP transport. Machine B then
> > sends messages like crazy (about 500/s). Basically, B is a low-cost device
> > equipped with sensors and A is a server that just stores the data.
> >
> > This works like a charm... until the inevitable happens: some network event
> > occurs, and the messages cannot be transmitted from machine B to machine A.
> >
> > With a blocking send, the process on machine B then gets stuck in
> > zmq_send() once the high water mark is reached, and the whole pipeline
> > grinds to a halt.
> >
> > To avoid this, I tried the "Lazy Pirate Pattern". I use something like:
> >
> >    if (-1 == zmq_send(socket, msg, size, ZMQ_DONTWAIT)) {
> >       if (errno == EAGAIN) {
> >          zmq_close(socket);
> >          socket = zmq_socket(context, ZMQ_PUSH);
> >          zmq_connect(socket, address);
> >       }
> >    }
> >
> > I don't care if I lose some messages. What I don't want is for the pipeline
> > to stop forever.
> >
> > At first, this seems to work as intended. When the network is down, the
> > program actually closes and re-creates the socket, and the call to
> > zmq_connect() succeeds... but the messages are still not sent, and the
> > process on machine B ends up in a loop where it fills the ZMQ buffers,
> > destroys the socket, re-creates it, reconnects, rinse, repeat. I observed
> > the loop for several hours.
> >
> > Just stopping the UNIX process and re-starting it solved the problem
> > (i.e. messages get transmitted normally, instantaneously).
> >
> > Is there something I am doing wrong? What are my options to avoid this
> > problem? [I can consider moving away from ZMQ to nanomsg or nng].
> >
> > Thanks,
> > --
> > Charles BOUILLAGUET
> > Université de Lille - Sciences et Technologies
> > charles.bouillaguet at univ-lille1.fr | www.univ-lille1.fr
> > Laboratoire CRIStAL - Bât M3 - Bureau 332 - 59655 Villeneuve d'Ascq
> > Tél. +33 (0)3 28 77 85 84
> > homepage:  http://cristal.univ-lille.fr/~bouillag/
> > _______________________________________________
> > zeromq-dev mailing list
> > zeromq-dev at lists.zeromq.org
> > https://lists.zeromq.org/mailman/listinfo/zeromq-dev
> >




