[zeromq-dev] Pipeline Reliability
Brian T. Carcich
briantcarcich at gmail.com
Thu Jul 5 15:00:19 CEST 2018
Is a PULL socket set up for one-time use, so it only allows one connection?
Since the server (Machine A) is expecting messages at a rate of 500/s, if
it goes for, say, 5s without data, then perhaps closing and recreating the
PULL socket would restart the flow. You say that restarting the server
fixes the problem; perhaps that restart works because it creates a new PULL
socket.
On Tue, Jul 3, 2018 at 2:06 PM Charles Bouillaguet <
charles.bouillaguet at gmail.com> wrote:
> On Tue, Jul 03, 2018 at 10:35:31AM -0500, Mark Botner wrote:
> > I wonder if the default setting for ZMQ_LINGER is causing the zmq_close()
> > to block since there are unsent messages? From the ref. guide:
> >
> > The default setting of *ZMQ_LINGER* does not discard unsent messages;
> > this behaviour may cause the application to block when calling
> > *zmq_ctx_term()*.
>
> Indeed, but in my case, zmq_close() does NOT block. zmq_connect() does NOT
> block either. It's just that messages do not arrive, and eventually
> zmq_send() blocks (or fails in non-blocking mode).
>
> Charles
>
> > On Tue, Jul 3, 2018 at 9:08 AM, Charles Bouillaguet <
> > charles.bouillaguet at gmail.com> wrote:
> >
> > > Dear zeromq'ers,
> > >
> > > I'm facing a reliability problem that I couldn't solve by myself so
> > > far.
> > >
> > > I have two machines running two asymmetric programs. Machine A creates
> > > a PULL socket and binds it. Machine B creates a PUSH socket and connects
> > > it (to the PULL socket of machine A), using the TCP transport. Machine B
> > > then sends messages like crazy (about 500/s). Basically, B is a low-cost
> > > device equipped with sensors and A is a server that just stores the data.
> > >
> > > This works like a charm... until the inevitable happens: some network
> > > event occurs, and the messages cannot be transmitted from machine B to
> > > machine A.
> > >
> > > With a blocking send, the process on machine B then gets stuck in
> > > zmq_send() once the high water mark is reached, and the whole pipeline
> > > grinds to a halt.
> > >
> > > To avoid this, I tried the "Lazy Pirate Pattern". I use something like:
> > >
> > >     if (-1 == zmq_send(socket, msg, size, ZMQ_DONTWAIT)) {
> > >         if (errno == EAGAIN) {
> > >             zmq_close(socket);
> > >             socket = zmq_socket(context, ZMQ_PUSH);
> > >             zmq_connect(socket, address);
> > >         }
> > >     }
> > >
> > > I don't care if I lose some messages. What I don't want is the
> > > pipeline to stop forever.
> > >
> > > At first, this seems to work as intended. When the network is down, the
> > > program actually closes and re-creates the socket; the call to
> > > zmq_connect() succeeds... but the messages are still not sent, and the
> > > process on machine B ends up in a loop where it fills the ZMQ buffers,
> > > destroys the socket, re-creates it, reconnects, rinse, repeat. I observed
> > > the loop for several hours.
> > >
> > > Just stopping the UNIX process and re-starting it solved the problem
> > > (i.e. messages get transmitted normally, instantaneously).
> > >
> > > Is there something I am doing wrong? What are my options to avoid this
> > > problem? [I can consider moving away from ZMQ to nanomsg or nng].
> > >
> > > Thanks,
> > > --
> > > Charles BOUILLAGUET
> > > Université de Lille - Sciences et Technologies
> > > charles.bouillaguet at univ-lille1.fr | www.univ-lille1.fr
> > > Laboratoire CRIStAL - Bât M3 - Bureau 332 - 59655 Villeneuve d'Ascq
> > > Tél. +33 (0)3 28 77 85 84
> > > homepage: http://cristal.univ-lille.fr/~bouillag/
> > > _______________________________________________
> > > zeromq-dev mailing list
> > > zeromq-dev at lists.zeromq.org
> > > https://lists.zeromq.org/mailman/listinfo/zeromq-dev
> > >