[zeromq-dev] lost message due to EINTR

Pieter Hintjens ph at imatix.com
Tue Jan 27 10:30:02 CET 2015


In CZMQ v3 the context is hidden and zctx is deprecated.

However if you use the old CZMQ v2 API, or you use the libzmq API
directly, you should be able to use multiple contexts safely.

If you have a short example that shows the problem, we can investigate.

-Pieter

On Tue, Jan 27, 2015 at 10:05 AM,  <sven.koebnick at t-online.de> wrote:
> Hi *!
>
>
>
> I found the "bug" that I >>>thought<<< was caused by EINTR:
>
> The EINTR problem went away when I fixed some illegal reference usages that
> the gcc compiler didn't complain about (but LLVM's clang did ;o)
>
> ... the lost messages problem was still there.
>
>
>
> BUT: when I switched from ZMQ2 to ZMQ4, I used the same logic for socket and
> context creation.
>
> I always used one context per service type, so that I could tear down
> a service, or several services of the same type, by killing sockets or
> contexts if the service got stuck (just as an option).
>
> In ZMQ4 I found the context creation to be hidden behind socket creation and
> used old functions to still stick to different contexts for different
> service types and for clients.
>
> So during initialization, I used a context for the client side, that gets
> destroyed when everything is started properly.
>
> This seems to cause message loss during the first few hundred messages
> ... sounds queer? Indeed!
>
> With different contexts (and the "one for startup phase" being destroyed) I
> lost about 1 in 10 messages. As the program kept running, the losses became
> fewer and fewer and were gone once about 600-1000 messages had been sent.
>
> While hunting this bug, I placed sleep(1) after each send() to
> give ZMQ the chance to get the message on the wire before some (unknown)
> bug destroyed the data, but with no success: the messages were
> still lost. What is really queer, and what put me on the right track,
> is that when I placed the sleep(1) BEFORE the send() calls, the
> messages were NOT lost anymore; all of them were received.
>
>
>
> As soon as I switched the logic to always use the SAME CONTEXT and never
> destroy it, the message loss was gone.
>
>
>
> Question: is it "forbidden" to use different contexts now in ZMQ Version 4?
>
> I'm thinking about writing a short demo when there's time, and I hope
> this quirk shows up in it. But I wanted to inform you about this behaviour
> of losing messages when contexts are destroyed at run time (yes, I
> disconnected and destroyed the involved sockets before destroying the
> context, so it's possible that disconnection is the cause instead of
> context destruction).
>
> Used socket types were: REQ/(X)REP and SUB/PUB
>
>
>
> On 2015-01-09 14:47, sven.koebnick at t-online.de wrote:
>
> I now use a duplicate of some of your CZMQ code that does frame send()ing
> with REUSE and retries in case of EINTR:
>
> I copied zmsg_recv() and wrapped the frame receiving in a loop checking
> EINTR.
>
>   zmsg_t *zmsg_recv (void *source)
>     {
>         assert (source);
>         zmsg_t *self = zmsg_new ();
>         if (!self)
>             return NULL;
>         void *handle = zsock_resolve (source);
>         while (true) {
>             zframe_t *frame = ZMQ_TEMP_FAILURE_RETRY_F(zframe_recv (handle));
>             if (!frame) {
>                 zmsg_destroy (&self);
>                 logFatal("data loss while receiving frame");
>                 break; // Interrupted or terminated
>             }
>             if (zmsg_append (self, &frame)) {
>                 zmsg_destroy (&self);
>                 logFatal("data loss while appending frame");
>                 break;
>             }
>             if (!zsock_rcvmore (handle))
>                 break; // Last message frame
>         }
>         return self;
>     }
> with ZMQ_TEMP_FAILURE_RETRY_F being a short macro that handles EINTR, EAGAIN
> and throw()ing with errno==EINTR or EAGAIN
>
> likewise with zmsg_send(), which had to be modified deeper because I cannot
> look into zmsg_t and so used zframe_send() and zmsg_next():
>
> int zmsg_send (zmsg_t **self_p, void *dest)
>     {
>         assert (self_p);
>         assert (dest);
>         zmsg_t *self = *self_p;
>         int rc = 0;
>         void *handle = zsock_resolve (dest);
>         if (self) {
>             assert (zmsg_is (self));
>             zframe_t *frame = zmsg_first (self);
>             while (frame) {
>                 zframe_t *next_frame=zmsg_next(self);
>                 rc = ZMQ_TEMP_FAILURE_RETRY(zframe_send (&frame, handle,
>                         next_frame ? ZFRAME_MORE+ZFRAME_REUSE : ZFRAME_REUSE));
>                 if (rc != 0) {
>                     logFatal("data loss while sending frame");
>                     break;
>                 }
>                 frame = next_frame;
>             }
>             zmsg_destroy (self_p);
>         }
>         return rc;
>     }
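
The retry macros themselves are not shown in this thread. As a rough sketch of the idea only, assuming a hypothetical helper `retry_on_eintr` (not part of CZMQ or libzmq) and plain C without the exception handling mentioned above, a bounded retry on EINTR/EAGAIN could look like this; `flaky_op` is a stand-in for a real send call:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success, -1 with errno set on
   failure (the same convention the zframe_send() calls above rely on). */
typedef int (*retry_op_fn) (void *arg);

/* Hypothetical helper, not a CZMQ or libzmq function: call op(arg) until it
   succeeds, fails with an errno other than EINTR/EAGAIN, or max_tries
   attempts have been made. Returns the last return code of op, or -1 if
   max_tries is not positive. */
static int
retry_on_eintr (retry_op_fn op, void *arg, int max_tries)
{
    assert (op != NULL);
    int rc = -1;
    for (int attempt = 0; attempt < max_tries; attempt++) {
        errno = 0;
        rc = op (arg);
        if (rc == 0)
            break;              /* success */
        if (errno != EINTR && errno != EAGAIN)
            break;              /* a real error: retrying will not help */
    }
    return rc;
}

/* Stand-in for a send call: fails with EINTR as many times as the counter
   behind arg says, then succeeds. */
static int
flaky_op (void *arg)
{
    int *remaining = (int *) arg;
    if (*remaining > 0) {
        (*remaining)--;
        errno = EINTR;
        return -1;
    }
    return 0;
}
```

Bounding the number of attempts is a design choice of this sketch: it avoids spinning forever if the signal source never goes quiet, at the price of having to handle a final failure in the caller.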
>
>
>
> That made the behaviour MUCH better ;o))) but not fully functional.
>
> I now lose messages only in the Dispatcher Service, until all five
> Dispatchers (in the testing environment) are "dead" (waiting for answers
> that won't come ... yes, I have more robust code that keeps Dispatchers
> working in such cases, but that code is currently disabled for easier
> testing).
>
>
>
> For the moment, I assume the rest is a bug inside my Dispatcher, and I will
> check which message configurations cause the vanishing ... this definitely
> does not happen for all types, so THIS must be a bug inside the application.
>
> I'll inform you as soon as I have a hint (or proof) that there is a bug on
> the ZMQ side ;o)
>
>
>
> In any case, I'll dig deeper into when and why I get EINTRs ... maybe
> I am misusing some smart pointer (boost) that deletes messages (at the
> end of a block) that are still waiting for delivery by ZMQ in another
> thread.
>
>
>
>
>
> On 2015-01-09 13:37, Pieter Hintjens wrote:
>
> On Fri, Jan 9, 2015 at 1:25 PM,  <sven.koebnick at t-online.de> wrote:
>
> I get that error only during debugging inside Eclipse C++ (gdb).
>
> Makes sense. The debugger is sending interrupt signals. It's going to
> make a mess of any logic that uses them. I don't think you can make
> the code robust against this, nor would it be a good idea to make the
> code more complex just so it will work under a debugger.
>
> If you know what else causes EINTRs besides Ctrl-C and the like, just tell
> me ... I just don't know. Using ZMQ2, I NEVER had EINTR, even when
> single-stepping the application.
>
> Hmm. A lot changed from ZMQ v2 to v4. Also, CZMQ is doing default
> signal handling that you might want to modify (it has hooks so you can
> switch it off).
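
On the question of what else causes EINTR: on POSIX systems, any signal whose handler was installed without SA_RESTART can interrupt a blocking system call, not only Ctrl-C. A minimal sketch in plain C, with no ZeroMQ involved and with function names made up for illustration, that provokes EINTR with a timer signal:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Empty handler: its only job is to make SIGALRM interrupt the read(). */
static void
on_alarm (int signo)
{
    (void) signo;
}

/* Illustrative function: blocks in read() on an empty pipe and lets a
   SIGALRM fire after about 50 ms. Returns the errno left by read(), or 0
   if setup failed or read() did not fail. */
static int
errno_after_interrupted_read (void)
{
    int fds [2];
    if (pipe (fds) != 0)
        return 0;

    struct sigaction sa, old;
    memset (&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset (&sa.sa_mask);
    sa.sa_flags = 0;            /* deliberately no SA_RESTART */
    int saved = 0;
    if (sigaction (SIGALRM, &sa, &old) == 0) {
        struct itimerval timer;
        memset (&timer, 0, sizeof timer);
        timer.it_value.tv_usec = 50000;     /* one shot after 50 ms */
        setitimer (ITIMER_REAL, &timer, NULL);

        char byte;
        errno = 0;
        ssize_t n = read (fds [0], &byte, 1);   /* nothing is ever written */
        if (n == -1)
            saved = errno;
        sigaction (SIGALRM, &old, NULL);        /* restore old handler */
    }
    close (fds [0]);
    close (fds [1]);
    return saved;
}
```

Calling `errno_after_interrupted_read ()` yields EINTR on a POSIX system, which is the same condition the quoted code has to cope with when a debugger or another signal source is active.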
>
> Didn't you ever happen to get EINTR in your own testing?
>
> I personally don't use debuggers except to read core dumps. Stepping
> through code is insanely pointless in a multithreaded app. It's better
> IME to use tracing and even simply printf statements.
>
> See if it happens outside Eclipse, then we can try to find the source
> of the signals.
>
> -Pieter
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev