[zeromq-dev] lost message due to EINTR

sven.koebnick at t-online.de sven.koebnick at t-online.de
Tue Jan 27 10:05:12 CET 2015


Hi *! 

I found the "bug" that I >>>thought<<< to be caused by

The EINTR issue went away once I fixed some illegal reference usages
that the gcc compiler didn't complain about (but LLVM's "clang"
did ;o)

... the lost messages problem was still there. 

BUT: when I
switched from ZMQ2 to ZMQ4, I used the same logic for socket and context

I always used one context for each service type, to be
able to tear down a service or several services of the same type by
killing sockets or contexts if the service should happen to get stuck
(just as an option).

In ZMQ4 I found the context creation to be hidden
behind socket creation and used old functions to still stick to
different contexts for different service types and for clients. 

During initialization, I used a context for the client side that gets
destroyed once everything has started properly.

This seems to cause the
loss of messages for the first few hundred messages ... sounds odd?

With different contexts (and the "one for startup phase" being
destroyed) I lost about 1 in 10 messages. As the run continued, the
losses got fewer and fewer and were gone once about 600-1000 messages
had been sent.

While hunting this bug, I placed sleep(1) after
each send() to give ZMQ the chance to get the message on the wire before
some (unknown) bug destroyed the data, but with no success:
the messages were still lost. What is really odd, and what put me on the
right track, is that with the sleep(1) placed BEFORE the
send() calls, the messages were NOT lost anymore; all of them were received.

As soon as I switched the logic to always use the SAME CONTEXT and
never destroy it, the message loss was gone.

Question: is it
"forbidden" to use different contexts now in ZMQ Version 4? 

I'm thinking about writing a short demo when there's time, and I hope
this quirk shows up in it. But I wanted to inform you about this behaviour of
losing messages when contexts are destroyed at runtime (yes, I
disconnected and destroyed the involved sockets before destroying the
context, so it's possible that disconnection is the cause instead of
context destruction).

Used socket types were: REQ/(X)REP and SUB/PUB

On 2015-01-09 14:47, sven.koebnick at t-online.de wrote:

> I now use a duplicate of some of your CZMQ code that does frame send()ing
> with REUSE and retries in case of EINTR:
> I copied zmsg_recv() and wrapped the frame receiving in a loop checking
> EINTR.
> zmsg_t *zmsg_recv (void *source)
> {
>     assert (source);
>     zmsg_t *self = zmsg_new ();
>     if (!self)
>         return NULL;
>     void *handle = zsock_resolve (source);
>     while (true) {
>         zframe_t *frame = ZMQ_TEMP_FAILURE_RETRY_F(zframe_recv (handle));
>         if (!frame) {
>             zmsg_destroy (&self);
>             logFatal("data loss while receiving frame");
>             break; // Interrupted or terminated
>         }
>         if (zmsg_append (self, &frame)) {
>             zmsg_destroy (&self);
>             logFatal("data loss while appending [...]
>             break;
>         }
>         if (!zsock_rcvmore (handle))
>             break; // Last message frame
>     }
>     return self;
> }
> [...] being a short macro that handles EINTR, EAGAIN and throw()ing with
> errno==EINTR or EAGAIN
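
The retry macro mentioned above is not shown in the message. As a rough idea of what this kind of EINTR handling can look like, here is a minimal sketch in plain C with no ZMQ dependency. The names retry_on_eintr, flaky_send and struct flaky_call are hypothetical, and the sketch assumes the wrapped call returns -1 and sets errno on failure; it is not the author's ZMQ_TEMP_FAILURE_RETRY macro, and it does not cover EAGAIN or pointer-returning calls such as zframe_recv().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: any call that returns -1 and sets errno
   when it fails (an assumption of this sketch). */
typedef int (*io_call_fn)(void *arg);

/* Hypothetical helper: repeat `call` while it fails with EINTR,
   making at most `max_tries` attempts. Returns the last result. */
static int retry_on_eintr(io_call_fn call, void *arg, int max_tries)
{
    assert(call != NULL && max_tries > 0);
    int rc;
    do {
        errno = 0;
        rc = call(arg);
    } while (rc == -1 && errno == EINTR && --max_tries > 0);
    return rc;
}

/* Stand-in for an interruptible send: fails with EINTR a fixed number
   of times, then succeeds. Used only to exercise the helper. */
struct flaky_call {
    int interruptions_left;  /* how many more calls will be interrupted */
    int calls_made;          /* total number of calls so far */
};

static int flaky_send(void *arg)
{
    struct flaky_call *state = arg;
    state->calls_made++;
    if (state->interruptions_left > 0) {
        state->interruptions_left--;
        errno = EINTR;
        return -1;
    }
    return 0;
}
```

With a state that is interrupted twice, retry_on_eintr(flaky_send, &state, 5) makes three calls and returns 0. With more interruptions than allowed attempts it gives up and returns -1 with errno still set to EINTR, so the caller can log the loss instead of spinning forever.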
> Likewise with zmsg_send(), which had to be modified deeper because I
> cannot look into zmsg_t and so used zframe_send() and zmsg_next():
> int zmsg_send (zmsg_t **self_p, void *dest)
> {
>     assert (self_p);
>     assert (dest);
>     zmsg_t *self = *self_p;
>     int rc = 0;
>     void *handle = zsock_resolve (dest);
>     if (self) {
>         assert (zmsg_is (self));
>         zframe_t *frame = zmsg_first (self);
>         while (frame) {
>             zframe_t *next_frame=zmsg_next(self);
>             rc = ZMQ_TEMP_FAILURE_RETRY(zframe_send (&frame, handle,
>                 next_frame ? [...]
>             if (rc != 0) {
>                 logFatal("data loss while sending frame");
>                 break;
>             }
>             frame = next_frame;
>         }
>         zmsg_destroy (self_p);
>     }
>     return rc;
> }
> MADE THE BEHAVIOUR MUCH BETTER ;O))) but not fully functional.
> I now lose messages only in the Dispatcher Service until all (in the
> testing environment) five Dispatchers are "dead" (waiting for answers that
> won't come ... yes, I have harder code that keeps Dispatchers working in
> such cases, but that code is currently disabled for easier testing).
> [...] check which message configurations cause the vanishing ... this
> definitely does not happen for all types, so THIS must be a bug inside
> the application.
> I'll inform you as soon as I have a hint (or proof) that there is a bug
> on the ZMQ side ;o)
> In any case, I'll dig deeper into when and why I get EINTRs ... maybe
> some smart pointer (boost) is misused by me and deletes messages (at the
> end of a block) that are still waiting for delivery by ZMQ in another
> thread.

> On 2015-01-09 13:37, Pieter Hintjens wrote:
>> On Fri, Jan 9, 2015 at 1:25 PM, <sven.koebnick at t-online.de> wrote:
>>> I get that error only during debugging inside Eclipse C++ (gdb).
>> Makes sense. The debugger is sending interrupt signals. It's going to
>> make a mess of any logic that uses them. I don't think you can make
>> the code robust against this, nor would it be a good idea to make the
>> code more complex just so it will work under a debugger.
>>> If you know what else causes EINTRs besides Ctrl-C and the like, just
>>> tell me ... I just don't know. Using ZMQ2, I NEVER had EINTR, even when
>>> single-stepping the application.
>> Hmm. A lot changed from ZMQ v2 to v4. Also, CZMQ is doing default
>> signal handling that you might want to modify (it has hooks so you can
>> switch it off).