[zeromq-dev] lost message due to EINTR

sven.koebnick at t-online.de sven.koebnick at t-online.de
Fri Jan 9 13:25:01 CET 2015


 

I get that error only while debugging inside Eclipse C++ (gdb).

It seems irrelevant whether breakpoints are set or not (but if any
breakpoints actually stop the program, the number of EINTRs is
significantly higher).

I also wonder about getting EINTR because of a debugger. There is no
Ctrl-C or anything similar involved; the system is just running in debug
mode under gdb, and roughly every 10th message is hit by an EINTR
somewhere along its path:

- send()
- recv() in dispatcher
- resend()ing in dispatcher to final recipient
- recv() in recipient thread (or in a separate process, if I configure
  it that way)
- and the same way back.

So counting the send()s and recv()s, every 80th to 100th call is
involved in an EINTR when debugging.

If you know what else causes EINTRs besides Ctrl-C and the like, just
tell me ... I just don't know. Using ZMQ2, I NEVER had an EINTR, even
when single-stepping the application.
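
A possible answer, hedged and not confirmed anywhere in this thread: on
Linux, signal(7) documents that some blocking calls can fail with EINTR
after the process is stopped and then resumed with SIGCONT, even when no
signal handler is installed, and a debugger stops and resumes the
process it traces. Whether that is what happens here is an assumption.
A minimal sketch of the usual defence, repeating a call for as long as
it fails with EINTR, could look like this; retry_on_eintr and
max_attempts are hypothetical names, not part of ZMQ or CZMQ:

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical helper (NOT part of ZMQ or CZMQ): repeat a call that
// reports failure as -1 for as long as the failure reason is EINTR.
// Gives up after max_attempts tries; the caller then sees -1 with
// errno still set to EINTR.
template <typename Call>
long retry_on_eintr(Call call, int max_attempts = 100) {
    assert(max_attempts > 0);
    long rc = -1;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        rc = call();
        if (rc != -1 || errno != EINTR)
            break;  // success, or an error other than EINTR
    }
    return rc;
}
```

A plain POSIX call could then be wrapped as, for example,
retry_on_eintr([&] { return (long) read(fd, buf, len); }). This only
helps when the interrupted call leaves its input intact; according to
the report further down in this thread, zmsg_send() has already NULLed
the message when it returns, so a wrapper like this cannot resend it.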

Didn't you ever happen to get EINTR in your own testing? Maybe I still
have undiscovered memory violations that corrupt ZMQ data and threads,
but the system behaves identically when I configure each Service Worker
to be its own process instead of just a pthread. So I assume there
shouldn't be too many hidden SEGVs that disturb ZMQ.

Creating an easy example is difficult, because we are highly parallel
with (in debugging) about 30 threads of our own and lots more for ZMQ.
The whole thing is about 12K lines now, counting only the rudimentary
base system and not the code for what the application is actually meant
to do.

On 2015-01-09 13:06, Pieter Hintjens wrote:

> In theory you cannot get an error between frames. Also, EINTR means
> the process is shutting down, so the benefit of retrying seems vague.
>
> Can you provide a reproducible test case? I do not like discussing
> abstracts we cannot actually disprove.
>
> On Fri, Jan 9, 2015 at 12:08 PM, <sven.koebnick at t-online.de> wrote:
>

>> The referenced fix should work for zmsg_send(), but the same bug is
>> in zmsg_recv(), and there the application does not (yet) have any
>> data. zmsg_recv() destroys the properly received frames 1 to n-1 if
>> receiving frame n fails. Since the data is ONLY inside ZMQ at this
>> time, the application cannot handle this, and frames 1 to n-1 are
>> gone for good.
>>
>> I think it would be good to either
>> a) retry receiving a frame on EINTR inside ZMQ's code, or
>> b) offer a kind of "zmsg_recv_continue()" that appends further frames
>>    to a message after a (partial) failure.
>>
>> Shouldn't ZMQ's higher-level APIs completely handle EINTR on an
>> atomic (per-frame) basis?
>>
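
Option a) above can be sketched as follows. This is only an illustration
of the idea, not the CZMQ implementation and not the fix from the
referenced pull request. recv_frame and has_more are hypothetical
stand-ins for whatever receives one frame and reports whether more
frames of the same message follow (in libzmq that might be
zmq_msg_recv() and the ZMQ_RCVMORE socket option), and frames are
modelled as std::string for brevity:

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT the CZMQ API:
//   recv_frame(out) returns 0 on success, -1 with errno set on failure
//   has_more()      reports whether another frame of the message follows
using RecvFrame = std::function<int(std::string&)>;
using HasMore = std::function<bool()>;

// Receives a whole multipart message. A frame interrupted by EINTR is
// retried in place, so frames that already arrived stay in `frames`.
// Returns 0 on success, or -1 on any error other than EINTR; in that
// case `frames` still holds whatever was received before the error.
int recv_multipart(const RecvFrame& recv_frame, const HasMore& has_more,
                   std::vector<std::string>& frames) {
    assert(recv_frame && has_more);
    do {
        std::string frame;
        int rc;
        do {
            rc = recv_frame(frame);
        } while (rc == -1 && errno == EINTR);  // retry only this frame
        if (rc == -1)
            return -1;  // hard error: keep the partial message for the caller
        frames.push_back(frame);
    } while (has_more());
    return 0;
}
```

The design choice here is that the caller owns the frame list, so a
failure on frame n never throws away frames 1 to n-1, which is the loss
described above.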
>> On 2015-01-09 10:53, Pieter Hintjens wrote:
>>
>>> I've fixed the error in zmsg and in zframe, see
>>> https://github.com/zeromq/czmq/pull/886 [1]
>>>
>>> On Fri, Jan 9, 2015 at 10:46 AM, Pieter Hintjens <ph at imatix.com> wrote:
>>>
>>>> Can you make a minimal reproducible test case?
>>>>
>>>> On Fri, Jan 9, 2015 at 9:41 AM, <sven.koebnick at t-online.de> wrote:
>>>>
>>>>> Another related thing bothers me in this context:
>>>>>
>>>>> When zmsg_send() indeed returns with rc==-1 and a NULLed message
>>>>> (data is definitely lost), I have a chance to check for this lost
>>>>> message (simply asserting on rc==-1 && message==NULL).
>>>>>
>>>>> BUT (!!) what about zmsg_recv()? Situation: I successfully
>>>>> zmsg_send() a message and it is delivered by ZMQ to the target
>>>>> (still inside ZMQ). When the same error occurs in the
>>>>> application's zmsg_recv() call, will the message be returned by
>>>>> the next call to zmsg_recv(), or is it lost in this case as well?
>>>>> Here, I do not have any chance to work with a copy, as would be
>>>>> possible in the sending example below.
>>>>>
>>>>> This question defeats my first idea of holding a copy before
>>>>> sending. Indeed, I tried, but zmsg_dup() destroys the routing
>>>>> information in the message, so it gets lost (silently) in
>>>>> transport because it cannot be addressed. Does zmq_msg_copy() work
>>>>> "better" and also copy the routing info?
>>>>>
>>>>> As an info: the EINTR thing happens on Suse Linux, 32 and 64 bit,
>>>>> using ZMQ4 (in any version) while debugging with Eclipse/gdb. It
>>>>> happens VERY often when breakpoints are triggered, but also
>>>>> (rarely) when the application is just running under gdb without
>>>>> any suspends due to breakpoints (breakpoints existing, but not
>>>>> hit).
>>>>>
>>>>> The system ran pretty well for 2 years (!!!) under ZMQ2 and I have
>>>>> this problem only in ZMQ4 ... there has never been an EINTR under
>>>>> ZMQ2, so my code had to be heavily modified with loops for
>>>>> "temporary failures" with errno == EINTR or EAGAIN.
>>>>>
>>>>> ^5
>>>>>
>>>>> On 2015-01-09 08:30, sven.koebnick at t-online.de wrote:
>>>>>
>>>>>> Hi * !
>>>>>>
>>>>>> I recently switched from ZMQ2 (pretty old) to ZMQ 4 and since
>>>>>> then have some problems in debugging with EINTR.
>>>>>>
>>>>>> Following code:
>>>>>>
>>>>>>     do {
>>>>>>         rc = zmsg_send (&zrep, clsocket_);
>>>>>>         if (rc<0) {
>>>>>>             if (errno == EINTR || errno == EAGAIN) {
>>>>>>                 logWarn("temporary failure in zmq send() ... will be tried again.");
>>>>>>             } else {
>>>>>>                 logFatal("hard error in sending zmq ... manually destroying message ... it will be lost");
>>>>>>                 zmsg_destroy(&zrep);
>>>>>>             }
>>>>>>             if (zrep) {
>>>>>>                 logWarn("sending of reply msg returned rc("<<rc
>>>>>>                     <<"), zmq_errno("<<zmq_errno()<<") "
>>>>>>                     <<zmq_strerror(zmq_errno()));
>>>>>>                 logWarn("but message is still existent ... retrying");
>>>>>>             } else {
>>>>>>                 logError("sending of reply msg returned rc("<<rc
>>>>>>                     <<"), zmq_errno("<<zmq_errno()<<") "
>>>>>>                     <<zmq_strerror(zmq_errno()));
>>>>>>                 logFatal("message nulled anyway by zmq ... seems lost ...");
>>>>>>             }
>>>>>>         }
>>>>>>     } while (zrep); // repeat until message is gone
>>>>>>
>>>>>> This snippet usually works, but sometimes I get the warning of
>>>>>> EINTR. No problem, I thought, but despite returning an error
>>>>>> (rc==-1, errno==EINTR) the message pointer is NULLed, so I cannot
>>>>>> resend the message. The logs prove that the message is indeed NOT
>>>>>> sent, and for resending I'd need a copy ... what am I doing
>>>>>> wrong?
>>>>>>
>>>>>> ^5 sven
_______________________________________________
zeromq-dev mailing list
zeromq-dev at lists.zeromq.org
http://lists.zeromq.org/mailman/listinfo/zeromq-dev [2]




Links:
------
[1] https://github.com/zeromq/czmq/pull/886
[2] http://lists.zeromq.org/mailman/listinfo/zeromq-dev

