[zeromq-dev] lost message due to EINTR
Thomas Rodgers
rodgert at twrodgers.com
Fri Jan 9 14:31:57 CET 2015
Just a guess, are any watchpoints set in the debugging session?
On Friday, January 9, 2015, <sven.koebnick at t-online.de> wrote:
> I get that error only while debugging inside Eclipse C++ (gdb).
>
> It seems irrelevant whether any breakpoints are set (but if there are
> breakpoints that actually stop the program, the number of EINTRs is
> significantly higher).
>
> I also wonder about getting EINTR because of a debugger. There is no
> Ctrl-C or anything similar involved; the system is just running in debug
> mode under gdb when roughly every 10th message hits an EINTR during
>
> - send()
>
> - recv() in dispatcher
>
> - resend()ing in dispatcher to the final recipient
>
> - recv() in the recipient thread (or also, if I configure it to be a
> separate process)
>
> - and the same way back.
>
> So counting the send()s and recv()s, every 80th to 100th message is
> involved in an EINTR when debugging.
>
> If you know what else causes EINTR besides Ctrl-C and the like, just tell
> me ... I just don't know. Using ZMQ2, I NEVER had EINTR, even when
> single-stepping the application.
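
On the question of what else causes EINTR: on POSIX systems, any signal that is caught by a handler installed without SA_RESTART interrupts a blocking call in that thread, not only SIGINT from Ctrl-C. Linux's signal(7) also documents that some blocking interfaces (epoll_wait among them) can fail with EINTR after the process is stopped and then resumed, even with no handler installed; that a debugger's stop/continue cycle triggers the same path here is an assumption, not something this thread confirms. The sketch below shows only the first mechanism, using a timer signal and a pipe instead of a ZeroMQ socket. The names `on_alarm` and `demo_eintr_on_read` are made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Empty handler: its only job is to make SIGALRM a "caught" signal. */
static void on_alarm(int signo) { (void)signo; }

/* Blocks in read() on an empty pipe and lets SIGALRM interrupt it.
 * Returns the errno seen by read(), 0 if read() did not fail,
 * or -1 if the setup itself failed. */
int demo_eintr_on_read(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                      /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* One-shot timer: deliver SIGALRM after about 50 ms. */
    struct itimerval tv;
    memset(&tv, 0, sizeof tv);
    tv.it_value.tv_usec = 50000;
    setitimer(ITIMER_REAL, &tv, NULL);

    char buf;
    ssize_t n = read(fds[0], &buf, 1);    /* blocks: nobody writes */
    int saved = (n < 0) ? errno : 0;

    sigaction(SIGALRM, &old, NULL);       /* restore previous disposition */
    close(fds[0]);
    close(fds[1]);
    return saved;
}
```

Calling `demo_eintr_on_read()` from a single-threaded program is expected to yield EINTR, with no Ctrl-C involved. Installing the same handler with `SA_RESTART` set would make `read()` resume instead of failing, which would make the function block forever, since nothing is ever written to the pipe.
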
>
>
>
> Did you never get EINTR in your own testing? Maybe I still have
> undiscovered memory violations that corrupt ZMQ's data and threads, but
> the system behaves identically when I configure each Service Worker to be
> its own process instead of just a pthread. So I assume there shouldn't be
> too many hidden SEGVs that disturb ZMQ.
>
>
>
> Creating a simple example is difficult, because we are highly parallel,
> with (when debugging) about 30 threads of our own and many more for ZMQ.
> The whole thing is about 12K lines now, counting only the rudimentary base
> system and not the code for what the application is actually meant to do.
>
>
>
> On 2015-01-09 13:06, Pieter Hintjens wrote:
>
> In theory you cannot get an error between frames. Also, EINTR means
> the process is shutting down, so the benefit of retrying seems vague.
>
> Can you provide a reproducible test case? I do not like discussing
> abstracts we cannot actually disprove.
>
> On Fri, Jan 9, 2015 at 12:08 PM, <sven.koebnick at t-online.de> wrote:
>
> The referenced fix should work for zmsg_send(), but the same bug is in
> zmsg_recv(), and there the application does not (yet) have any data.
> zmsg_recv() destroys the properly received frames 1 to n-1 if receiving
> frame n fails. Since the data is ONLY inside ZMQ at this time, the
> application cannot handle this, and frames 1 to n-1 are gone for good.
>
> I think it would be good to either
>
> a) retry receiving a frame on EINTR inside ZMQ's code, or
>
> b) offer a kind of "zmsg_recv_continue()" that appends further frames to
> a message after a (partial) failure.
>
> Shouldn't ZMQ's higher-level APIs completely handle EINTR on an atomic
> (per-frame) basis?
>
> On 2015-01-09 10:53, Pieter Hintjens wrote:
>
> I've fixed the error in zmsg and in zframe, see
> https://github.com/zeromq/czmq/pull/886
>
> On Fri, Jan 9, 2015 at 10:46 AM, Pieter Hintjens <ph at imatix.com> wrote:
>
> Can you make a minimal reproducible test case?
>
> On Fri, Jan 9, 2015 at 9:41 AM, <sven.koebnick at t-online.de> wrote:
>
> Another related thing bothers me in this context: when zmsg_send() indeed
> returns with rc==-1 and a NULLed message (the data is definitely lost), I
> have a chance to check for this lost message (simply asserting on
> rc==-1 && message==NULL). BUT (!!) what about zmsg_recv()?
>
> Situation: I successfully zmsg_send() a message and it is delivered by ZMQ
> to the target (still inside ZMQ). When the same error occurs in the
> application's zmsg_recv() call, will the message be returned by the next
> call to zmsg_recv(), or is it lost in this case as well? Here I do not
> have any chance to work with a copy, as would be possible in the sending
> example below.
>
> This question defeats the first idea of holding a copy before sending.
> Indeed, I tried, but zmsg_dup() destroys the routing information in the
> message, so it gets lost (silently) in transport because it cannot be
> addressed. Does zmq_msg_copy() work "better" and also copy the routing
> info?
>
> For information: the EINTR problem happens on SUSE Linux, 32 and 64 bit,
> using ZMQ4 (in any version) while debugging with Eclipse/gdb. It happens
> VERY often when breakpoints are triggered, but also (rarely) when the
> application is just running under gdb without any suspends due to
> breakpoints (breakpoints exist, but are not hit). The system ran pretty
> well for 2 years (!!!) under ZMQ2 and I have this problem only in ZMQ4 ...
> there has never been an EINTR under ZMQ2, so my code had to be heavily
> modified with loops for "temporary failures" with errno == EINTR or
> EAGAIN.
>
> ^5
>
> On 2015-01-09 08:30, sven.koebnick at t-online.de wrote:
>
> Hi * !
>
> I recently switched from ZMQ2 (pretty old) to ZMQ 4 and since then have
> some problems with EINTR when debugging. Consider the following code:
>
>     do {
>         rc = zmsg_send (&zrep, clsocket_);
>         if (rc<0) {
>             if (errno == EINTR || errno == EAGAIN) {
>                 logWarn("temporary failure in zmq send() ... will be tried again.");
>             } else {
>                 logFatal("hard error in sending zmq ... manually destroying message ... it will be lost");
>                 zmsg_destroy(&zrep);
>             }
>             if (zrep) {
>                 logWarn("sending of reply msg returned rc("<<rc<<"), zmq_errno("<<zmq_errno()<<") "<<zmq_strerror(zmq_errno()));
>                 logWarn("but message is still existent ... retrying");
>             } else {
>                 logError("sending of reply msg returned rc("<<rc<<"), zmq_errno("<<zmq_errno()<<") "<<zmq_strerror(zmq_errno()));
>                 logFatal("message nulled anyway by zmq ... seems lost ...");
>             }
>         }
>     } while (zrep); // repeat until message is gone
>
> This snippet usually works, but sometimes I get the EINTR warning. No
> problem, I thought, but despite returning an error (rc==-1, errno==EINTR)
> the message pointer is NULLed, so I cannot resend the message. The logs
> prove that the message is indeed NOT sent, and for resending I'd need a
> copy ... what am I doing wrong?
>
> ^5 sven
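
Option a) from the thread above, retrying a single frame on EINTR so that frames 1 to n-1 are not thrown away, can be sketched without ZeroMQ at all. Everything in this block is hypothetical: `recv_frame_fn`, `recv_message_retrying`, and the `fake_socket` test double are illustration names, frames are reduced to plain ints, and nothing here claims to be how czmq or the referenced pull request handles the case.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-frame receive callback: returns 1 if a frame was read
 * and more follow, 0 if the last frame was read, -1 on error (errno set). */
typedef int (*recv_frame_fn)(void *ctx, int *frame_out);

/* Receive the frames of one message into frames[0..max-1].
 * A frame-level EINTR is retried, so frames already received are kept.
 * Returns the number of frames, or -1 on a non-EINTR error. */
int recv_message_retrying(recv_frame_fn recv_frame, void *ctx,
                          int *frames, size_t max)
{
    size_t count = 0;
    while (count < max) {
        int rc = recv_frame(ctx, &frames[count]);
        if (rc < 0) {
            if (errno == EINTR)
                continue;          /* interrupted: retry this frame only */
            return -1;             /* hard error: caller decides */
        }
        count++;
        if (rc == 0)
            break;                 /* last frame of the message */
    }
    return (int)count;
}

/* Test double: delivers `total` frames and fails once with EINTR just
 * before delivering the frame at index `fail_before`. */
struct fake_socket {
    int next;
    int total;
    int fail_before;
    int failed;
};

int fake_recv_frame(void *ctx, int *frame_out)
{
    struct fake_socket *s = ctx;
    if (s->next == s->fail_before && !s->failed) {
        s->failed = 1;
        errno = EINTR;
        return -1;
    }
    *frame_out = s->next++;
    return s->next < s->total ? 1 : 0;
}
```

With a `fake_socket` set up for four frames and one EINTR before the third, `recv_message_retrying()` still returns all four frames in order. A real application would probably also check a shutdown flag before retrying, so that a genuine termination signal is not swallowed by the loop.
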
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev