[zeromq-dev] lost message due to EINTR
Pieter Hintjens
ph at imatix.com
Fri Jan 9 13:06:02 CET 2015
In theory you cannot get an error between frames. Also, EINTR means
the process is shutting down, so the benefit of retrying seems vague.
Can you provide a reproducible test case? I do not like discussing
abstracts we cannot actually disprove.
On Fri, Jan 9, 2015 at 12:08 PM, <sven.koebnick at t-online.de> wrote:
> The referenced fix should work for zmsg_send(), but the same bug is in
> zmsg_recv(), and there the application does not (yet) have any data.
>
> zmsg_recv() destroys the properly received frames 1 to n-1 if receiving
> frame n fails. Since the data is ONLY inside ZMQ at this time, the
> application cannot handle this, and frames 1 to n-1 are gone for good.
>
> I think it would be good to either
>
> a) retry receiving a frame on EINTR inside ZMQ's code
>
> or
>
> b) offer a kind of "zmsg_recv_continue()" that appends further frames to a
> message after a (partial) failure
>
>
>
> Shouldn't zmq's higher-level APIs completely handle EINTR on an atomic
> (per-frame) basis?
>
>
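Option (a) from the quoted message, retrying a single frame on EINTR without discarding the frames already received, can be sketched independently of ZMQ. This is only an illustration under stated assumptions, not the CZMQ implementation: `Frame`, `RecvFrameFn` and `recv_message_retrying` are hypothetical names, and the receive primitive is assumed to return -1 with errno set on failure.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical frame type: payload plus a "more frames follow" flag.
struct Frame {
    std::string data;
    bool more = false;
};

// Hypothetical receive primitive (an assumption, not a ZMQ call):
// returns 0 and fills the frame on success, -1 with errno set on failure.
using RecvFrameFn = std::function<int(Frame &)>;

// Receive one multi-frame message. An interrupted frame is retried and the
// frames received so far are kept. They are discarded only on a non-EINTR
// error or once the total retry budget for the message is used up.
bool recv_message_retrying(const RecvFrameFn &recv_frame, int max_retries,
                           std::vector<std::string> &frames)
{
    assert(max_retries >= 0);
    frames.clear();
    int retries = 0;
    for (;;) {
        Frame frame;
        if (recv_frame(frame) == -1) {
            if (errno == EINTR && retries < max_retries) {
                ++retries;
                continue;           // retry the same frame, keep earlier ones
            }
            frames.clear();         // hard error: give up on the message
            return false;
        }
        frames.push_back(std::move(frame.data));
        if (!frame.more)
            return true;            // last frame of the message
    }
}
```

The retry budget is bounded so that a caller who treats an interrupt as a shutdown request still regains control after a few attempts.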
>
> On 2015-01-09 10:53, Pieter Hintjens wrote:
>
> I've fixed the error in zmsg and in zframe, see
> https://github.com/zeromq/czmq/pull/886
>
> On Fri, Jan 9, 2015 at 10:46 AM, Pieter Hintjens <ph at imatix.com> wrote:
>
> Can you make a minimal reproducible test case?
>
> On Fri, Jan 9, 2015 at 9:41 AM, <sven.koebnick at t-online.de> wrote:
>
> Another related thing bothers me in this context: when zmsg_send() indeed
> returns with rc==-1 and a NULLed message (data is definitely lost), I have a
> chance to check for this lost message (simply asserting on rc==-1 &&
> message==NULL). BUT (!!) what about zmsg_recv()?
>
> Situation: I successfully zmsg_send() a message and it is delivered by ZMQ
> to the target (still inside ZMQ). When the same error occurs in the
> application's zmsg_recv() call, will the message be returned by the next
> call to zmsg_recv(), or is it lost in this case too? Here I do not have any
> chance to work with a copy, as would be possible in the sending example
> below. This question defeats the first idea of holding a copy before
> sending. Indeed, I tried, but zmsg_dup() destroys the routing information in
> the message, so it gets lost (silently) in transport, failing to be
> addressed. Does zmq_msg_copy() work "better" and also copy the routing info?
>
> As an info: the EINTR thingy happens on SUSE Linux, 32 and 64 bit, using
> ZMQ4 (in any version) while debugging with Eclipse/gdb. It happens VERY
> often when breakpoints are triggered, but also (rarely) when the application
> is just running under gdb without any suspends due to breakpoints
> (breakpoints existing, but not hit). The system went pretty well for 2 years
> (!!!) under ZMQ2 and I have this problem only in ZMQ4 ... there has never
> been an EINTR under ZMQ2, so my code had to be heavily modified with loops
> for "temporary failures" with errno == EINTR or EAGAIN.
>
> ^5
>
> On 2015-01-09 08:30, sven.koebnick at t-online.de wrote:
>
> Hi * !
>
> I recently switched from ZMQ2 (pretty old) to ZMQ 4 and since then have some
> problems in debugging with EINTR. Following code:
>
>     do {
>         rc = zmsg_send (&zrep, clsocket_);
>         if (rc<0) {
>             if (errno == EINTR || errno == EAGAIN) {
>                 logWarn("temporary failure in zmq send() ... will be tried again.");
>             } else {
>                 logFatal("hard error in sending zmq ... manually destroying message ... it will be lost");
>                 zmsg_destroy(&zrep);
>             }
>             if (zrep) {
>                 logWarn("sending of reply msg returned rc("<<rc<<"), zmq_errno("<<zmq_errno()<<") "<<zmq_strerror(zmq_errno()));
>                 logWarn("but message is still existent ... retrying");
>             } else {
>                 logError("sending of reply msg returned rc("<<rc<<"), zmq_errno("<<zmq_errno()<<") "<<zmq_strerror(zmq_errno()));
>                 logFatal("message nulled anyway by zmq ... seems lost ...");
>             }
>         }
>     } while (zrep); // repeat until message is gone
>
> This snippet usually works, but sometimes I get the warning of EINTR. No
> problem, I thought, but despite returning an error (rc==-1, errno==EINTR)
> the message pointer is NULLed, so I cannot resend the message. The logs
> prove that indeed the message is NOT sent, and for resending I'd need a
> copy ... what am I doing wrong?
>
> ^5 sven
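The idea raised in the quoted messages, holding a copy so that a send can be retried even if the message pointer was nulled on failure, might look roughly like the sketch below. It assumes a hypothetical `Message` struct and `SendFn` primitive rather than the real CZMQ types, and it models the behaviour described above (the send may consume the message even when it fails). It does not address the routing-information problem reported for zmsg_dup().

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type standing in for a multi-frame message.
struct Message {
    std::vector<std::string> frames;
};

// Hypothetical send primitive (an assumption, not a ZMQ call): returns 0 on
// success or -1 with errno set, and may reset the pointer in either case.
using SendFn = std::function<int(std::unique_ptr<Message> &)>;

// Try to send msg up to max_attempts times. A private copy is taken first so
// the message can be rebuilt if a failed send has already consumed it.
bool send_with_backup(const SendFn &send, std::unique_ptr<Message> msg,
                      int max_attempts)
{
    assert(msg && max_attempts > 0);
    const Message backup = *msg;            // deep copy of all frames
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (!msg)
            msg.reset(new Message(backup)); // restore after a consumed failure
        if (send(msg) == 0)
            return true;
        if (errno != EINTR && errno != EAGAIN)
            return false;                   // hard error: do not retry
    }
    return false;                           // retry budget exhausted
}
```

The copy costs memory for every message sent, so this only makes sense where losing a message is worse than duplicating it in the sender.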
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev