[zeromq-dev] [PATCH] Fixed OOM handling while writing to a pipe
paul at colomiets.name
Fri May 20 23:00:58 CEST 2011
> Hi Martin,
On Fri, May 20, 2011 at 12:57 PM, Martin Sustrik <sustrik at 250bpm.com> wrote:
> Hi Paul,
> In case of OOM zmq_send will return EAGAIN, because there no way to
>> return other error from pipe (writer_t). Do you think it's worth fixing?
>> basically means writer_t::write should return 0 and set errno instead
>> of returning bool. (Or should it return ENOMEM directly?)
> Yes. Definitely. If error is to be returned it should be correct error.
> As for returning the error the most common POSIX practice is to return
> negative number and set errno. Return value of 0 is used for success.
Yup, but citing http://www.unix.org/whitepapers/reentrant.html:
In addition, all POSIX.1c functions avoid using errno and, instead, return
the error number directly as the function return value, with a return value
of zero indicating that no error was detected. This strategy is, in fact,
being followed on a POSIX-wide basis for all new functions.
but, nevermind, it should be consistent with other functions
> Sometimes write fail even if `more` is set on previous message, this
>> will trigger some assertions in zeromq code (they will be fixed in the
>> future patches),
> What assertions? If send fails we should revert the state to what it was
> before send() was called. That shouldn't trigger any assert.
your code expects pipe->write never fail after first message. I would
improve that, you just said you want small incremental steps, and this patch
really fixed crash in one of my test scripts.
> and probably can trigger assertions in user code.
> Same as above.
Well, I usually do something like:
rc = zmq_send(&msg, ZMQ_SNDMORE);
if(rc) return false;
rc = zmq_send(&msg2, 0);
Because previously sending all subsequent message parts could never fail.
> One additional issue: If you change function prototype to return error, you
> should check the error at each place the function is called from.
Sure. Am I missed something?
> But as currently nobody expects zeromq to work under OOM,
>> it's probably fine to live with this for some time.
> There's one important point to be made: 0MQ currently behaves 100%
> predictably in OOM condition -- it terminates the process. User is then free
> to restart the process or take whatever emergency measures are necessary.
> Any patches to OOM handling should preseve this 100% predictability.
> zmq_send() can return ENOMEM instead of terminating the process, however, it
> must do so consistently. Introducing undefined behaviour under OOM
> conditions is not an option.
Sure. Termination in lb.cpp:118 will be quite unfortunate, but I've said
it's single step, and I think it's ok for some development version.
All in all, I'll fix the patch, if we decide it's needed at all (discussion
is in another thread).
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the zeromq-dev