[zeromq-dev] [PATCH] Fixed OOM handling while writing to a pipe

Paul Colomiets paul at colomiets.name
Fri May 20 23:00:58 CEST 2011


>
> Hi Martin,


On Fri, May 20, 2011 at 12:57 PM, Martin Sustrik <sustrik at 250bpm.com> wrote:

> Hi Paul,
>
>
>  In case of OOM zmq_send will return EAGAIN, because there no way to
>> return other error from pipe (writer_t). Do you think it's worth fixing?
>> This
>> basically means writer_t::write should return 0 and set errno instead
>> of returning bool. (Or should it return ENOMEM directly?)
>>
>
> Yes. Definitely. If error is to be returned it should be correct error.
>
Ok


>
> As for returning the error the most common POSIX practice is to return
> negative number and set errno. Return value of 0 is used for success.

Yup, but citing http://www.unix.org/whitepapers/reentrant.html:

  In addition, all POSIX.1c functions avoid using errno and, instead, return
the error number directly as the function return value, with a return value
of zero indicating that no error was detected. This strategy is, in fact,
being followed on a POSIX-wide basis for all new functions.

but, nevermind, it should be consistent with other functions


>
>  Sometimes write fail even if `more` is set on previous message, this
>> will trigger some assertions in zeromq code (they will be fixed in the
>> future patches),
>>
>
> What assertions? If send fails we should revert the state to what it was
> before send() was called. That shouldn't trigger any assert.


e.g. lb.cpp:118
your code expects pipe->write never fail after first message. I would
improve that, you just said you want small incremental steps, and this patch
really fixed crash in one of my test scripts.


>
>  and probably can trigger assertions in user code.
>>
>
> Same as above.
>

Well, I usually do something like:

rc = zmq_send(&msg, ZMQ_SNDMORE);
if(rc) return false;
rc = zmq_send(&msg2, 0);
assert(!rc);

Because previously sending all subsequent message parts could never fail.

>
> One additional issue: If you change function prototype to return error, you
> should check the error at each place the function is called from.


Sure. Am I missed something?


>  But as currently nobody expects zeromq to work under OOM,
>> it's probably fine to live with this for some time.
>>
>
> There's one important point to be made: 0MQ currently behaves 100%
> predictably in OOM condition -- it terminates the process. User is then free
> to restart the process or take whatever emergency measures are necessary.
>
> Any patches to OOM handling should preseve this 100% predictability.
> zmq_send() can return ENOMEM instead of terminating the process, however, it
> must do so consistently. Introducing undefined behaviour under OOM
> conditions is not an option.

Sure. Termination in lb.cpp:118 will be quite unfortunate, but I've said
it's single step, and I think it's ok for some development version.

All in all, I'll fix the patch, if we decide it's needed at all (discussion
is in another thread).

-- 
Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20110521/4102f427/attachment.htm>


More information about the zeromq-dev mailing list