[zeromq-dev] Should zeromq handle OOM? (Re: [PATCH] Fixed OOM handling while writing to a pipe)

Steven McCoy steven.mccoy at miru.hk
Fri May 20 23:26:01 CEST 2011

On 20 May 2011 16:41, Paul Colomiets <paul at colomiets.name> wrote:

> Hi Pieter,
> I've changed subject to let others follow discussion easier.
> On Fri, May 20, 2011 at 1:30 PM, Pieter Hintjens <ph at imatix.com> wrote:
>> On Fri, May 20, 2011 at 12:21 PM, Martin Sustrik <sustrik at 250bpm.com>
>> wrote:
>> > There's one important point to be made: 0MQ currently behaves 100%
>> > predictably in OOM condition -- it terminates the process. User is then
>> > free to restart the process or take whatever emergency measures are
>> > necessary.
>> >
>> > Any patches to OOM handling should preserve this 100% predictability.
>> > zmq_send() can return ENOMEM instead of terminating the process,
>> > however, it must do so consistently. Introducing undefined behaviour
>> > under OOM conditions is not an option.
>> Sorry to say this rather late, but before we change the behavior of
>> 0MQ under OOM conditions, I'd want the consensus of users here.
>> It is a radical change in semantics to go from asserting, to
>> continuing with an error response. We cannot make such changes without
>> being certain there is a consensus of approval for them.
>> My own experience goes strongly against handling OOM in any way except
>> assertion. We explored this quite exhaustively in OpenAMQ and found
>> that returning errors in case of OOM was very fragile. It is not even
>> clear that an application can deal with such errors sanely, since many
>> system calls will themselves fail if memory is exhausted. We tried
>> hard to make this work, and in the end had to choose "assert" as
>> the only robust answer.
>> It's particularly important for services because most of the time
>> there is a problem that must be raised and resolved, whether it's the
>> too-low default VM size, or the lack of HWMs on queues, or too-slow
>> subscribers, etc.
>> The only exception to assertion, afaics, is for allocation requests
>> that are clearly unreasonable. And even then, assertion seems the
>> right response if these requests are internal. If they're driven by
>> user data (i.e. someone sending a 4GB message to a service), the
>> correct response is detecting over-sized messages and discarding them
>> (and we have this code in 2.2 and 3.0).
>> tl;dr - +1 for asserting on OOM, -1 for returning ENOMEM.
> The problem with asserting on OOM is that you are excluding zeromq from use
> in a whole class of applications. All of today's high-performance databases
> use a write-back cache, and it is very bad for them not to be able to flush
> the dirty cache (well, it's technically possible by installing a handler on
> SIGABRT, but that is much less reliable). You exclude all kinds of databases:
> persistent queues, caches, whatever. This is probably not the only kind of
> application that is excluded, just the one that came to mind.
> So I'm -1 on asserting and +1 for ENOMEM
> (but the situation where two core developers have exactly opposite opinions
> is unfortunate)
Considering Linux's default OOM behaviour, this isn't that much of an issue in
64-bit land.  The problem tends to arise in 32-bit applications, where it can
be incredibly easy to hit 2GB, especially on Windows, which doesn't have an
overcommit feature.
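
For reference, the two policies debated in the quoted messages can be sketched
roughly as below. This is a minimal illustration, not 0MQ code: the names
alloc_or_abort, copy_message, alloc_fn and failing_alloc are hypothetical, and
the allocator is passed in only so that a failure can be simulated.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocator signature, so a failing allocator can be injected (assumption). */
typedef void *(*alloc_fn)(size_t);

/* Hypothetical allocator that always fails, used to simulate OOM. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Policy A ("assert"): never returns NULL; terminates the process instead. */
void *alloc_or_abort(alloc_fn alloc, size_t size)
{
    void *p = alloc(size);
    if (p == NULL) {
        fprintf(stderr, "fatal: out of memory allocating %zu bytes\n", size);
        abort();
    }
    return p;
}

/* Policy B ("ENOMEM"): copies a message into fresh memory. On allocation
 * failure it returns -1, sets errno to ENOMEM and leaves *out untouched. */
int copy_message(alloc_fn alloc, void **out, const void *data, size_t size)
{
    void *p = alloc(size > 0 ? size : 1);
    if (p == NULL) {
        errno = ENOMEM;
        return -1;
    }
    memcpy(p, data, size);
    *out = p;
    return 0;
}
```

With policy B every caller has to check the return value on every path, which
is where the consistency requirement raised earlier in the thread comes in.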

OpenPGM asserts on OOM and also has no limit on the number of peers, so memory
can grow rapidly with many active publishers.  It would be a non-trivial
project to implement a shared buffer across multiple receive windows, as you
are effectively implementing variable-sized windows while trying to keep O(1)
lookup for repairs.  Other implementations tend to use list-based windows
without taking advantage of total memory management.
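
To illustrate what O(1) lookup means here, a fixed-size receive window can be
sketched as an array indexed by sequence number modulo the window size. This
is not OpenPGM's implementation; struct rx_window, WINDOW_SLOTS, window_store
and window_lookup are hypothetical names. The fixed slot count is what makes
lookup trivial, and it is exactly what a buffer shared across peers would have
to give up.

```c
#include <stddef.h>
#include <stdint.h>

#define WINDOW_SLOTS 8u  /* hypothetical fixed window size */

/* One peer's receive window: a slot per sequence number, modulo the size. */
struct rx_window {
    const void   *pkt[WINDOW_SLOTS];   /* stored packet, owned by the caller */
    uint32_t      seq[WINDOW_SLOTS];   /* sequence number held in each slot */
    unsigned char used[WINDOW_SLOTS];  /* non-zero once a slot is filled */
};

/* Store a packet; an older packet mapping to the same slot is overwritten. */
void window_store(struct rx_window *w, uint32_t sqn, const void *pkt)
{
    size_t i = sqn % WINDOW_SLOTS;
    w->pkt[i] = pkt;
    w->seq[i] = sqn;
    w->used[i] = 1;
}

/* O(1) lookup for a repair request: one index computation, one compare.
 * Returns NULL if the sequence number is not (or no longer) in the window. */
const void *window_lookup(const struct rx_window *w, uint32_t sqn)
{
    size_t i = sqn % WINDOW_SLOTS;
    return (w->used[i] && w->seq[i] == sqn) ? w->pkt[i] : NULL;
}
```

A list-based window avoids the fixed per-peer allocation but has to walk the
list to find a given sequence number.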
