[zeromq-dev] Should zeromq handle OOM? (Re: [PATCH] Fixed OOM handling while writing to a pipe)
paul at colomiets.name
Fri May 20 22:41:56 CEST 2011
I've changed the subject so that others can follow the discussion more easily.
On Fri, May 20, 2011 at 1:30 PM, Pieter Hintjens <ph at imatix.com> wrote:
> On Fri, May 20, 2011 at 12:21 PM, Martin Sustrik <sustrik at 250bpm.com> wrote:
> > There's one important point to be made: 0MQ currently behaves 100%
> > predictably in OOM condition -- it terminates the process. User is then
> > free to restart the process or take whatever emergency measures are
> > necessary.
> > Any patches to OOM handling should preserve this 100% predictability.
> > zmq_send() can return ENOMEM instead of terminating the process,
> > however, it must do so consistently. Introducing undefined behaviour
> > under OOM conditions is not an option.
> Sorry to say this rather late, but before we change the behavior of
> 0MQ under OOM conditions, I'd want the consensus of users here.
> It is a radical change in semantics to go from asserting, to
> continuing with an error response. We cannot make such changes without
> being certain there is a consensus of approval for them.
> My own experience goes strongly against handling OOM in any way except
> assertion. We explored this quite exhaustively in OpenAMQ and found
> that returning errors in case of OOM was very fragile. It is not even
> clear that an application can deal with such errors sanely, since many
> system calls will themselves fail if memory is exhausted. We tried
> hard to make this work, and in the end had to choose for "assert" as
> the only robust answer.
> It's particularly important for services because most of the time
> there is a problem that must be raised and resolved, whether it's the
> too-low default VM size, or the lack of HWMs on queues, or too-slow
> subscribers, etc.
> The only exception to assertion, afaics, is for allocation requests
> that are clearly unreasonable. And even then, assertion seems the
> right response if these requests are internal. If they're driven by
> user data (i.e. someone sending a 4GB message to a service), the
> correct response is detecting over-sized messages and discarding them
> (and we have this code in 2.2 and 3.0).
> tl;dr - +1 for asserting on OOM, -1 for returning ENOMEM.
The problem with asserting on OOM is that you exclude zeromq from use in a
whole class of applications. All of today's high-performance databases use a
write-back cache, and it is very bad for them not to be able to flush the
dirty cache (well, it's technically possible by installing a handler for
SIGABRT, but that is much less reliable). You exclude all kinds of databases:
persistent queues, caches, whatever. This is probably not the only kind of
application that is excluded, just the one that came to my mind.
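
To make the write-back cache case concrete, here is a minimal sketch of what
an application could do if a send call reported ENOMEM instead of aborting.
Everything in it is an assumption for illustration: hypothetical_send() is a
stand-in and not zmq_send(), the fail_with_oom flag only simulates memory
exhaustion, and struct wb_cache, flush_cache() and send_or_flush() are made-up
names. Flushing is simulated by clearing a flag; a real database would write
to disk here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical write-back cache: entries stay in memory until flushed. */
#define CACHE_SLOTS 4

struct wb_cache {
    int dirty[CACHE_SLOTS];   /* 1 if the slot holds unflushed data */
    int flushed;              /* total slots written out so far */
};

/* Stand-in for a send call that reports OOM instead of aborting.
   This is NOT zmq_send(); fail_with_oom only simulates exhaustion. */
static int hypothetical_send(const void *buf, size_t len, int fail_with_oom)
{
    (void)buf;
    (void)len;
    if (fail_with_oom) {
        errno = ENOMEM;
        return -1;
    }
    return 0;
}

/* Write every dirty slot out (simulated by clearing the dirty flag).
   Returns the number of slots flushed by this call. */
static int flush_cache(struct wb_cache *c)
{
    int n = 0;
    assert(c != NULL);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->dirty[i]) {
            c->dirty[i] = 0;
            n++;
        }
    }
    c->flushed += n;
    return n;
}

/* Send a message; on ENOMEM, flush the dirty cache before giving up.
   Returns 0 on success, -1 on failure with errno left as ENOMEM. */
static int send_or_flush(struct wb_cache *c, const void *buf, size_t len,
                         int fail_with_oom)
{
    if (hypothetical_send(buf, len, fail_with_oom) == 0)
        return 0;
    if (errno == ENOMEM) {
        flush_cache(c);
        errno = ENOMEM;   /* restore in case the flush path touched errno */
    }
    return -1;
}
```

With an error return, the flush runs in ordinary application code. With an
abort, the same work would have to happen inside a SIGABRT handler, where very
little is safe to call.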
So I'm -1 on asserting and +1 for ENOMEM
(but the situation where two core developers have exactly opposite opinions is