[zeromq-dev] Improving zeromq in OOM conditions

Martin Lucina mato at kotelna.sk
Tue May 17 13:29:50 CEST 2011


Hi,

sustrik at 250bpm.com said:
> Hi Paul,
> 
> > Done.
> 
> Ok. Applied to master. Thanks!
> 
> > Can you give me better advice? Do you think the connection shouldn't
> > be closed on OOM first, or is it better to leave that intact and
> > refactor the reconnection code to not crash in the first place? BTW,
> > the current connection closing code is helpful for debugging latter.
> 
> What I personally believe is that the library should not get to OOM 
> condition in the first place. Possible steps to get that would include:
> 
> 1. Use finite default HWM.
> 2. Use finite default MAXMSGSIZE.
> 3. Implement a MAXCONNECTIONS option with finite default.
> 4. Think hard about whether it makes sense to allow infinite as a valid 
> option for any of the above.

+1

> However, that's my personal opinion. If you still feel like you should 
> handle the OOM situation when it hits, feel free to try. However, don't 
> do any guesswork. With OOM handling the guesses mostly turn out to be 
> false. Make a test instead and fix the problem you'll get.
> 
> Also keep in mind that your sophisticated OOM handling is likely to be 
> spoiled by the OS OOM killer kicking in and killing the whole process.

Coming rather late to this thread, but I'm inclined to agree with Martin's
viewpoint. 0MQ should spend minimal effort on recovery. What we should aim
for is to never crash: if a call cannot allocate memory, that failure
should propagate to the calling API function, which returns ENOMEM to the
user.
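
To make that concrete, here is a minimal sketch in C of a call that reports
allocation failure through its return value and errno instead of aborting.
Every name in it (sketch_msg_t, sketch_msg_init_size, sketch_alloc_fn,
sketch_failing_alloc) is hypothetical and exists only for illustration; it
is not 0MQ's actual code. The allocator is passed in as a parameter purely
so that the failure path can be exercised without exhausting real memory.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical message type for illustration; not libzmq's zmq_msg_t. */
typedef struct {
    void *data;
    size_t size;
} sketch_msg_t;

/* Allocator hook (an assumption of this sketch) so OOM can be simulated. */
typedef void *(*sketch_alloc_fn)(size_t);

/* Stand-in for an exhausted heap: always fails. */
void *sketch_failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Returns 0 on success. On allocation failure returns -1 with
   errno set to ENOMEM and leaves *msg untouched; it never aborts. */
int sketch_msg_init_size(sketch_msg_t *msg, size_t size, sketch_alloc_fn alloc)
{
    /* Ask for at least one byte so a zero-size message still gets
       a valid, freeable buffer. */
    void *buf = alloc(size ? size : 1);
    if (buf == NULL) {
        errno = ENOMEM;
        return -1;
    }
    msg->data = buf;
    msg->size = size;
    return 0;
}

/* Releases the buffer; assumes it came from malloc. */
void sketch_msg_close(sketch_msg_t *msg)
{
    free(msg->data);
    msg->data = NULL;
    msg->size = 0;
}
```

A caller would pass malloc in normal operation and check for -1 with errno
set to ENOMEM, then decide for itself whether to retry, drop the message or
shut down.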

Of course this is not viable for asynchronous work done behind the scenes;
in that case resource limits need to be set and enforced. Few applications
(if any) need to receive messages of arbitrary (4 GB) size, or to handle an
arbitrary number of connections.
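
As a rough sketch of what imposing such limits could look like, the C
fragment below rejects work up front once a finite bound is reached, rather
than allocating until memory runs out. The struct, the function names and
the choice of errno values (EMSGSIZE, EAGAIN, EMFILE) are assumptions made
for illustration only; they are not 0MQ's implementation or its option
names.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-socket limits and counters, for illustration only. */
typedef struct {
    size_t hwm;          /* max number of queued messages */
    size_t max_msg_size; /* max accepted message size, in bytes */
    size_t max_conns;    /* max number of simultaneous connections */
    size_t queued;       /* messages currently queued */
    size_t conns;        /* connections currently open */
} sketch_limits_t;

/* Admit one message into the queue, or refuse it without allocating.
   Returns 0 on success, -1 with errno set on refusal. */
int sketch_admit_msg(sketch_limits_t *l, size_t msg_size)
{
    if (msg_size > l->max_msg_size) {
        errno = EMSGSIZE; /* message exceeds the size limit */
        return -1;
    }
    if (l->queued >= l->hwm) {
        errno = EAGAIN;   /* queue is at its high-water mark */
        return -1;
    }
    l->queued++;
    return 0;
}

/* Admit one new connection, or refuse it once the limit is reached.
   Returns 0 on success, -1 with errno set on refusal. */
int sketch_admit_conn(sketch_limits_t *l)
{
    if (l->conns >= l->max_conns) {
        errno = EMFILE;   /* connection limit reached */
        return -1;
    }
    l->conns++;
    return 0;
}
```

With finite values for all three limits, the worst-case memory held by
queued messages is bounded by roughly hwm * max_msg_size per connection,
times max_conns, which is what keeps the asynchronous side away from OOM in
the first place.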

Cheers,

-mato


