[zeromq-dev] Improving zeromq in OOM conditions

Ilja Golshtein ilejncs at narod.ru
Tue May 17 11:22:16 CEST 2011


Let me please express my opinion.

I think the best thing to do in case of OOM is to raise an exception. Unfortunately, I don't know how to do that, since ZeroMQ has a C API.
The rationale: in case of OOM the process (not the entire application!) must be restarted with distinctive logging,
since an OOM condition is most likely caused by a design or programming error.
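
One common way to reconcile C++ exceptions with a C API is to catch std::bad_alloc at the API boundary and report it through the return value and errno. The following is only a sketch of that idea: call_with_oom_guard is a hypothetical helper, not something ZeroMQ provides, and the log text is made up.

```cpp
#include <cerrno>
#include <cstdio>
#include <functional>
#include <new>

// Hypothetical helper (not part of ZeroMQ): runs `body` and converts an
// allocation failure into the usual C convention of -1 plus errno.
int call_with_oom_guard(const std::function<int()> &body)
{
    try {
        return body();
    } catch (const std::bad_alloc &) {
        // Distinctive log line so OOM can be told apart from other failures.
        std::fprintf(stderr, "FATAL-OOM: allocation failed\n");
        errno = ENOMEM;
        return -1;
    }
}
```

A caller that sees -1 with errno == ENOMEM can then exit and leave the restart to whatever supervises the process.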

The next option is to do nothing in production and crash in debug builds. That is actually what we have now.
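
To show what "crash in debug, do nothing in production" looks like in general, here is a minimal sketch using the standard assert macro. It is an assumption for illustration only: debug_checked_malloc is a hypothetical name, and ZeroMQ's own checking macros may behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper: the check exists only when NDEBUG is not defined.
void *debug_checked_malloc(std::size_t size)
{
    void *p = std::malloc(size);
    // Debug build: aborts here on failure. Release build (-DNDEBUG): the
    // assert expands to nothing and a null pointer is returned unchecked.
    assert(p != nullptr && "out of memory");
    return p;
}
```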

The worst thing to do is to try to handle the OOM condition by spreading "==0" checks throughout the code
and losing performance because of it.
The rationale: in the modern 64-bit world, process-level OOM is rare. If it is hit, the chances that the process
can go on normally are negligible.

On the other hand, I realise that
- there are environments without an OS and without an OS OOM killer
- there are 32-bit (and probably 16-bit) devices.
In these cases graceful OOM handling might be useful.
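
For such constrained targets, graceful handling could mean allocating with the non-throwing form of new and reporting failure to the caller. This is a minimal sketch, assuming a hypothetical Buffer type and helper functions that exist only for this example:

```cpp
#include <cstddef>
#include <new>

// Hypothetical buffer type, for illustration only.
struct Buffer {
    unsigned char *data = nullptr;
    std::size_t size = 0;
};

// Returns false instead of throwing or aborting when memory is unavailable.
bool buffer_init(Buffer &buf, std::size_t size)
{
    unsigned char *p = new (std::nothrow) unsigned char[size];
    if (p == nullptr)
        return false;  // caller decides: drop the message, shed load, retry
    buf.data = p;
    buf.size = size;
    return true;
}

// Releases the memory and resets the buffer to its empty state.
void buffer_free(Buffer &buf)
{
    delete[] buf.data;
    buf.data = nullptr;
    buf.size = 0;
}
```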

Thanks.

17.05.2011, 12:26, "Martin Sustrik" <sustrik at 250bpm.com>:
> Hi Paul,
>
>>  Done.
>
> Ok. Applied to master. Thanks!
>
>>  Can you give me better advice? Do you think the connection shouldn't be
>>  closed on OOM first, or is it better to leave that intact and refactor
>>  the reconnection code to not crash in the first place? BTW, the current
>>  connection closing code is helpful for debugging the latter.
>
> What I personally believe is that the library should not get to OOM
> condition in the first place. Possible steps to get that would include:
>
> 1. Use finite default HWM.
> 2. Use finite default MAXMSGSIZE.
> 3. Implement a MAXCONNECTIONS option with finite default.
> 4. Think hard about whether it makes sense to allow infinite as a valid
> option for any of the above.
>
> However, that's my personal opinion. If you still feel like you should
> handle the OOM situation when it hits, feel free to try. However, don't
> do any guesswork. With OOM handling the guesses mostly turn out to be
> false. Make a test instead and fix the problem you'll get.
>
> Also keep in mind that your sophisticated OOM handling is likely to be
> spoiled by the OS OOM killer kicking in and killing the whole process.
>
> Martin
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
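
Points 1 and 2 in the quoted list amount to refusing work before memory runs out. A toy sketch of that idea, where the class name and both limits are hypothetical and nothing here reflects ZeroMQ's actual queue implementation:

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Toy bounded queue; the class name and both limits are hypothetical.
class BoundedQueue {
public:
    BoundedQueue(std::size_t hwm, std::size_t max_msg_size)
        : hwm_(hwm), max_msg_size_(max_msg_size) {}

    // Rejects the message instead of letting the queue grow without bound.
    bool push(std::string msg)
    {
        if (msg.size() > max_msg_size_ || queue_.size() >= hwm_)
            return false;
        queue_.push_back(std::move(msg));
        return true;
    }

    std::size_t size() const { return queue_.size(); }

private:
    std::size_t hwm_;           // finite high-water mark (message count)
    std::size_t max_msg_size_;  // finite per-message size limit (bytes)
    std::deque<std::string> queue_;
};
```

With both limits finite, the worst-case memory held by the queue is roughly hwm times max_msg_size, which can be budgeted up front.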

-- 
Best regards,
Ilja Golshtein.


