[zeromq-dev] Improving zeromq in OOM conditions

Martin Sustrik sustrik at 250bpm.com
Mon May 16 09:38:24 CEST 2011


Hi Paul,

> Martin encouraged me to fix zeromq in out-of-memory conditions. Here
> are the first patches and the first questions.
>
> There are a lot of explicit and implicit (e.g. inserting into an STL
> container) memory allocations in constructors in zeromq code. Since
> we are encouraged not to use exceptions in zeromq code, we can't
> gracefully propagate exceptions from there. So I see three options:
>
> 1. Refactor the code to have all the memory allocations in an `init()`
> method (other name?)
> 2. Allow throwing and catching exceptions in code which is not on
> the critical path
> 3. Move memory allocation code to an overridden `new` (which will
> probably turn it into a mess)
>
> BTW, if catching exceptions is discouraged altogether, we need to
> rewrite all code which uses STL containers.
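
For reference, option 1 from the quoted list could look roughly like the sketch below. It is only an illustration: `buffer_t` and its members are hypothetical names, not taken from the 0MQ sources, and it assumes failures are reported by returning -1 with errno set rather than by throwing.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Hypothetical class used only to illustrate two-phase construction.
class buffer_t
{
public:
    // The constructor allocates nothing, so it cannot fail.
    buffer_t () : data_ (nullptr), size_ (0) {}

    ~buffer_t () { std::free (data_); }

    buffer_t (const buffer_t &) = delete;
    buffer_t &operator= (const buffer_t &) = delete;

    // All allocation happens here. Returns 0 on success, or -1 with
    // errno set to ENOMEM when the memory cannot be obtained.
    int init (std::size_t size)
    {
        assert (data_ == nullptr); // init () may be called only once

        // Request at least one byte so that a null result always
        // means failure, even for size == 0.
        void *p = std::malloc (size ? size : 1);
        if (!p) {
            errno = ENOMEM;
            return -1;
        }
        data_ = static_cast<unsigned char *> (p);
        size_ = size;
        return 0;
    }

    std::size_t size () const { return size_; }

private:
    unsigned char *data_;
    std::size_t size_;
};
```

The caller constructs the object, calls `init ()` and checks the return value. The cost of this pattern is that every such object has a state in which it is constructed but not yet usable.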

I would start with something much simpler. The proposed roadmap requires 
heavy refactoring upfront without actually being able to test the thing 
until much later on.

99.9% of the memory allocated by 0mq is allocated in two places:

1. src/msg.cpp:56
2. src/yqueue.hpp:108

With large messages, most of the allocation happens in 1; with small 
messages, most of the memory is allocated in 2.

So, if you create a test program which publishes, say, 10MB messages 
in a tight loop while the peer is not receiving, you'll hit the 
allocation error in 1.

If you do the same with messages 1 byte long, you'll hit the error in 2.
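
The core of such a test program might be sketched as below. This is a library-independent outline under stated assumptions: `flood` and `flood_result_t` are hypothetical names, and `send_one` stands in for whatever code creates and publishes one message of the given size, returning 0 on success or -1 with errno set.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>

// Outcome of one flooding run.
struct flood_result_t
{
    std::size_t sent; // messages accepted before the first failure
    int error;        // errno of the first failure, 0 if none occurred
};

// Calls send_one (size) in a tight loop until it fails or 'limit'
// messages have been sent.
inline flood_result_t
flood (const std::function<int (std::size_t)> &send_one,
       std::size_t size,
       std::size_t limit)
{
    assert (send_one); // a callable must be supplied

    flood_result_t result = {0, 0};
    while (result.sent < limit) {
        if (send_one (size) != 0) {
            result.error = errno; // remember why the send failed
            break;
        }
        ++result.sent;
    }
    return result;
}
```

In a real run, `send_one` would wrap the message allocation and send calls on a publishing socket whose peer is connected but not reading, and `flood` would be called once with a 10MB size and once with a size of 1 byte. This assumes nothing caps the outgoing queue before memory runs out. If the library aborts on a failed allocation instead of returning an error, the loop never sees a failure and the crash itself is the result to record.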

With the test program in hand, I would try to write, test and submit 
small, incremental patches that improve reliability in these cases.

Martin


