[zeromq-dev] Improving zeromq in OOM conditions

Paul Colomiets paul at colomiets.name
Tue May 17 00:06:26 CEST 2011


On Mon, May 16, 2011 at 10:38 AM, Martin Sustrik <sustrik at 250bpm.com> wrote:

> Hi Paul,
>
>
>> As Martin encouraged me to fix zeromq in out-of-memory conditions, here
>> are the first patches and the first questions.
>>
>> There are a lot of explicit and implicit (e.g. inserting into an STL
>> container) memory allocations in constructors in zeromq code. Since
>> we are encouraged not to use exceptions in zeromq code, we can't
>> gracefully propagate failures from there. So I see three options:
>>
>> 1. Refactor the code to have all the memory allocations in an `init()`
>> method (or some other name?)
>> 2. Allow throwing and catching exceptions in code which is not on the
>> critical path
>> 3. Move the memory allocation code to an overridden `new` (which will
>> probably turn it into a mess)
>>
>> BTW, if catching exceptions is discouraged altogether, we need to
>> rewrite all the code which uses STL containers.
>>
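
Option 1 above could look roughly like the following. This is only a sketch under assumptions: `session_table_t`, its members, and the "0 on success, -1 with errno set" return convention are hypothetical names chosen for illustration and are not taken from the zeromq tree. The constructor allocates nothing; `init()` does the explicit allocation with nothrow `new` and wraps the implicit STL allocation in a local try/catch, which is a small, contained use of option 2.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <map>
#include <new>
#include <string>

// Hypothetical class for illustration; not part of the zeromq sources.
class session_table_t
{
public:
    // Phase one: the constructor performs no allocation, so it cannot fail.
    session_table_t () : buffer (NULL), buffer_size (0) {}

    ~session_table_t () { delete [] buffer; }

    // Phase two: every allocation happens here.
    // Returns 0 on success, -1 with errno set to ENOMEM on failure.
    int init (std::size_t size_)
    {
        // Calling init () twice is a programming error, not a resource
        // error, so an assertion is still appropriate for it.
        assert (!buffer);

        // Explicit allocation: nothrow new yields a null pointer on failure.
        buffer = new (std::nothrow) unsigned char [size_];
        if (!buffer) {
            errno = ENOMEM;
            return -1;
        }
        buffer_size = size_;

        // Implicit allocation: an STL insertion can only report failure
        // by throwing, so the exception is caught right at the call site
        // and converted into the same error code.
        try {
            names ["default"] = 0;
        }
        catch (const std::bad_alloc &) {
            delete [] buffer;
            buffer = NULL;
            buffer_size = 0;
            errno = ENOMEM;
            return -1;
        }
        return 0;
    }

    std::size_t size () const { return buffer_size; }

private:
    unsigned char *buffer;
    std::size_t buffer_size;
    std::map <std::string, int> names;

    // Copying is disabled to keep buffer ownership simple.
    session_table_t (const session_table_t &);
    session_table_t &operator = (const session_table_t &);
};
```

A caller would construct the object, call `init ()`, and check the return value before using it. The cost of this pattern is that every such object has an "uninitialised" state that callers must not forget about.
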
>
> I would start with something much simpler. The proposed roadmap requires
> heavy refactoring upfront without actually being able to test the thing
> until much later on.
>
> 99.9% of memory allocated by 0mq is allocated in two places:
>
> 1. src/msg.cpp:56
>
Fixed in one of the patches attached to my previous email.
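
The patches themselves are not included in this message. As a rough illustration of the general idea only, and assuming the goal is to report the failure to the caller instead of aborting, an allocation site for message data could be handled along these lines. All names here (`blob_t`, `blob_init_size`, `blob_close`, `failing_alloc`, the allocator parameter) are hypothetical and do not reflect the code at src/msg.cpp:56 or the contents of the patch. The allocator is passed in so the failure path can be exercised without actually exhausting memory.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for illustration; not the actual zeromq types.
typedef void *(*alloc_fn_t) (std::size_t);

struct blob_t
{
    unsigned char *data;
    std::size_t size;
};

// Default allocator: a thin wrapper around malloc.
static void *default_alloc (std::size_t size_)
{
    return std::malloc (size_);
}

// Allocator that always fails, used to simulate an OOM condition.
static void *failing_alloc (std::size_t)
{
    return NULL;
}

// Allocates 'size_' bytes of payload for 'blob_'.
// Returns 0 on success. On allocation failure the blob is left empty,
// errno is set to ENOMEM and -1 is returned; the process is not aborted.
int blob_init_size (blob_t *blob_, std::size_t size_,
    alloc_fn_t alloc_ = default_alloc)
{
    blob_->data = NULL;
    blob_->size = 0;

    // Request at least one byte so that a null result always means failure.
    void *mem = alloc_ (size_ ? size_ : 1);
    if (!mem) {
        errno = ENOMEM;
        return -1;
    }

    blob_->data = static_cast <unsigned char *> (mem);
    blob_->size = size_;
    return 0;
}

// Releases the payload; safe to call on an empty blob.
void blob_close (blob_t *blob_)
{
    std::free (blob_->data);
    blob_->data = NULL;
    blob_->size = 0;
}
```

Passing `failing_alloc` as the third argument makes the call return -1 with `errno == ENOMEM`, which lets the caller's error handling be tested deterministically.
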


> 2. src/yqueue.hpp:108
>
Will look into that.


>
> With large messages, most allocation happens in 1.; with small messages,
> most memory is allocated by 2.


> So, if you create a test program which publishes, say, 10MB messages in a
> tight loop while the peer is not receiving, you'll hit the allocation error
> in 1.
>
> If you do the same with messages 1 byte long, you'll hit the error in 2.
>
> With the test program in hand, I would try to write, test and submit small
> gradual patches that improve reliability in these cases.
>
I've tried, but got errors that seem to be related to reconnection. Are
there any disconnects under OOM conditions? (I'll look myself, but maybe
you remember something.)

Anyway, I've hit:

zmq_init.cpp:62
connect_session.cpp:62

There were also errors in timers and a few other places I don't remember now.

You can look at my test at:
https://github.com/tailhook/libzmq/commit/591abc22645f5d51270af46d9a60039ef6bec449


>
> Martin
>



-- 
--
Paul