[zeromq-dev] Memory pool for zmq_msg_t

Francesco francesco.montorsi at gmail.com
Thu Aug 15 11:34:12 CEST 2019


Hi Doron, hi Jens,
Yes, the allocator approach is a nice solution.
I think it would be nice to have libzmq also provide a memory pool
implementation, while keeping the malloc/free implementation as the
default for backward compatibility.

It's also important to have a smart allocator that internally contains not
just one but several pools for different packet size classes, to avoid
memory waste. But I think this can fit easily into the allocator pattern
sketched out by Jens.
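
For illustration, here's a rough sketch of what I mean by size classes (a
sketch only; the class name, the class boundaries and the fallback-to-malloc
policy are all made up, not existing libzmq code):

#include <cstdlib>
#include <vector>

// Hypothetical size-class allocator: one free list per class, plain
// malloc/free for anything larger than the biggest class.
class size_class_allocator_t
{
  public:
    void *allocate (size_t n)
    {
        const int c = class_for (n);
        if (c < 0)
            return malloc (n); // oversized: fall back to plain malloc
        if (_free[c].empty ())
            return malloc (_class_sizes[c]); // grow the class lazily
        void *buf = _free[c].back ();
        _free[c].pop_back ();
        return buf;
    }

    void release (void *ptr, size_t n)
    {
        const int c = class_for (n);
        if (c < 0)
            free (ptr);
        else
            _free[c].push_back (ptr); // recycle into the matching class
    }

  private:
    int class_for (size_t n) const
    {
        for (int i = 0; i < 4; i++)
            if (n <= _class_sizes[i])
                return i;
        return -1;
    }

    const size_t _class_sizes[4] = {256, 1024, 4096, 16384};
    std::vector<void *> _free[4];
};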

By the way, another issue, unrelated to the allocator API but relevant to
performance: I think it's important to avoid allocating not only the msg
buffer but also the content_t structure. Indeed, in my preliminary merge
request I modified zmq_msg_t of type lmsg to place that structure in the
first 40 bytes of the pooled buffer.
Of course this approach is not backward compatible with the _init_data()
semantics.
How do you think this would best be approached?
I guess we could add a new _init_data_and_controlblock() helper that does
the trick of taking the first 40 bytes of the provided buffer?
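
To make the layout idea concrete, here's a minimal sketch seen from the
application side (purely illustrative: _init_data_and_controlblock() does not
exist today, and the 40-byte figure is just sizeof(content_t) on my build,
not a stable constant):

#include <cstddef>

// Each pooled slot reserves headroom at the front where libzmq could
// build its content_t in place instead of calling malloc().
const size_t CONTROL_BLOCK_SIZE = 40; // assumption: sizeof (content_t)
const size_t SLOT_SIZE = 2048;

struct pool_slot_t
{
    unsigned char bytes[SLOT_SIZE];

    void *control_block () { return bytes; }
    void *payload () { return bytes + CONTROL_BLOCK_SIZE; }
    size_t payload_capacity () const { return SLOT_SIZE - CONTROL_BLOCK_SIZE; }
};

// The hypothetical zmq_msg_init_data_and_controlblock (&msg, slot.bytes,
// SLOT_SIZE, ffn, hint) would then place the control block at
// slot.control_block () and point the message data at slot.payload (), so
// the whole slot goes back to the pool in one piece when ffn fires.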

Thanks
Francesco


On Wed, 14 Aug 2019 at 22:23, Doron Somech <somdoron at gmail.com> wrote:

> Jens I like the idea.
>
> We actually don't need the release method.
> The signature of allocate should receive the zmq_msg and allocate it:
>
> int (*allocate)(zmq_msg_t *msg, size_t size, void *obj);
>
> When the allocator creates the zmq_msg, it will provide the release
> method to the zmq_msg in the constructor.
>
> This is important in order to forward messages between sockets, so the
> release method is part of the msg. This is already supported by zmq_msg,
> which accepts a free function with a hint (obj in your example).
>
> The return value of allocate will be a success indication, like the rest
> of the ZeroMQ methods.
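>
> For illustration, a rough sketch of what such an allocate hook could look
> like, backed by a pool (the hook itself, the obj parameter and MemoryPool
> are proposals/assumptions from this thread; only zmq_msg_init_data() and
> its free-function-plus-hint mechanism are existing libzmq API):
>
> #include <zmq.h>
>
> // The hook builds the message on top of a pooled buffer and registers
> // the matching release callback plus hint, so the buffer goes back to
> // the pool once every copy of the message has been closed.
> static void pool_release (void *data, void *hint)
> {
>     static_cast<MemoryPool *> (hint)->release (data);
> }
>
> static int pool_allocate (zmq_msg_t *msg, size_t size, void *obj)
> {
>     MemoryPool *pool = static_cast<MemoryPool *> (obj);
>     void *buf = pool->allocate (size);
>     if (!buf)
>         return -1; // success indication, like the rest of the zmq_* API
>     return zmq_msg_init_data (msg, buf, size, pool_release, pool);
> }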
>
> ZeroMQ actually already supports a pool mechanism when sending, via the
> zmq_msg API. Receiving is the problem; your suggestion solves it nicely.
>
> By the way, a memory pool is already supported in NetMQ, with a very
> similar solution to the one you suggest. (It is global for all sockets,
> without override.)
>
>
>
> On Wed, Aug 14, 2019, 22:41 Jens Auer <jens.auer at betaversion.net> wrote:
>
>> Hi,
>>
>> Maybe this can be combined with a request that I have seen a couple of
>> times to be able to configure the allocator used in libzmq? I am thinking
>> of something like
>>
>> struct zmq_allocator {
>>     void* obj;
>>     void* (*allocate)(size_t n, void* obj);
>>     void (*release)(void* ptr, void* obj);
>> };
>>
>> void* useMalloc(size_t n, void*) {return malloc(n);}
>> void freeMalloc(void* ptr, void*) {free(ptr);}
>>
>> zmq_allocator& zmq_default_allocator() {
>>     static zmq_allocator defaultAllocator = {nullptr, useMalloc, freeMalloc};
>>     return defaultAllocator;
>> }
>>
>> The context could then store the allocator for libzmq, and users could
>> set a specific allocator as a context option, e.g. with zmq_ctx_set. A
>> socket created for a context can then inherit the default allocator or
>> set a special allocator as a socket option.
>>
>> class MemoryPool {…}; // hopefully thread-safe
>>
>> MemoryPool pool;
>>
>> void* allocatePool(size_t n, void* pool) {return static_cast<MemoryPool*>(pool)->allocate(n);}
>> void releasePool(void* ptr, void* pool) {static_cast<MemoryPool*>(pool)->release(ptr);}
>>
>> zmq_allocator pooledAllocator {
>>     &pool, allocatePool, releasePool
>> };
>>
>> void* ctx = zmq_ctx_new();
>> zmq_ctx_set(ctx, ZMQ_ALLOCATOR, &pooledAllocator);
>>
>> Cheers,
>> Jens
>>
>> On 13 Aug 2019, at 13:24, Francesco <francesco.montorsi at gmail.com> wrote:
>>
>> Hi all,
>>
>> today I've taken some time to attempt building a memory-pooling
>> mechanism into the ZMQ local_thr/remote_thr benchmarking utilities.
>> Here's the result:
>> https://github.com/zeromq/libzmq/pull/3631
>> This PR is a work in progress and is a simple modification to show the
>> effects of avoiding malloc/free when creating zmq_msg_t with the
>> standard benchmark utils of ZMQ.
>>
>> In particular, the very fast, lock-free,
>> single-producer/single-consumer queue from
>> https://github.com/cameron314/readerwriterqueue
>> is used to maintain a list of free buffers shared between the
>> "remote_thr" main thread and its ZMQ background I/O thread.
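>>
>> Roughly along these lines (a minimal sketch, not the actual PR code; it
>> assumes a fixed 2 KB slot size and uses the ReaderWriterQueue API from
>> the repo above):
>>
>> #include <zmq.h>
>> #include <cstdlib>
>> #include "readerwriterqueue.h"
>>
>> static const size_t SLOT_SIZE = 2048; // assumed max message size
>> static moodycamel::ReaderWriterQueue<void *> free_slots (8192);
>>
>> // Runs on the ZMQ background I/O thread once the message has been
>> // sent: the single producer of the SPSC queue returns the buffer.
>> static void recycle (void *data, void *)
>> {
>>     free_slots.enqueue (data);
>> }
>>
>> // Runs on the remote_thr main thread (the single consumer): reuse a
>> // recycled slot if one is available, otherwise allocate a fresh one.
>> static int init_pooled_msg (zmq_msg_t *msg, size_t size)
>> {
>>     void *buf = NULL;
>>     if (size > SLOT_SIZE || !free_slots.try_dequeue (buf))
>>         buf = malloc (SLOT_SIZE > size ? SLOT_SIZE : size);
>>     return zmq_msg_init_data (msg, buf, size, recycle, NULL);
>> }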
>>
>> Here are the graphical results:
>> with mallocs / no memory pool:
>>
>> https://cdn1.imggmi.com/uploads/2019/8/13/9f009b91df394fa945cd2519fd993f50-full.png
>> with memory pool:
>>
>> https://cdn1.imggmi.com/uploads/2019/8/13/f3ae0d6d58e9721b63129c23fe7347a6-full.png
>>
>> Doing the math, the memory-pooled approach shows:
>>
>> - roughly the same performance for messages <= 32B
>> - +15% pps/throughput increase @ 64B
>> - +60% pps/throughput increase @ 128B
>> - +70% pps/throughput increase @ 210B
>>
>> [The tests were stopped at 210B because my current quick-and-dirty
>> memory pool approach has a fixed max msg size of about 210B.]
>>
>> Honestly this is not a huge speedup, even if it is still interesting.
>> With these changes the performance now seems to be bounded by the
>> "local_thr" side and not by "remote_thr" anymore: the ZMQ background
>> I/O thread of local_thr is the only thread at 100% across the 2
>> systems, and its "perf top" now shows:
>>
>>  15,02%  libzmq.so.5.2.3     [.] zmq::metadata_t::add_ref
>>  14,91%  libzmq.so.5.2.3     [.] zmq::v2_decoder_t::size_ready
>>   8,94%  libzmq.so.5.2.3     [.] zmq::ypipe_t<zmq::msg_t, 256>::write
>>   6,97%  libzmq.so.5.2.3     [.] zmq::msg_t::close
>>   5,48%  libzmq.so.5.2.3     [.] zmq::decoder_base_t<zmq::v2_decoder_t, zmq::shared_message_memory_allo
>>   5,40%  libzmq.so.5.2.3     [.] zmq::pipe_t::write
>>   4,94%  libzmq.so.5.2.3     [.] zmq::shared_message_memory_allocator::inc_ref
>>   2,59%  libzmq.so.5.2.3     [.] zmq::msg_t::init_external_storage
>>   1,63%  [kernel]            [k] copy_user_enhanced_fast_string
>>   1,56%  libzmq.so.5.2.3     [.] zmq::msg_t::data
>>   1,43%  libzmq.so.5.2.3     [.] zmq::msg_t::init
>>   1,34%  libzmq.so.5.2.3     [.] zmq::pipe_t::check_write
>>   1,24%  libzmq.so.5.2.3     [.] zmq::stream_engine_base_t::in_event_internal
>>   1,24%  libzmq.so.5.2.3     [.] zmq::msg_t::size
>>
>> Do you know what this profile might mean?
>> I would expect that ZMQ background thread to spend most of its time in
>> its read() system call (from the TCP socket)...
>>
>> Thanks,
>> Francesco
>>
>>
>> On Fri, 19 Jul 2019 at 18:15, Francesco <francesco.montorsi at gmail.com> wrote:
>>
>>
>> Hi Yan,
>> Unfortunately I have paused my attempts in this area after getting
>> some strange results (possibly due to the fact that I tried in a complex
>> application context... I should probably try hacking a simple ZeroMQ
>> example instead!).
>>
>> I'm also a bit surprised that nobody has tried and posted online a way to
>> achieve something similar (memory-pooled zmq send)... But anyway it remains
>> in my plans to try that out when I have a bit more spare time...
>> If you manage to get some results earlier, I would be eager to know :-)
>>
>> Francesco
>>
>>
>> On Fri, 19 Jul 2019 at 04:02, Yan, Liming (NSB - CN/Hangzhou) <liming.yan at nokia-sbell.com> wrote:
>>
>>
>> Hi Francesco,
>>   Could you please share the final solution and benchmark results for
>> plan 2? Big thanks.
>>   I'm asking because I had tried something similar before with
>> zmq_msg_init_data() and zmq_msg_send() but failed because of two issues.
>>   1) My process runs in the background for a long time, and I eventually
>> found that it occupies more and more memory, until it exhausts the system
>> memory. It seems there is a memory leak with this approach.
>>   2) I provided *ffn for deallocation, but the memory is freed back much
>> more slowly than the consumer needs it, so eventually my own customized
>> pool could also be exhausted. How do you solve this?
>>   I had to go back to using zmq_send(). I know it has a memory-copy
>> penalty but it's the easiest and most stable way to send messages. I'm
>> still using 0MQ 4.1.x.
>>   Thanks.
>>
>> BR
>> Yan Limin
>>
>> -----Original Message-----
>> From: zeromq-dev [mailto:zeromq-dev-bounces at lists.zeromq.org] On
>> Behalf Of Luca Boccassi
>> Sent: Friday, July 05, 2019 4:58 PM
>> To: ZeroMQ development list <zeromq-dev at lists.zeromq.org>
>> Subject: Re: [zeromq-dev] Memory pool for zmq_msg_t
>>
>> There's no need to change the source for experimenting, you can just use
>> _init_data without a callback and with a callback (yes the first case will
>> leak memory but it's just a test), and measure the difference between the
>> two cases. You can then immediately see if it's worth pursuing further
>> optimisations or not.
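>>
>> In code, the two cases to compare would look roughly like this (a sketch
>> for the benchmark only; as noted above, the no-callback variant
>> deliberately leaks the buffer):
>>
>> #include <zmq.h>
>> #include <stdlib.h>
>>
>> // Case 1: ffn == NULL -> constant message, no content_t malloc inside
>> // libzmq, but the buffer is never freed (benchmark-only).
>> static int send_no_ffn (void *socket, size_t size)
>> {
>>     zmq_msg_t msg;
>>     zmq_msg_init_data (&msg, malloc (size), size, NULL, NULL);
>>     return zmq_msg_send (&msg, socket, 0);
>> }
>>
>> // Case 2: with a deallocation callback -> libzmq mallocs content_t and
>> // calls free() once every copy of the message has been closed.
>> static void my_free (void *data, void *hint) { free (data); }
>>
>> static int send_with_ffn (void *socket, size_t size)
>> {
>>     zmq_msg_t msg;
>>     zmq_msg_init_data (&msg, malloc (size), size, my_free, NULL);
>>     return zmq_msg_send (&msg, socket, 0);
>> }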
>>
>> _external_storage is an implementation detail, and it's non-shared
>> because it's used in the receive case only, as it's used with a reference
>> to the TCP buffer used in the system call for zero-copy receives. Exposing
>> that means that those kinds of messages could not be used with pub-sub or
>> radio-dish, as they can't have multiple references without copying them,
>> which means there would be a semantic difference between the different
>> message initialisation APIs, unlike now when the difference is only in who
>> owns the buffer. It would make the API quite messy in my opinion, and be
>> quite confusing, as pub/sub is probably the most well known pattern.
>>
>> On Thu, 2019-07-04 at 23:20 +0200, Francesco wrote:
>>
>> Hi Luca,
>> thanks for the details. Indeed I understand why the "content_t" needs
>> to be allocated dynamically: it's just like the control block used by
>> STL's std::shared_ptr<>.
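>>
>> (For reference, judging from the zmq_msg_init_data() snippet quoted
>> further down in this thread, that control block looks roughly like this;
>> the exact declaration in msg.hpp may differ:)
>>
>> //  Approximate shape of the heap-allocated control block for lmsg
>> //  messages, reconstructed from the fields it carries:
>> struct content_t
>> {
>>     void *data;                   //  user buffer
>>     size_t size;                  //  user buffer size
>>     zmq_free_fn *ffn;             //  deallocation callback
>>     void *hint;                   //  opaque pointer handed back to ffn
>>     zmq::atomic_counter_t refcnt; //  shared by all copies of the msg
>> };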
>>
>> And you're right: I'm not sure how much gain there is in removing 100%
>> of malloc operations from my TX path... still I would be curious to
>> find out, but right now it seems I need to patch the ZMQ source code to
>> achieve that.
>>
>> Anyway, I wonder if it would be possible to expose in the public API a
>> method like "zmq::msg_t::init_external_storage()" which, AFAICS, allows
>> creating a non-shared zero-copy long message... it appears to be used
>> only by the v2 decoder internally right now...
>> Is there a specific reason why that's not accessible from the public
>> API?
>>
>> Thanks,
>> Francesco
>>
>>
>>
>>
>>
>> On Thu, 4 Jul 2019 at 20:25, Luca Boccassi <luca.boccassi at gmail.com> wrote:
>>
>> Another reason for that small struct to be on the heap is so that it
>> can be shared among all the copies of the message (e.g. a pub socket
>> has N copies of the message on the stack, one for each subscriber).
>> The struct has an atomic counter in it, so that when all the copies
>> of the message on the stack have been closed, the userspace buffer
>> deallocation callback can be invoked. If the atomic counter were on
>> the stack inlined in the message, this wouldn't work.
>> So even if room were to be found, a malloc would still be needed.
>>
>> If you _really_ are worried about it, and testing shows it makes a
>> difference, then one option could be to pre-allocate a set of these
>> metadata structures at startup, and just assign them when the
>> message is created. It's possible, but increases complexity quite a
>> bit, so it needs to be worth it.
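>>
>> As a rough illustration of that idea (a sketch only, not existing libzmq
>> code; a real version would have to be thread-safe, which is where the
>> extra complexity comes from):
>>
>> #include <cstddef>
>> #include <vector>
>>
>> // Hypothetical startup pool of content_t-sized blocks: msg_t would
>> // take one from the free list instead of calling
>> // malloc (sizeof (content_t)), and push it back on close.
>> class content_block_pool_t
>> {
>>   public:
>>     content_block_pool_t (size_t count_, size_t block_size_) :
>>         _storage (count_ * block_size_)
>>     {
>>         for (size_t i = 0; i < count_; i++)
>>             _free.push_back (&_storage[i * block_size_]);
>>     }
>>
>>     //  Returns a pre-allocated block, or NULL if the pool is exhausted
>>     //  (the caller would then fall back to plain malloc).
>>     void *get ()
>>     {
>>         if (_free.empty ())
>>             return NULL;
>>         void *block = _free.back ();
>>         _free.pop_back ();
>>         return block;
>>     }
>>
>>     void put (void *block_) { _free.push_back (block_); }
>>
>>   private:
>>     std::vector<unsigned char> _storage;
>>     std::vector<void *> _free;
>> };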
>>
>> On Thu, 2019-07-04 at 17:42 +0100, Luca Boccassi wrote:
>>
>> The second malloc cannot be avoided, but it's tiny and fixed in size at
>> compile time, so the compiler and glibc will be able to optimize it to
>> death.
>>
>> The reason for that is that there's not enough room in the 64 bytes to
>> store that structure, and increasing the message allocation on the
>> stack past 64 bytes means it will no longer fit in a single cache line,
>> which will incur a performance penalty far worse than the small malloc
>> (I tested this some time ago). That is of course unless you are running
>> on s390 or a POWER with a 256-byte cacheline, but given it's part of
>> the ABI it would be a bit of a mess for the benefit of very few users,
>> if any.
>>
>> So I'd recommend to just go with the second plan, and compare what the
>> result is when passing a deallocation function vs not passing it (yes
>> it will leak the memory but it's just for the test). My bet is that the
>> difference will not be that large.
>>
>> On Thu, 2019-07-04 at 16:30 +0200, Francesco wrote:
>>
>> Hi Stephan, hi Luca,
>>
>> thanks for your hints. However I inspected
>> https://github.com/dasys-lab/capnzero/blob/master/capnzero/src/Publisher.cpp
>> and I don't think it avoids the malloc()... see my point 2) below.
>>
>> Indeed I realized that probably the current ZMQ API does not allow me
>> to achieve 100% of what I intended to do.
>> Let me rephrase my target: my target is to be able to
>> - memory pool creation: do a large memory allocation of, say, 1M
>> zmq_msg_t only at the start of my program; let's say I create all
>> these zmq_msg_t with a size of 2k bytes each (let's assume this is
>> the max message size possible in my app);
>> - during application lifetime: call zmq_msg_send() at any time, always
>> avoiding malloc() operations (just picking the first available unused
>> entry of zmq_msg_t from the memory pool).
>>
>> Initially I thought that was possible, but I think I have identified 2
>> blocking issues:
>> 1) If I try to recycle zmq_msg_t directly: in this case I will fail
>> because I cannot really change only the "size" member of a zmq_msg_t
>> without reallocating it... so I'm forced (in my example) to always
>> send 2k bytes out (!!)
>> 2) If I create only a memory pool of 2k-byte buffers and then wrap the
>> first available buffer inside a zmq_msg_t (allocated on the stack, not
>> on the heap): in this case I need to know when the internals of ZMQ
>> have completed using the zmq_msg_t, and thus when I can mark that
>> buffer as available again in my memory pool. However I see that the
>> zmq_msg_init_data() ZMQ code contains:
>>
>>    //  Initialize constant message if there's no need to deallocate
>>    if (ffn_ == NULL) {
>> ...
>>        _u.cmsg.data = data_;
>>        _u.cmsg.size = size_;
>> ...
>>    } else {
>> ...
>>        _u.lmsg.content =
>>          static_cast<content_t *> (malloc (sizeof (content_t)));
>> ...
>>        _u.lmsg.content->data = data_;
>>        _u.lmsg.content->size = size_;
>>        _u.lmsg.content->ffn = ffn_;
>>        _u.lmsg.content->hint = hint_;
>>        new (&_u.lmsg.content->refcnt) zmq::atomic_counter_t ();
>>    }
>>
>> So I skip the malloc() operation only if I pass ffn_ == NULL. The
>> problem is that if I pass ffn_ == NULL, then I have no way to know
>> when the internals of ZMQ have completed using the zmq_msg_t...
>>
>> Any way to work around either issue 1) or issue 2)?
>>
>> I understand that the malloc is just of sizeof(content_t) ~= 40B...
>> but still I'd like to avoid it...
>>
>> Thanks!
>> Francesco
>>
>>
>>
>>
>>
>> On Thu, 4 Jul 2019 at 14:58, Stephan Opfer <opfer at vs.uni-kassel.de> wrote:
>> On 04.07.19 14:29, Luca Boccassi wrote:
>>
>> How users make use of these primitives is up to them though, I don't
>> think anything special was shared before, as far as I remember.
>>
>>
>> Some examples can be found here:
>>
>> https://github.com/dasys-lab/capnzero/tree/master/capnzero/src
>>
>>
>>
>> The classes Publisher and Subscriber should replace the publisher and
>> subscriber in a former Robot-Operating-System-based system. I hope
>> that the subscriber is actually using the method Luca is talking
>> about on the receiving side.
>>
>> The message data here is a Cap'n Proto container that we "simply"
>> serialize and send via ZeroMQ -> therefore the name Cap'nZero ;-)
>>
>>
>> --
>> Kind regards,
>> Luca Boccassi
>>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> https://lists.zeromq.org/mailman/listinfo/zeromq-dev
>