[zeromq-dev] Memory pool for zmq_msg_t

Luca Boccassi luca.boccassi at gmail.com
Fri Jul 5 10:57:30 CEST 2019


There's no need to change the source for experimenting: you can just
use _init_data once without a callback and once with one (yes, the
first case will leak memory, but it's just a test), and measure the
difference between the two cases. You can then immediately see whether
further optimisations are worth pursuing.
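
For instance, something along these lines (a minimal sketch: in a real
test you would send the message in a timed loop over a connected
socket instead of just closing it, and here the buffer is static, so
nothing actually leaks):

    #include <zmq.h>

    static unsigned char buf[2048]; /* stand-in for a pool entry */

    /* Deallocation callback: the buffer is static, nothing to free */
    static void no_op_free (void *data_, void *hint_)
    {
        (void) data_;
        (void) hint_;
    }

    int main (void)
    {
        zmq_msg_t msg;

        /* Case A: ffn == NULL, constant message, no content_t malloc */
        zmq_msg_init_data (&msg, buf, sizeof buf, NULL, NULL);
        zmq_msg_close (&msg);

        /* Case B: with a callback, forces the small content_t malloc */
        zmq_msg_init_data (&msg, buf, sizeof buf, no_op_free, NULL);
        zmq_msg_close (&msg);
        return 0;
    }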

_external_storage is an implementation detail, and it's non-shared
because it's used only in the receive path, where it holds a reference
to the TCP buffer used in the system call for zero-copy receives.
Exposing it would mean that those kinds of messages could not be used
with pub-sub or radio-dish, as they can't have multiple references
without being copied, so there would be a semantic difference between
the message initialisation APIs, unlike now, when the only difference
is who owns the buffer. That would make the API quite messy in my
opinion, and quite confusing, given that pub/sub is probably the most
well-known pattern.

On Thu, 2019-07-04 at 23:20 +0200, Francesco wrote:
> Hi Luca,
> thanks for the details. Indeed I understand why the "content_t" needs
> to be allocated dynamically: it's just like the control block used by
> STL's std::shared_ptr<>.
> 
> And you're right: I'm not sure how much gain there is in removing
> 100% of malloc operations from my TX path... still, I would be
> curious to find out, but right now it seems I'd need to patch the
> ZMQ source code to achieve that.
> 
> Anyway, I wonder if it would be possible to expose in the public API
> a method like "zmq::msg_t::init_external_storage()" which, AFAICS,
> allows creating a non-shared zero-copy long message... it appears to
> be used only by the v2 decoder internally right now...
> Is there a specific reason why that's not accessible from the public
> API?
> 
> Thanks,
> Francesco
> 
> 
> 
> 
> 
> On Thu, 4 Jul 2019 at 20:25, Luca Boccassi <
> luca.boccassi at gmail.com> wrote:
> > Another reason for that small struct to be on the heap is so that
> > it can be shared among all the copies of the message (eg: a pub
> > socket has N copies of the message on the stack, one for each
> > subscriber). The struct has an atomic counter in it, so that when
> > all the copies of the message on the stack have been closed, the
> > userspace buffer deallocation callback can be invoked. If the
> > atomic counter were on the stack, inlined in the message, this
> > wouldn't work.
> > So even if room were to be found, a malloc would still be needed.
> > 
> > If you _really_ are worried about it, and testing shows it makes a
> > difference, then one option could be to pre-allocate a set of these
> > metadata structures at startup, and just assign them when the
> > message is created. It's possible, but increases complexity quite a
> > bit, so it needs to be worth it.
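> > 
> > Roughly like this (purely hypothetical, nothing like it exists in
> > libzmq today; it reuses the shared_content_t sketch above, and a
> > real version would need to be made thread-safe):
> > 
> >     #define POOL_SIZE 1024
> > 
> >     static shared_content_t pool_nodes[POOL_SIZE];
> >     static shared_content_t *free_list[POOL_SIZE];
> >     static size_t free_top;
> > 
> >     /* Fill the free list once at startup */
> >     static void content_pool_init (void)
> >     {
> >         for (size_t i = 0; i < POOL_SIZE; i++)
> >             free_list[i] = &pool_nodes[i];
> >         free_top = POOL_SIZE;
> >     }
> > 
> >     /* Handed out at message creation instead of malloc() */
> >     static shared_content_t *content_pool_acquire (void)
> >     {
> >         return free_top > 0 ? free_list[--free_top] : NULL;
> >     }
> > 
> >     /* Called instead of free() when the refcount drops to zero */
> >     static void content_pool_return (shared_content_t *c)
> >     {
> >         free_list[free_top++] = c;
> >     }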
> > 
> > On Thu, 2019-07-04 at 17:42 +0100, Luca Boccassi wrote:
> > > The second malloc cannot be avoided, but it's tiny and fixed in
> > > size at compile time, so the compiler and glibc will be able to
> > > optimize it to death.
> > > 
> > > The reason for that is that there's not enough room in the 64
> > > bytes to store that structure, and increasing the message
> > > allocation on the stack past 64 bytes means it will no longer fit
> > > in a single cache line, which will incur a performance penalty
> > > far worse than the small malloc (I tested this some time ago).
> > > That is of course unless you are running on s390 or a POWER with
> > > a 256-byte cache line, but given it's part of the ABI, it would
> > > be a bit of a mess for the benefit of very few users, if any.
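> > > 
> > > You can check the size assumption on your platform at compile
> > > time, e.g. with C11:
> > > 
> > >     #include <zmq.h>
> > > 
> > >     /* zmq.h declares zmq_msg_t as an opaque 64-byte array so
> > >        that a message fits a typical cache line */
> > >     _Static_assert (sizeof (zmq_msg_t) == 64,
> > >                     "zmq_msg_t exceeds one 64-byte cache line");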
> > > 
> > > So I'd recommend just going with the second plan, and comparing
> > > the result when passing a deallocation function vs not passing
> > > one (yes, it will leak the memory, but it's just for the test).
> > > My bet is that the difference will not be that large.
> > > 
> > > On Thu, 2019-07-04 at 16:30 +0200, Francesco wrote:
> > > > Hi Stephan, Hi Luca,
> > > > 
> > > > thanks for your hints. However, I inspected
> > > > https://github.com/dasys-lab/capnzero/blob/master/capnzero/src/Publisher.cpp
> > > > and I don't think it avoids malloc()... see my point 2) below:
> > > > 
> > > > Indeed I realized that the current ZMQ API probably does not
> > > > allow me to achieve 100% of what I intended to do.
> > > > Let me rephrase my target: my target is to be able to
> > > >  - memory pool creation: do one large memory allocation of,
> > > > say, 1M zmq_msg_t only at the start of my program; let's say I
> > > > create all these zmq_msg_t with a size of 2k bytes each (let's
> > > > assume this is the max message size possible in my app)
> > > >  - during application lifetime: call zmq_msg_send() at any
> > > > time, always avoiding malloc() operations (just picking the
> > > > first available unused entry of zmq_msg_t from the memory
> > > > pool).
> > > > 
> > > > Initially I thought that was possible, but I think I have
> > > > identified 2 blocking issues:
> > > > 1) If I try to recycle zmq_msg_t directly: in this case I will
> > > > fail, because I cannot change only the "size" member of a
> > > > zmq_msg_t without reallocating it... so I'm forced (in my
> > > > example) to always send 2k bytes out (!!)
> > > > 2) If I create only a memory pool of 2k-byte buffers and then
> > > > wrap the first available buffer inside a zmq_msg_t (allocated
> > > > on the stack, not in the heap): in this case I need to know
> > > > when the internals of ZMQ have finished using the zmq_msg_t,
> > > > and thus when I can mark that buffer as available again in my
> > > > memory pool. However, I see that the zmq_msg_init_data() ZMQ
> > > > code contains:
> > > > 
> > > >     //  Initialize constant message if there's no need to deallocate
> > > >     if (ffn_ == NULL) {
> > > > ...
> > > >         _u.cmsg.data = data_;
> > > >         _u.cmsg.size = size_;
> > > > ...
> > > >     } else {
> > > > ...
> > > >         _u.lmsg.content =
> > > >           static_cast<content_t *> (malloc (sizeof (content_t)));
> > > > ...
> > > >         _u.lmsg.content->data = data_;
> > > >         _u.lmsg.content->size = size_;
> > > >         _u.lmsg.content->ffn = ffn_;
> > > >         _u.lmsg.content->hint = hint_;
> > > >         new (&_u.lmsg.content->refcnt) zmq::atomic_counter_t ();
> > > >     }
> > > > 
> > > > So I can skip the malloc() only if I pass ffn_ == NULL. The
> > > > problem is that if I pass ffn_ == NULL, then I have no way to
> > > > know when the internals of ZMQ have finished using the
> > > > zmq_msg_t...
> > > > 
> > > > Any way to work around either issue 1) or issue 2)?
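> > > > 
> > > > (For reference, the closest I can get is to pass a ffn_ that
> > > > gives the buffer back to my pool; in this sketch, buf_pool_t
> > > > and the buf_pool_* helpers are my own hypothetical code. It
> > > > solves the "when is ZMQ done with it" problem, but still pays
> > > > the small malloc:)
> > > > 
> > > >     #include <zmq.h>
> > > > 
> > > >     /* Hypothetical fixed-size buffer pool (my code, not libzmq) */
> > > >     typedef struct buf_pool buf_pool_t;
> > > >     void *buf_pool_acquire (buf_pool_t *pool); /* no malloc */
> > > >     void buf_pool_release (buf_pool_t *pool, void *buf);
> > > > 
> > > >     /* ZMQ calls this when the last copy of the msg is closed */
> > > >     static void on_msg_done (void *data_, void *hint_)
> > > >     {
> > > >         buf_pool_release ((buf_pool_t *) hint_, data_);
> > > >     }
> > > > 
> > > >     static int send_pooled (void *socket, buf_pool_t *pool,
> > > >                             size_t len)
> > > >     {
> > > >         void *buf = buf_pool_acquire (pool);
> > > >         /* ... serialize up to len <= 2048 bytes into buf ... */
> > > >         zmq_msg_t msg;
> > > >         /* still does the ~40B content_t malloc internally: */
> > > >         zmq_msg_init_data (&msg, buf, len, on_msg_done, pool);
> > > >         return zmq_msg_send (&msg, socket, 0);
> > > >     }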
> > > > 
> > > > I understand that the malloc is just sizeof(content_t) ~= 40
> > > > bytes... but I'd still like to avoid it...
> > > > 
> > > > Thanks!
> > > > Francesco
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > On Thu, 4 Jul 2019 at 14:58, Stephan Opfer <
> > > > opfer at vs.uni-kassel.de> wrote:
> > > > > On 04.07.19 14:29, Luca Boccassi wrote:
> > > > > > How users make use of these primitives is up to them
> > > > > > though, I don't think anything special was shared before,
> > > > > > as far as I remember.
> > > > > 
> > > > > Some examples can be found here:
> > > > > https://github.com/dasys-lab/capnzero/tree/master/capnzero/src
> > > > > 
> > > > > The classes Publisher and Subscriber should replace the
> > > > > publisher and subscriber in a former Robot Operating
> > > > > System-based system. I hope that the subscriber is actually
> > > > > using the method Luca is talking about on the receiving side.
> > > > > 
> > > > > The message data here is a Cap'n Proto container that we
> > > > > "simply" serialize and send via ZeroMQ -> hence the name
> > > > > Cap'nZero ;-)
-- 
Kind regards,
Luca Boccassi