[zeromq-dev] (almost) zero-copy message receive
Thomas Rodgers
rodgert at twrodgers.com
Tue Jun 2 15:37:28 CEST 2015
>
> Personally, I think that after 4 years of C++11 this should not be an issue,
> but there may be platforms with old compilers which you want to support.
Four years of C++11 *should* be enough, but fully conforming compilers are
still not in widespread use. See, for instance:
https://msdn.microsoft.com/en-us/library/hh567368.aspx
On Tue, Jun 2, 2015 at 8:05 AM, Auer, Jens <jens.auer at cgi.com> wrote:
> Hi Pieter,
>
> The reason I wanted to ask first is that I had to switch on C++11 to make
> it work without changing atomic_counter_t. I eliminated msg_t::content_t
> completely to save a malloc call, adding the members of content_t to the
> msg_t class directly, since there is now enough space. However,
> atomic_counter_t is not a POD and cannot be put into the union. For my
> proof-of-concept, switching on C++11 is fine, but I am not sure whether that
> is OK for the main branch. Personally, I think that after 4 years of C++11
> this should not be an issue, but there may be platforms with old compilers
> which you want to support.
>
> The only alternative I came up with would be to make atomic_counter_t a
> classical C struct with free functions instead of a class. I don't like
> this very much.
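To make the alternative quoted above concrete, here is a minimal sketch of a counter written as a plain C-style struct with free functions, which may be placed in a union even without C++11. The names counter_pod_t, counter_set, counter_add, counter_sub and message_t are hypothetical and not taken from libzmq or the patch, and the __sync_* calls assume GCC or Clang builtins.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical POD replacement for atomic_counter_t: no constructors and
// no member functions, so it may be a union member even in C++03.
struct counter_pod_t {
    unsigned int value;
};

// Free functions take the place of member functions.
// The __sync_* calls are GCC/Clang builtins (an assumption of this sketch).
inline void counter_set(counter_pod_t *c, unsigned int v) { c->value = v; }

// Adds n and returns the previous value.
inline unsigned int counter_add(counter_pod_t *c, unsigned int n) {
    return __sync_fetch_and_add(&c->value, n);
}

// Subtracts n; returns true while the counter is still non-zero.
inline bool counter_sub(counter_pod_t *c, unsigned int n) {
    return __sync_sub_and_fetch(&c->value, n) != 0;
}

// Simplified stand-in for a message whose variants share storage.
struct message_t {
    unsigned char type;
    union {
        struct { void *data; std::size_t size; counter_pod_t refcnt; } large;
        struct { unsigned char data[32]; unsigned char size; } small;
    } u;
};
```

The cost is the one noted in the quoted mail: callers must remember to call counter_set themselves, because nothing initialises the counter automatically.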
>
> Best wishes,
> Jens
>
> --
> Jens Auer | CGI | Software-Engineer
> CGI (Germany) GmbH & Co. KG
> Rheinstraße 95 | 64295 Darmstadt | Germany
> T: +49 6151 36860 154
> jens.auer at cgi.com
> Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can be
> found at de.cgi.com/pflichtangaben.
>
> ________________________________________
> From: zeromq-dev-bounces at lists.zeromq.org [
> zeromq-dev-bounces at lists.zeromq.org] on behalf of Pieter Hintjens
> [ph at imatix.com]
> Sent: Tuesday, 2 June 2015 10:18
> To: ZeroMQ development list
> Subject: Re: [zeromq-dev] (almost) zero-copy message receive
>
> Jens,
>
> Sounds great. Feel free to send such patches to libzmq master; please
> make sure they are as atomic as possible, each with a clear problem
> statement, each testable individually.
>
> -Pieter
>
> On Tue, Jun 2, 2015 at 10:13 AM, Arnaud Loonstra <arnaud at sphaero.org> wrote:
> > Although I'm not very familiar with zmq's internals, this looks promising.
> > Did you test whether your implementation remains correct, i.e. that it
> > doesn't introduce deadlocks or other race conditions?
> >
> > Rg,
> >
> > Arnaud
> >
> > On 2015-05-31 19:29, Jens Auer wrote:
> >> Hi,
> >>
> >> I did some performance analysis of a program which receives data on a (SUB
> >> or PULL) socket, filters it by some criteria, extracts a value from the
> >> message and uses this as a subscription to forward the data to a PUB
> >> socket. As expected, most time is spent in memory allocations and memcpy
> >> operations, so I decided to check if there is an opportunity to minimize
> >> these operations in the critical path. From my analysis, the path is as
> >> follows:
> >> 1. stream_engine receives data from a socket into a static buffer of 8192
> >>    bytes
> >> 2. decoder/v2_decoder implement a state machine which reads the flags and
> >>    message size, creates a new message and copies the data into the
> >>    message data field
> >> 3. When sending, stream_engine copies the flags field, message and
> >>    message data into a static buffer and sends this buffer completely to
> >>    the socket
> >>
> >> Memory allocations are done in v2_decoder when a new message is created,
> >> and deallocations are done when sending the message. Memcpy operations
> >> are done in the decoder to copy
> >> - the flags byte into a temporary buffer
> >> - the message size into a temporary buffer
> >> - the message data into the dynamically allocated storage
> >> Since the allocations and memcpy are the dominating operations, I
> >> implemented a scheme where these operations are minimized. The main idea
> >> is to allocate the receive buffer of 8192 bytes dynamically and use it as
> >> the data storage for zero-copy messages created with msg_t::init_data.
> >> This replaces n = 8192 / (m_size + 10) memory allocations with one
> >> allocation, and it gets rid of the same number of memcpy operations for
> >> the message data. I implemented this in a fork
> >> (https://github.com/jens-auer/libzmq/tree/zero_copy_receive). For testing,
> >> I ran the throughput test (message size 100, 100000 messages) locally and
> >> profiled for memory allocations and memcpy. The results are promising:
> >> - memory allocations reduced from 100,260 to 2,573
> >> - memcpy operations reduced from 301,227 to 202,449. This is expected
> >>   because three memcpys are done for every message, and the patch removes
> >>   only the data memcpy.
> >> - throughput increased by about 30-40% (I only did a couple of runs to
> >>   test it, no thorough benchmarking)
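The scheme quoted above can be sketched roughly as follows: one heap allocation holds a reference count plus the 8192-byte receive buffer, and each message is only a view into that buffer. This is a simplified illustration, not the code from the fork. The names shared_buffer_t, msg_view_t, make_view, release, messages_per_buffer and live_buffers are all hypothetical, and std::atomic is used for brevity although it requires C++11.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

const std::size_t buffer_size = 8192;

// Messages per buffer according to the formula in the mail:
// n = 8192 / (m_size + 10).
inline std::size_t messages_per_buffer(std::size_t m_size) {
    return buffer_size / (m_size + 10);
}

static int live_buffers = 0;  // only here to make the lifetime observable

// One allocation: the reference count and the receive buffer together.
struct shared_buffer_t {
    std::atomic<int> refs;
    unsigned char data[buffer_size];
    shared_buffer_t() : refs(1) { ++live_buffers; }  // receiver's reference
    ~shared_buffer_t() { --live_buffers; }
};

// A message that points into the shared buffer instead of owning a copy.
struct msg_view_t {
    shared_buffer_t *owner;
    unsigned char *data;
    std::size_t size;
};

// Creates a view on [offset, offset + size) and takes one reference.
inline msg_view_t make_view(shared_buffer_t *b, std::size_t offset,
                            std::size_t size) {
    assert(offset + size <= buffer_size);
    b->refs.fetch_add(1);
    msg_view_t v = { b, b->data + offset, size };
    return v;
}

// Drops one reference; the last one frees the whole buffer.
inline void release(shared_buffer_t *b) {
    if (b->refs.fetch_sub(1) == 1)
        delete b;
}
```

With a message size of 100, the formula gives 8192 / 110 = 74 messages per buffer. In libzmq terms, something like release would presumably be the free function handed to msg_t::init_data, with the buffer passed as the hint, but that wiring is an assumption here.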
> >>
> >> For the implementation, I had to change two other things. After my first
> >> implementation, I realized that msg_t::init_data does a malloc to create
> >> the content_t member. Given that msg_t's size is now 64 bytes, I removed
> >> content_t completely by adding the members of content_t to the lmsg_t
> >> union. However, this is a problem with the current code because one of
> >> the members is an atomic_counter_t, which is a non-POD type and cannot be
> >> a union member. For my proof-of-concept implementation, I switched on
> >> C++11 mode because this relaxes the requirements for PODs.
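For reference, the C++11 feature that matters here is unrestricted unions: a union may hold members with non-trivial constructors, provided the enclosing class constructs and destroys them by hand. A minimal sketch follows. The names counter_t, small_t, large_t and message_t are stand-ins invented for illustration and do not reproduce libzmq's msg_t.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>

// Stand-in for atomic_counter_t: the user-provided constructor makes it non-POD.
class counter_t {
public:
    explicit counter_t(int v = 0) : value_(v) {}
    int add(int n) { return value_.fetch_add(n); }  // returns the old value
    int get() const { return value_.load(); }
private:
    std::atomic<int> value_;
};

struct small_t {
    unsigned char data[32];
    unsigned char size;
};

struct large_t {
    void *data;
    std::size_t size;
    counter_t refcnt;  // non-POD member, rejected inside a union before C++11
    large_t(void *d, std::size_t s) : data(d), size(s), refcnt(1) {}
};

class message_t {
public:
    message_t() : is_large_(false) { u_.small.size = 0; }
    ~message_t() { if (is_large_) u_.large.~large_t(); }

    // Switches the active union member to large_t via placement new.
    void init_large(void *data, std::size_t size) {
        new (&u_.large) large_t(data, size);
        is_large_ = true;
    }
    bool is_large() const { return is_large_; }
    counter_t &refcnt() { return u_.large.refcnt; }

private:
    bool is_large_;
    union storage_t {
        small_t small;
        large_t large;
        storage_t() {}   // members are constructed manually
        ~storage_t() {}  // and destroyed manually
    } u_;
};
```

The enclosing class has to track which member is active, which is why the sketch carries the is_large_ flag next to the union.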
> >>
> >> I hope this could be useful and maybe included in the main branch. My
> >> next step is to change the encoder/stream engine to use writev to skip
> >> the memcpy operations when sending messages.
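The writev idea mentioned above amounts to a gather write: the header and the message body stay where they are and are handed to the kernel as two separate segments. A minimal sketch, assuming a POSIX system; send_frame is a hypothetical helper, not a libzmq function.

```cpp
#include <sys/uio.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Sends a frame header and body with one system call, without first copying
// them into a contiguous buffer. Returns whatever writev() returns.
inline ssize_t send_frame(int fd,
                          const unsigned char *header, std::size_t header_size,
                          const void *body, std::size_t body_size) {
    assert(fd >= 0);
    struct iovec iov[2];
    // iov_base is not const-qualified; writev() only reads from the buffers.
    iov[0].iov_base = const_cast<unsigned char *>(header);
    iov[0].iov_len = header_size;
    iov[1].iov_base = const_cast<void *>(body);
    iov[1].iov_len = body_size;
    return writev(fd, iov, 2);
}
```

On a socket, writev may accept fewer bytes than requested, so real code would need to resume from the point where the previous call stopped.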
> >>
> >> Best wishes,
> >> Jens Auer
> >>
> >>
> >> _______________________________________________
> >> zeromq-dev mailing list
> >> zeromq-dev at lists.zeromq.org
> >> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
> >
>