[zeromq-dev] (almost) zero-copy message receive

Auer, Jens jens.auer at cgi.com
Tue Jun 2 15:05:18 CEST 2015

Hi Pieter,

The reason I wanted to ask first is that I had to switch on C++11 to make it work without changing atomic_counter_t. I eliminated msg_t::content_t completely to save a malloc call by adding the members of content_t to the msg_t class directly, since there is now enough space. However, atomic_counter_t is not a POD and cannot be put into the union. For my proof-of-concept, switching on C++11 is fine, but I am not sure if that is OK for the main branch. Personally, I think that after 4 years of C++11 this should not be an issue, but there may be platforms with old compilers which you want to support.
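
To illustrate the union restriction described above, here is a minimal sketch of a non-POD member inside a union under C++11 rules. All names (counter_t, lmsg_t, msg_sketch_t, init_large) are hypothetical stand-ins, not libzmq's real types, and the counter is deliberately not atomic. A C++03 compiler rejects this union; a C++11 compiler accepts it as long as the union provides its own constructor and the member is activated explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for atomic_counter_t. The user-provided
// constructor makes it a non-POD type. Not atomic in this sketch.
class counter_t {
public:
    counter_t() : value_(0) {}
    void add(int n) { value_ += n; }
    int get() const { return value_; }
private:
    int value_;
};

// Hypothetical, simplified message layouts (not libzmq's real ones).
struct vsm_t { unsigned char data[32]; unsigned char size; };
struct lmsg_t { void *data; std::size_t size; counter_t refcnt; };

struct msg_sketch_t {
    enum kind_t { kind_small, kind_large };
    kind_t kind;
    union payload_t {
        vsm_t vsm;
        lmsg_t lmsg;   // non-trivial member: only allowed since C++11
        // The implicit default constructor is deleted because lmsg_t
        // has a non-trivial one, so the union must provide its own.
        payload_t() {}
    } u;
};

// Activates the large-message member with placement new, then fills it.
inline void init_large(msg_sketch_t &m, void *data, std::size_t size) {
    m.kind = msg_sketch_t::kind_large;
    new (&m.u.lmsg) lmsg_t();   // runs counter_t's constructor
    m.u.lmsg.data = data;
    m.u.lmsg.size = size;
    m.u.lmsg.refcnt.add(1);
}
```

The cost of this approach is that the code has to track which union member is active and construct it by hand, which is what the kind field is for in this sketch.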

The only alternative I came up with would be to make atomic_counter_t a classical C struct with free functions instead of a class. I don't like this very much.
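
For comparison, the C-struct alternative could look roughly like the sketch below. The names are hypothetical, and the __sync builtins are an assumption that only holds for GCC- and Clang-compatible compilers; a portable version would need per-platform implementations. Because the struct has no constructors or member functions, it stays a POD and can be a union member without C++11.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical C-style counter: plain data only, so it is a POD.
struct atomic_counter_pod_t {
    volatile uint32_t value;
};

// Sets the counter. Not atomic; meant for initialisation only.
inline void atomic_counter_set(atomic_counter_pod_t *c, uint32_t v)
{
    c->value = v;
}

// Atomically adds n and returns the previous value.
// Uses a GCC/Clang builtin (assumption, see above).
inline uint32_t atomic_counter_add(atomic_counter_pod_t *c, uint32_t n)
{
    return __sync_fetch_and_add(&c->value, n);
}

// Atomically subtracts n; returns true while the counter is non-zero.
inline bool atomic_counter_sub(atomic_counter_pod_t *c, uint32_t n)
{
    return __sync_sub_and_fetch(&c->value, n) != 0;
}

// The POD counter can sit inside a union even under C++03 rules.
union content_sketch_t {
    unsigned char raw[64];
    struct { void *data; atomic_counter_pod_t refcnt; } lmsg;
};
```

The downside is the one mentioned above: every call site changes from a member call to a free function taking a pointer.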

Best wishes,

Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.auer at cgi.com

From: zeromq-dev-bounces at lists.zeromq.org [zeromq-dev-bounces at lists.zeromq.org] on behalf of Pieter Hintjens [ph at imatix.com]
Sent: Tuesday, 2 June 2015 10:18
To: ZeroMQ development list
Subject: Re: [zeromq-dev] (almost) zero-copy message receive


Sounds great. Feel free to send such patches to libzmq master; please
make sure they are as atomic as possible, each with a clear problem
statement, each testable individually.


On Tue, Jun 2, 2015 at 10:13 AM, Arnaud Loonstra <arnaud at sphaero.org> wrote:
> Although I'm not very familiar with zmq's internals this looks
> promising.
> Did you test if your implementation remains correct, i.e. that it
> doesn't introduce deadlocks or other race conditions?
> Rg,
> Arnaud
> On 2015-05-31 19:29, Jens Auer wrote:
>> Hi,
>> I did some performance analysis of a program which receives data on a
>> (SUB or PULL) socket, filters it for some criteria, extracts a value
>> from the message and uses this as a subscription to forward the data
>> to a PUB socket. As expected, most time is spent in memory allocations
>> and memcpy operations, so I decided to check if there is an
>> opportunity to minimize these operations in the critical path. From my
>> analysis, the path is as follows:
>> 1. stream_engine receives data from a socket into a static buffer of
>>    8192 bytes
>> 2. decoder/v2_decoder implement a state machine which reads the flag
>>    and message size, create a new message and copy the data into the
>>    message data field
>> 3. When sending, stream_engine copies the flags field, message and
>>    message data into a static buffer and sends this buffer completely
>>    to the socket
>> Memory allocations are done in v2_decoder when a new message is
>> created, and deallocations are done when sending the message. Memcpy
>> operations are done in decoder to copy
>> - the flags byte into a temporary buffer
>> - the message size into a temporary buffer
>> - the message data into the dynamically allocated storage
>> Since the allocations and memcpy are the dominating operations, I
>> implemented a scheme where these operations are minimized. The main
>> idea is to allocate the receive buffer of 8192 bytes dynamically and
>> use this as the data storage for zero-copy messages created with
>> msg_t::init_data. This replaces n = 8192 / (m_size + 10) memory
>> allocations with one allocation, and it gets rid of the same number of
>> memcpy operations for the message data. I implemented this in a fork
>> (https://github.com/jens-auer/libzmq/tree/zero_copy_receive). For
>> testing, I ran the throughput test (message size 100, 100000 messages)
>> locally and profiled for memory allocations and memcpy. The results
>> are promising:
>> - memory allocations reduced from 100,260 to 2,573
>> - memcpy operations reduced from 301,227 to 202,449. This is expected
>>   because for every message, three memcpys are done, and the patch
>>   removes the data memcpy only.
>> - throughput increased significantly by about 30-40% (I only did a
>>   couple of runs to test it, no thorough benchmarking)
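
The scheme described above can be sketched roughly as follows. This is not the code from the fork: shared_buffer_t, msg_view_t, decode and the one-byte length framing are all hypothetical simplifications (real ZMTP framing differs), the reference counter is a plain integer where real code would need an atomic one, and a frame that is cut off at the end of the buffer is simply ignored. It only shows the idea that every message decoded from one receive buffer points into that buffer and keeps it alive through a shared reference count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical receive buffer shared by all messages decoded from it.
struct shared_buffer_t {
    std::size_t refcnt;          // plain counter; not thread-safe
    unsigned char data[8192];
};

// Hypothetical zero-copy message: a view into the shared buffer.
struct msg_view_t {
    shared_buffer_t *owner;
    const unsigned char *data;
    std::size_t size;
};

// Drops one reference and frees the buffer when it was the last one.
inline void drop_ref(shared_buffer_t *buf)
{
    if (--buf->refcnt == 0)
        std::free(buf);
}

// Releases a message; the buffer is freed with the last message.
inline void release(msg_view_t &m)
{
    drop_ref(m.owner);
    m.owner = NULL;
    m.data = NULL;
    m.size = 0;
}

// Splits `filled` bytes of [1-byte length][body] frames into views
// without copying any body. The decoder holds its own reference while
// it works, so the buffer is freed here if no message was produced.
inline std::vector<msg_view_t> decode(shared_buffer_t *buf,
                                      std::size_t filled)
{
    std::vector<msg_view_t> out;
    buf->refcnt = 1;
    std::size_t pos = 0;
    while (pos < filled) {
        const std::size_t len = buf->data[pos];
        if (pos + 1 + len > filled)
            break;               // incomplete frame: ignored in this sketch
        msg_view_t m = { buf, buf->data + pos + 1, len };
        ++buf->refcnt;
        out.push_back(m);
        pos += 1 + len;
    }
    drop_ref(buf);
    return out;
}

// Number of messages served by one buffer, using the formula quoted
// above: n = buffer_size / (m_size + 10).
inline std::size_t messages_per_buffer(std::size_t buffer_size,
                                       std::size_t m_size)
{
    return buffer_size / (m_size + 10);
}
```

With the numbers from the test run (8192-byte buffer, message size 100) the formula gives 8192 / 110 = 74 messages per buffer allocation.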
>> For the implementation, I had to change two other things. After my
>> first implementation, I realized that msg_t::init_data does a malloc
>> to create the content_t member. Given that msg_t's size is now 64
>> bytes, I removed content_t completely by adding the members of
>> content_t to the lmsg_t union. However, this is a problem with the
>> current code because one of the members is an atomic_counter_t, which
>> is a non-POD type and cannot be a union member. For my
>> proof-of-concept implementation, I switched on C++11 mode because this
>> relaxes the requirements for PODs.
>> I hope this could be useful and maybe included in the main branch. My
>> next step is to change the encoder/stream engine to use writev to skip
>> the memcpy operations when sending messages.
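
As a rough illustration of the writev idea mentioned above, the sketch below sends a flags byte, a length byte and the message body with one gather write, without first copying them into a contiguous send buffer. send_frame_gather is a hypothetical helper, the one-byte length is a simplification of the real wire format, and a production version would have to handle short writes and EAGAIN on non-blocking sockets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper: writes [flags][size][body] with a single
// writev call. Returns the number of bytes written, or -1 on error.
inline ssize_t send_frame_gather(int fd, unsigned char flags,
                                 const void *body,
                                 unsigned char body_size)
{
    struct iovec iov[3];
    iov[0].iov_base = &flags;
    iov[0].iov_len = 1;
    iov[1].iov_base = &body_size;
    iov[1].iov_len = 1;
    // iov_base is a non-const void*; writev does not modify the data.
    iov[2].iov_base = const_cast<void *>(body);
    iov[2].iov_len = body_size;
    return writev(fd, iov, 3);
}
```

The kernel gathers the three pieces itself, so the message body is never copied in user space on the send path.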
>> Best wishes,
>>   Jens Auer
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev