[zeromq-dev] ZMQ performance

A. Mark gougolith at gmail.com
Wed Jan 9 01:55:17 CET 2013

OK, so I went back and I fixed a couple of issues and reattached the two
modified test programs, added RCV/SND buffer shaping and now it uses
zmq_msg_init_data (zero-copy) for better performance. I'm getting about
2.5GB/s avg at best, which is a lot better than with remote_thr/local_thr,
but still about 25% less than what I'm expecting (at least 3.4GB/s).

When I initiate 4 simultaneous processes (not threads) for each client and
server via separate ports, the total does add up to ~3.3GB/s as it should.
The trouble is that for it to work that way I need to bind 4 ports, whereas
the whole point of using accept() traditionally is to have multiple
connections on the same port.

Is there a way to achieve the desired throughput via 0MQ without using
separate ports for each socket? I think using multiple connections (via
separate threads) on the same ZMQ socket should naturally do it, but
according to my results it doesn't.

On Mon, Jan 7, 2013 at 7:16 PM, A. Mark <gougolith at gmail.com> wrote:

> Hello,
> I'm very interested in porting my current transfer engine to 0MQ. The
> current engine is written in pure BSD sockets and has certain limitations
> that would be easily overcome by 0MQ's intelligent and versatile design.
> However my main concern is performance on very long messages, in excess of
> 1MB. The current backbone MT design is the following:
> control node (client) <---> server A                      server B
>                                |-- worker node 1 <---> worker node 1 --|
>                                |-- worker node 2 <---> worker node 2 --|
>                                |-- worker node N <---> worker node N --|
> So the control client controls whatever task needs to be performed by
> submitting requests to a server, the actual work is done by the worker
> nodes in each separate thread on the server. The worker nodes are
> synchronized across the two servers but they work independently since they
> are working on the same task. Each worker node has its own FD but connects
> to the same TCP address and port. The main task of each node is to perform
> some transformation on some large data buffer from a buffer pool then push
> the finished result to the other server. My current benchmarks give me
> 3.5GBytes/s using TCP over the local loopback when simply pushing the buffers
> without doing any work.
> I ran the 0MQ benchmarks local_thr and remote_thr, and the performance is
> only 1.5GB/s at best with large buffers (messages), and lower with small
> ones. I'm also concerned looking at the benchmarks for the 10GE test. My
> current engine can perform at a steady 1.1GBytes/s with large buffers over
> 10GE.
> I've also tried a modified version of the two benchmarks to try to emulate
> the above situation, but the performance is about the same. The modified MT
> code is attached.
> Is there something else I need to do to get the best performance out of
> 0MQ using MT for this work flow engine?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: client_zmq.cpp
Type: text/x-c++src
Size: 4285 bytes
Desc: not available
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20130108/9b9b8eb5/attachment.cpp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: server_zmq.cpp
Type: text/x-c++src
Size: 4955 bytes
Desc: not available
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20130108/9b9b8eb5/attachment-0001.cpp>

More information about the zeromq-dev mailing list