[zeromq-dev] ZMQ performance

Apostolis Xekoukoulotakis xekoukou at gmail.com
Wed Jan 9 04:06:25 CET 2013

Just guessing here. Are you using the same context in all threads? If so,
maybe you need to increase the number of I/O threads that 0MQ uses inside it.

2013/1/9 A. Mark <gougolith at gmail.com>

> OK, so I went back and I fixed a couple of issues and reattached the two
> modified test programs, added RCV/SND buffer shaping and now it uses
> zmq_msg_init_data (zero-copy) for better performance. I'm getting about
> 2.5GB/s avg at best, which is a lot better than with remote_thr/local_thr
> but still about 25% less than the at least 3.4GB/s I'm expecting.
> When I start 4 simultaneous processes (not threads) for each client and
> server via separate ports the total does add up to ~3.3GB/s as it should.
> The trouble is for that to work that way I need to bind 4 ports and the
> whole point in using accept is to have multiple connections on the same
> port traditionally.
> Is there a way to achieve the desired throughput via 0MQ without using
> separate ports for each socket? I think using multiple connections (via
> separate threads) on the same ZMQ socket should naturally do it but
> according to the results it doesn't happen.
> On Mon, Jan 7, 2013 at 7:16 PM, A. Mark <gougolith at gmail.com> wrote:
>> Hello,
>> I'm very interested in porting my current transfer engine to 0MQ. The
>> current engine is written in pure BSD sockets and has certain limitations
>> that would be easily overcome by 0MQ's intelligent and versatile design.
>> However my main concern is performance on very long messages in excess of
>> 1MB. The current backbone MT design is the following:
>> control node (client) <---> server A ---- worker node 1 <---> worker node 1 ---- server B
>>                                      |--- worker node 2 <---> worker node 2 ---|
>>                                      |--- worker node N <---> worker node N ---|
>> So the control client controls whatever task needs to be performed by
>> submitting requests to a server, the actual work is done by the worker
>> nodes in each separate thread on the server. The worker nodes are
>> synchronized across the two servers but they work independently since they
>> are working on the same task. Each worker node has its own FD but connects
>> to the same TCP address and port. The main task of each node is to perform
>> some transformation on some large data buffer from a buffer pool then push
>> the finished result to the other server. My current benchmarks give me
>> 3.5GBytes/s using TCP over the local loop when simply pushing the buffers
>> without doing any work.
>> I ran the 0MQ benchmarks local_thr and remote_thr, and the performance is
>> only 1.5GB/s at best with large buffers (messages), and lower with small
>> ones. I'm also concerned looking at the benchmarks for the 10GE test. My
>> current engine can perform at a steady 1.1GBytes/s with large buffers over
>> 10GE.
>> I've also tried a modified version of the two benchmarks to try to
>> emulate the above situation, but the performance is about the same. The
>> modified MT code is attached.
>> Is there something else I need to do to get the best performance out of
>> 0MQ using MT for this work flow engine?


Sincerely yours,

     Apostolis Xekoukoulotakis