[zeromq-dev] ZMQ performance
Daisuke Maki
lestrrat at gmail.com
Wed Jan 9 06:32:25 CET 2013
nitpick, but isn't zmq_init() the one that's deprecated, and zmq_ctx_new()
its replacement?
2013/1/9 A. Mark <gougolith at gmail.com>
> Good guess, but I'm using this one:
>
> ctx = zmq_init( threads);
>
> from http://api.zeromq.org/3-2:zmq-init
>
>
> with the number of threads passed as a command line argument to
> the test programs. I assume it should have the same effect as zmq_ctx_set()
> since zmq_ctx_new is deprecated. So I've tried different numbers of threads
> on each end, but performance doesn't seem to improve with more
> threads. BTW, to be precise, I have 2 command line arguments to client_zmq:
>
>
> usage: client_zmq <connect-to> <message-size> <message-count>
> <zmq-threads> <workers>
>
>
> So I can set the internal zmq threads as well as how many worker threads
> to spawn in the client.
>
> In server_zmq I can only set the zmq-threads of course.
>
> usage: server_zmq <connect-to> <message-size> <message-count> <zmq-threads>
>
>
> And yes I'm using the same context in the programs.
>
>
>
> On Tue, Jan 8, 2013 at 7:06 PM, Apostolis Xekoukoulotakis <
> xekoukou at gmail.com> wrote:
>
>> Just guessing here. Are you using the same context in all threads and if
>> so, maybe you need to increase the threads that 0MQ uses inside it.
>> http://api.zeromq.org/3-2:zmq-ctx-set
>>
>>
>>
>> 2013/1/9 A. Mark <gougolith at gmail.com>
>>
>>> OK, so I went back and I fixed a couple of issues and reattached the
>>> two modified test programs, added RCV/SND buffer shaping and now it uses
>>> zmq_msg_init_data (zero-copy) for better performance. I'm getting about
>>> 2.5GB/s avg at best, which is a lot better than with remote_thr/local_thr
>>> but still about 25% less than the 3.4GB/s or more that I'm expecting.
>>>
>>> When I initiate 4 simultaneous processes (not threads) for each client
>>> and server via separate ports the total does add up to ~3.3GB/s as it
>>> should. The trouble is that for this to work I need to bind 4 ports, and
>>> traditionally the whole point of using accept is to have multiple
>>> connections on the same port.
>>>
>>> Is there a way to achieve the desired throughput via 0MQ without using
>>> separate ports for each socket? I think using multiple connections (via
>>> separate threads) on the same ZMQ socket should naturally do it but
>>> according to the results it doesn't happen.
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Jan 7, 2013 at 7:16 PM, A. Mark <gougolith at gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm very interested in porting my current transfer engine to 0MQ. The
>>>> current engine is written in pure BSD sockets and has certain limitations
>>>> that would be easily overcome by 0MQ's intelligent and versatile design.
>>>> However, my main concern is performance on very long messages in excess of
>>>> 1MB. The current backbone MT design is the following:
>>>>
>>>>
>>>> control node (client) <---> server A --- worker node 1 <---> worker node 1 --- server B
>>>>                                |                                                  |
>>>>                                |-------- worker node 2 <---> worker node 2 -------|
>>>>                                |                                                  |
>>>>                                +-------- worker node N <---> worker node N -------+
>>>>
>>>> So the control client controls whatever task needs to be performed by
>>>> submitting requests to a server, the actual work is done by the worker
>>>> nodes in each separate thread on the server. The worker nodes are
>>>> synchronized across the two servers but they work independently since they
>>>> are working on the same task. Each worker node has its own FD but connects
>>>> to the same TCP address and port. The main task of each node is to perform
>>>> some transformation on some large data buffer from a buffer pool then push
>>>> the finished result to the other server. My current benchmarks give me
>>>> 3.5GBytes/s using TCP over the local loop when simply pushing the buffers
>>>> without doing any work.
>>>>
>>>> I ran the 0MQ benchmarks local_thr and remote_thr, and the performance
>>>> is only 1.5GB/s at best with large buffers (messages), and lower with small
>>>> ones. I'm also concerned looking at the benchmarks for the 10GE test. My
>>>> current engine can perform at a steady 1.1GBytes/s with large buffers over
>>>> 10GE.
>>>>
>>>> I've also tried a modified version of the two benchmarks to try to
>>>> emulate the above situation, but the performance is about the same. The
>>>> modified MT code is attached.
>>>>
>>>> Is there something else I need to do to get the best performance out of
>>>> 0MQ using MT for this work flow engine?
>>>>
>>>>
>>>
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev at lists.zeromq.org
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>
>>>
>>
>>
>> --
>>
>>
>> Sincerely yours,
>>
>> Apostolis Xekoukoulotakis
>>