[zeromq-dev] Weird behavior with zmq_proxy performance

Bruno D. Rodrigues bruno.rodrigues at litux.org
Tue Nov 19 13:13:37 CET 2013


As requested, I’ve created a ticket and updated the branch with the latest code and a perf/README.txt explaining how to run it (essentially the instructions below):

https://github.com/zeromq/libzmq/issues/757


On Nov 10, 2013, at 13:08, Bruno D. Rodrigues <bruno.rodrigues at litux.org> wrote:

> I’ve branched the code to add the proxy code for testing:
> https://github.com/davipt/libzmq/tree/fix-002-proxy_lat_thr
> 
> This now allows me:
> 
> 1. current PUSH/PULL end-to-end test:
> 
> idavi:perf bruno$ ./local_thr tcp://127.0.0.1:5555 500 10000000 &
> local_thr bind-to=tcp://127.0.0.1:5555 message-size=500 message-count=10000000 type=0 check=0 connect=0
> 
> idavi:perf bruno$ ./remote_thr tcp://127.0.0.1:5555 500 10000000 &
> remote_thr connect-to=tcp://127.0.0.1:5555 message-size=500 message-count=10000000 type=0 check=0
> 
> message size: 500 [B]
> message count: 10000000
> mean throughput: 1380100 [msg/s]
> mean throughput: 5520.400 [Mb/s]
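The two throughput lines are related by simple arithmetic: megabits per second is the message rate times the message size in bytes times 8, divided by 10^6. A minimal sketch in C (the function name `throughput_mbps` is made up for illustration and is not taken from the perf tools):

```c
#include <assert.h>

/* Convert a message rate into megabits per second.
 * The two arguments correspond to the "mean throughput [msg/s]" and
 * "message size [B]" values printed by local_thr; the function name
 * is illustrative only. */
double throughput_mbps(unsigned long msgs_per_sec, unsigned long msg_size_bytes)
{
    assert(msg_size_bytes > 0);
    return (double)msgs_per_sec * (double)msg_size_bytes * 8.0 / 1000000.0;
}
```

With the PUSH/PULL figures above, `throughput_mbps(1380100, 500)` gives 5520.4, which matches the reported Mb/s value.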
> 
> 2. PUB/SUB end-to-end test:
> 
> idavi:perf bruno$ ./local_thr tcp://127.0.0.1:5555 500 10000000 1 &
> local_thr bind-to=tcp://127.0.0.1:5555 message-size=500 message-count=10000000 type=1 check=0 connect=0
> 
> idavi:perf bruno$ ./remote_thr tcp://127.0.0.1:5555 500 10000000 1 &
> remote_thr connect-to=tcp://127.0.0.1:5555 message-size=500 message-count=10000000 type=1 check=0
> 
> message size: 500 [B]
> message count: 10000000
> mean throughput: 971666 [msg/s]
> mean throughput: 3886.664 [Mb/s]
> 
> 3. same test via zmq_proxy, by switching local_thr from bind to connect:
> 
> idavi:perf bruno$ ./proxy tcp://*:8881 tcp://*:8882 &
> Proxy type=PULL|PUSH in=tcp://*:8881 out=tcp://*:8882
> 
> idavi:perf bruno$ ./local_thr tcp://127.0.0.1:8882 500 10000000 32 &
> local_thr bind-to=tcp://127.0.0.1:8882 message-size=500 message-count=100000 type=32 check=0 connect=32
> 
> idavi:perf bruno$ ./remote_thr tcp://127.0.0.1:8881 500 10000000 &
> remote_thr connect-to=tcp://127.0.0.1:8881 message-size=500 message-count=10000000 type=0 check=0
> 
> message size: 500 [B]
> message count: 10000000
> mean throughput: 92974 [msg/s]
> mean throughput: 371.896 [Mb/s]
> 
> 4. same test via proxy and PUB/SUB, including checking if every message arrives (*)
> 
> idavi:perf bruno$ ./proxy tcp://*:8881 tcp://*:8882 1 &
> Proxy type=XSUB|XPUB in=tcp://*:8881 out=tcp://*:8882
> 
> idavi:perf bruno$ ./local_thr tcp://127.0.0.1:8882 500 10000000 49 &
> local_thr bind-to=tcp://127.0.0.1:8882 message-size=500 message-count=10000000 type=49 check=16 connect=32
> 
> idavi:perf bruno$ ./remote_thr tcp://127.0.0.1:8881 500 10000000 17 &
> remote_thr connect-to=tcp://127.0.0.1:8881 message-size=500 message-count=10000000 type=17 check=16
> 
> message size: 500 [B]
> message count: 10000000
> mean throughput: 88721 [msg/s]
> mean throughput: 354.884 [Mb/s]
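Judging only from the parameters echoed by the tools in these runs (type=49 is reported with check=16 connect=32, type=17 with check=16, and type=1 selects PUB/SUB), the last command-line argument looks like a bitmask. A sketch of how such a value could be decoded, assuming bit value 1 selects PUB/SUB, 16 enables the counter check and 32 makes local_thr connect instead of bind. The constant, struct and function names are invented here and are not taken from the branch:

```c
#include <stdio.h>

/* Flag values inferred from the logged output; names are hypothetical. */
enum {
    FLAG_PUBSUB  = 1,   /* 0 = PUSH/PULL, 1 = PUB/SUB */
    FLAG_CHECK   = 16,  /* embed and verify a message counter */
    FLAG_CONNECT = 32   /* local_thr connects instead of binding */
};

struct perf_mode {
    int pubsub;   /* 0 or 1 */
    int check;    /* 0 or 16 */
    int connect;  /* 0 or 32 */
};

/* Split the combined type argument into its individual flags. */
struct perf_mode decode_type(int type)
{
    struct perf_mode m;
    m.pubsub  = type & FLAG_PUBSUB;
    m.check   = type & FLAG_CHECK;
    m.connect = type & FLAG_CONNECT;
    return m;
}

/* Print the decoded flags in the same style as the tool output. */
void print_mode(int type)
{
    struct perf_mode m = decode_type(type);
    printf("type=%d check=%d connect=%d\n", type, m.check, m.connect);
}
```

Under these assumptions, `print_mode(49)` prints `type=49 check=16 connect=32`, the same values local_thr echoed in test 4.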
> 
> (*) If check is enabled on remote_thr and the message size is greater than 16, the message will contain a counter. local_thr then verifies that the counters arrive in the expected order and that no message is lost. This is why remote_thr needs to increase the HWM and sleep for one second in the PUB/SUB case.
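The check described in the footnote can be sketched as follows. This assumes the counter is a 64-bit value copied into the first 8 bytes of the payload; the actual layout used in the branch is not shown in this thread, and the helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIN_CHECKED_SIZE 16  /* messages of this size or smaller carry no counter */

/* Sender side: stamp the sequence number into the payload.
 * Returns 1 if the counter was written, 0 if the message is too small. */
int stamp_counter(unsigned char *buf, size_t size, uint64_t counter)
{
    if (size <= MIN_CHECKED_SIZE)
        return 0;
    memcpy(buf, &counter, sizeof counter);
    return 1;
}

/* Receiver side: compare the embedded counter with the expected one.
 * Returns 0 on a match (and advances *expected), or -1 if a message
 * was lost or arrived out of order. Small messages are not verified. */
int verify_counter(const unsigned char *buf, size_t size, uint64_t *expected)
{
    uint64_t seen;
    if (size <= MIN_CHECKED_SIZE)
        return 0;
    memcpy(&seen, buf, sizeof seen);
    if (seen != *expected)
        return -1;
    (*expected)++;
    return 0;
}
```

The receiver only needs to keep one running `expected` value, so a single dropped PUB/SUB message shows up as a mismatch on the very next message.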
> 
> 
> So, then again, what is happening with the zmq_proxy?
>  
> 
> 
> 
> On Nov 7, 2013, at 22:15, Bruno D. Rodrigues <bruno.rodrigues at litux.org> wrote:
> 
>> I’ve been testing a lot of combinations of ZeroMQ over Java, comparing pure jeromq with the jzmq JNI binding to the libzmq C code. My impression so far is that jeromq is much faster than the binding. Not that the binding code isn’t great, but my feeling is that the JNI jump slows everything down. At a certain point I needed a simple zmq_proxy network node, and I was pretty sure that the C code must be faster than jeromq. I have some ideas that could improve the jeromq proxy code, but it felt easier to just compile the zmq_proxy code from the book.
>> 
>> Unfortunately something went completely wrong on my side so I need your help to understand what is happening here.
>> 
>> Context:
>> Mac OS X Mavericks fully updated, MBPro i7 4x2 CPU 2.2 GHz, 16 GB
>> libzmq from git head
>> (same for jeromq and libzmq, albeit I’m using my own fork so I can send pulls back)
>> My data are JSON lines ranging from about 100 bytes to some multi-MB exceptions, but the average across those million messages is about 500 bytes.
>> 
>> Test 1: pure local_thr and remote_thr:
>> 
>> iDavi:perf bruno$ ./local_thr tcp://127.0.0.1:8881 500 1000000 &
>> iDavi:perf bruno$ time ./remote_thr tcp://127.0.0.1:8881 500 1000000 &
>> real	0m0.732s
>> user	0m0.516s
>> sys	0m0.394s
>> message size: 500 [B]
>> message count: 1000000
>> mean throughput: 1418029 [msg/s]
>> mean throughput: 5672.116 [Mb/s]
>> 
>> Test 2: change local_thr to perform connect instead of bind, and put a proxy in the middle.
>> The proxy is the first C code example from the book, available here https://gist.github.com/davipt/7361477
>> iDavi:c bruno$ gcc -o proxy proxy.c -I /usr/local/include/ -L /usr/local/lib/ -lzmq
>> iDavi:c bruno$ ./proxy tcp://*:8881 tcp://*:8882 1
>> Proxy type=PULL/PUSH in=tcp://*:8881 out=tcp://*:8882
>> 
>> iDavi:perf bruno$ ./local_thr tcp://127.0.0.1:8882 500 1000000 &
>> iDavi:perf bruno$ time ./remote_thr tcp://127.0.0.1:8881 500 1000000 &
>> iDavi:perf bruno$ message size: 500 [B]
>> message count: 1000000
>> mean throughput: 74764 [msg/s]
>> mean throughput: 299.056 [Mb/s]
>> 
>> real	0m10.358s
>> user	0m0.668s
>> sys	0m0.508s
>> 
>> 
>> Test 3: use the jeromq equivalent of the proxy: https://gist.github.com/davipt/7361623
>> 
>> iDavi:perf bruno$ ./local_thr tcp://127.0.0.1:8882 500 1000000 &
>> [1] 15816
>> iDavi:perf bruno$ time ./remote_thr tcp://127.0.0.1:8881 500 1000000 &
>> [2] 15830
>> iDavi:perf bruno$ 
>> real	0m3.429s
>> user	0m0.654s
>> sys	0m0.509s
>> message size: 500 [B]
>> message count: 1000000
>> mean throughput: 293532 [msg/s]
>> mean throughput: 1174.128 [Mb/s]
>> 
>> This performance coming out of Java is OK-ish; it’s here just for comparison, and I’ll spend some time looking at it.
>> 
>> The core question is the C proxy - why 10 times slower than the no-proxy version?
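For reference, the size of the slowdown can be computed directly from the figures reported in Test 1 and Test 2. A small sketch (the function name is illustrative only):

```c
/* Ratio between a baseline measurement and a degraded one.
 * Works for throughput (baseline = faster rate) and, with the
 * arguments swapped, for elapsed time. */
double slowdown_factor(double baseline, double degraded)
{
    return baseline / degraded;
}
```

On throughput, `slowdown_factor(1418029, 74764)` comes out at roughly 19, and the wall-clock times (10.358 s against 0.732 s) give roughly 14, so the measured gap is somewhat larger than 10 times.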
>> 
>> One thing I noticed, by coincidence, is that on the upper side of the proxy, both with the C “producer” and with the Java one, tcpdump consistently shows packets of 16332 bytes (or the MTU size if using Ethernet, 1438 I think). This value is the same for all 4 combinations of producers and proxies (jeromq vs C).
>> 
>> But on the other side of the proxy the result is completely different. With the jeromq proxy I see packets of 8192 bytes, but with the C code I see packets of either 509 or 1010 bytes. It looks as if the proxy is sending the messages one by one. Again, these values are the same whether the PULL consumer after the proxy is C or Java.
>> 
>> So this is something on the proxy “backend” socket side of the zmq_proxy.
>> 
>> Also, I see quite similar behavior with a PUB - [XSUB+Proxy+XPUB] - SUB version.
>> 
>> What do I need to tweak in proxy.c?
>> 
>> Thanks in advance
>> 
> 
