[zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)

Julie Anderson julie.anderson.uk at gmail.com
Wed Aug 29 21:31:17 CEST 2012


Here are the UDP numbers, for anyone interested. As one would expect, they
are much better than TCP.

RTT: (round-trip time)

Iterations: 1,000,000 | Avg Time: *10373.9 nanos* | Min Time: 8626 nanos |
Max Time: 136269 nanos | 75%: 10186 nanos | 90%: 10253 nanos | 99%: 10327
nanos | 99.999%: 10372 nanos

OWT: (one-way time)

Iterations: 2,221,118 | Avg Time: *5095.66 nanos* | Min Time: 4220 nanos |
Max Time: 135584 nanos | 75%: 5001 nanos | 90%: 5037 nanos | 99%: 5071
nanos | 99.999%: 5094 nanos
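
For anyone who wants to produce a summary line in the same shape from their own raw samples, here is a minimal sketch. The post does not say how its percentile columns are defined, so the nearest-rank definition used here is an assumption, and the class and method names (LatencyStats, percentile, summarize) are made up for illustration.

```java
import java.util.Arrays;
import java.util.Locale;

public class LatencyStats {

    // Nearest-rank percentile over an ascending-sorted, non-empty array.
    // (Assumed definition; the original post does not state which one it uses.)
    static long percentile(long[] sorted, double pct) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Formats one summary line from raw latency samples given in nanoseconds.
    static String summarize(long[] samplesNanos) {
        long[] sorted = samplesNanos.clone();
        Arrays.sort(sorted);
        double avg = Arrays.stream(sorted).average().orElse(Double.NaN);
        return String.format(Locale.US,
                "Iterations: %,d | Avg Time: %.2f nanos | Min Time: %d nanos | "
                        + "Max Time: %d nanos | 75%%: %d nanos | 90%%: %d nanos | 99%%: %d nanos",
                sorted.length, avg,
                percentile(sorted, 0), percentile(sorted, 100),
                percentile(sorted, 75), percentile(sorted, 90), percentile(sorted, 99));
    }

    public static void main(String[] args) {
        // Hypothetical samples, not measurements from this thread.
        long[] samples = {5000, 5100, 4900, 5200, 135000, 5050, 4950, 5010};
        System.out.println(summarize(samples));
    }
}
```

A single large outlier, like the max values reported above, pulls the average up while leaving the lower percentiles almost untouched, which is why it helps to report both.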

-Julie


On Wed, Aug 29, 2012 at 12:18 PM, Julie Anderson <
julie.anderson.uk at gmail.com> wrote:

> New numbers (fun!). Firstly, to make sure I was comparing apples with
> apples, I modified my tests to compute one-way trip instead of round-trip.
> I can't paste code, but I am simply using Java NIO (non-blocking I/O)
> optimized with busy spinning to send and receive TCP data. This is
> *standard* Java NIO code, nothing too fancy. You can google around for Java
> NIO. I found this link
> <http://www.cordinc.com/blog/2010/08/java-nio-server-example.html> that
> shows the basics. You can also do the same thing in C, as you can see here:
> <http://stackoverflow.com/questions/27247/could-you-recommend-some-guides-about-epoll-on-linux/6150841#6150841>
>
> My test now consists of:
>
> - JVM A sends a message which consists of the ASCII representation of a
> timestamp in nanos.
> - JVM B receives this message, parses the long, computes the one-way
> latency and echoes back the message to JVM A.
> - JVM A receives the echo, parses the ASCII long and makes sure that it
> matches the one it sent out.
> - Loop back and send the next message.
>
> So now I have both times: one-way and round-trip.
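
The loop described in the steps above can be sketched roughly as follows. This is not the code used for the measurements in this thread (that code was not posted); it is a minimal illustration under several assumptions: both roles run as threads in one JVM so that System.nanoTime() values are comparable, the message is a fixed-width 20-byte ASCII frame (MSG_LEN is a made-up constant), and the code busy-spins directly on non-blocking read() and write() instead of going through a Selector.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

public class EchoLatencySketch {
    static final int MSG_LEN = 20; // hypothetical fixed-width frame, zero-padded ASCII long

    // Busy-spin on a non-blocking channel until the buffer is full, then flip it for reading.
    static void readFully(SocketChannel ch, ByteBuffer buf) throws IOException {
        buf.clear();
        while (buf.hasRemaining()) {
            if (ch.read(buf) < 0) throw new EOFException("peer closed");
        }
        buf.flip();
    }

    // Busy-spin until every remaining byte has been written.
    static void writeFully(SocketChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) ch.write(buf);
    }

    // Parse the ASCII long without disturbing the buffer's own position.
    static long parse(ByteBuffer buf) {
        return Long.parseLong(StandardCharsets.US_ASCII.decode(buf.duplicate()).toString());
    }

    // Returns { oneWayNanos, roundTripNanos }, one entry per iteration.
    static long[][] run(int iterations) {
        long[] oneWay = new long[iterations];
        long[] roundTrip = new long[iterations];
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // any free loopback port
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

            Thread b = new Thread(() -> { // plays the role of "JVM B"
                try (SocketChannel ch = server.accept()) {
                    ch.configureBlocking(false);
                    ch.setOption(StandardSocketOptions.TCP_NODELAY, true);
                    ByteBuffer buf = ByteBuffer.allocateDirect(MSG_LEN);
                    for (int i = 0; i < iterations; i++) {
                        readFully(ch, buf);
                        oneWay[i] = System.nanoTime() - parse(buf);
                        writeFully(ch, buf); // echo the same bytes back
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            b.start();

            try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("127.0.0.1", port))) {
                ch.configureBlocking(false);
                ch.setOption(StandardSocketOptions.TCP_NODELAY, true);
                ByteBuffer out = ByteBuffer.allocateDirect(MSG_LEN);
                ByteBuffer in = ByteBuffer.allocateDirect(MSG_LEN);
                for (int i = 0; i < iterations; i++) { // plays the role of "JVM A"
                    long sent = System.nanoTime();
                    out.clear();
                    out.put(String.format(Locale.ROOT, "%0" + MSG_LEN + "d", sent)
                            .getBytes(StandardCharsets.US_ASCII));
                    out.flip();
                    writeFully(ch, out);
                    readFully(ch, in);
                    roundTrip[i] = System.nanoTime() - sent;
                    if (parse(in) != sent) throw new IllegalStateException("echo mismatch");
                }
            }
            b.join();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new long[][] {oneWay, roundTrip};
    }

    public static void main(String[] args) {
        long[][] r = run(100_000);
        System.out.printf(Locale.US, "one-way avg: %.1f nanos, round-trip avg: %.1f nanos%n",
                Arrays.stream(r[0]).average().orElse(Double.NaN),
                Arrays.stream(r[1]).average().orElse(Double.NaN));
    }
}
```

With two separate JVMs, as in the original test, the one-way figure relies on both processes reading the same clock on the same machine, and the absolute numbers from a sketch like this will differ from the ones reported here.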
>
> I ran my test for 1 million messages over loopback.
>
> For ZeroMQ I am using the local_lat and remote_lat programs included with
> latest zeromq from here: git://github.com/zeromq/libzmq.git
>
> The results:
>
> *- ZeroMQ:*
>
> ./local_lat tcp://lo:5555 13 1000000
> ./remote_lat tcp://127.0.0.1:5555 13 1000000
>
> message size: 13 [B]
> roundtrip count: 1000000
> average latency: *19.674* [us] *<==== this is one-way*
>
> *- Java NIO:* (EPoll with busy spinning)
>
> Round-trip: Iterations: 1,000,000 | Avg Time: *16552.15 nanos* | Min
> Time: 12515 nanos | Max Time: 129816 nanos | 75%: 16290 nanos | 90%: 16369
> nanos | 99%: 16489 nanos | 99.999%: *16551 nanos*
>
> One-way trip: Iterations: 1,110,000 | Avg Time: *8100.12 nanos* | Min
> Time: 6150 nanos | Max Time: 118035 nanos | 75%: 7966 nanos | 90%: 8010
> nanos | 99%: 8060 nanos | 99.999%: *8099 nanos*
>
> *Conclusions:* That's *19.674 versus 8.100* so ZeroMQ overhead on top of
> TCP is *142%* or *11.574 microseconds* !!! That's excessive. I would
> expect 1 microsecond overhead there.
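
The arithmetic behind that comparison, spelled out as a small sketch. Both inputs are one-way averages in microseconds taken from the numbers above (19.674 for ZeroMQ, 8.100 for Java NIO); the class and method names are made up for illustration.

```java
import java.util.Locale;

public class OverheadCalc {

    // Absolute overhead, in the same unit as the inputs.
    static double overhead(double measured, double baseline) {
        return measured - baseline;
    }

    // Relative overhead as a percentage of the baseline.
    static double overheadPercent(double measured, double baseline) {
        return (measured - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double zeromqMicros = 19.674; // one-way average reported by local_lat/remote_lat
        double nioMicros = 8.100;     // one-way average of the Java NIO test
        System.out.printf(Locale.US, "overhead: %.3f us (%.1f%%)%n",
                overhead(zeromqMicros, nioMicros),
                overheadPercent(zeromqMicros, nioMicros));
    }
}
```

The difference works out to 11.574 microseconds, which is a little under 143% of the 8.100 microsecond baseline.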
>
> So questions remain:
>
>
> 1) What does ZeroMQ do under the hood that justifies so many extra clock
> cycles? (I am really curious to know)
>
> 2) Do people agree that 11 microseconds are just too much?
>
> My rough guess: ZeroMQ uses threads? (the beauty of NIO is that it is
> single-threaded, so there is always only one thread reading and writing to
> the network)
>
> -Julie
>
> On Wed, Aug 29, 2012 at 10:24 AM, Chuck Remes <lists at chuckremes.com> wrote:
>
>>
>> On Aug 29, 2012, at 10:13 AM, Julie Anderson wrote:
>>
>> Just tested ZeroMQ and Java NIO in the same machine.
>>
>> The results:
>> *- ZeroMQ:*
>>
>> message size: 13 [B]
>> roundtrip count: 100000
>> average latency: *19.620* [us] *<====== ONE-WAY LATENCY*
>>
>> *- Java NIO Selector:* (EPoll)
>>
>> Average RTT (round-trip time) latency of a 13-byte message: 15.342 [us]
>> Min Time: 11.664 [us]
>> 99.999% percentile: *15.340* [us] *<====== RTT LATENCY*
>>
>> *Conclusion:* That's *39.240 versus 15.340* so ZeroMQ overhead on top of
>> TCP is *156%* or *23.900 microseconds* !!! That's excessive. I would
>> expect 1 or 2 microseconds there.
>>
>> So my questions are:
>>
>> 1) What does ZeroMQ do under the hood that justifies so many extra clock
>> cycles? (I am really curious to know)
>>
>> 2) Do people agree that 23 microseconds are just too much?
>>
>>
>> As a favor to me, please rerun the tests so that at least 1 million (10
>> million is better) messages are sent. This shouldn't take more than a few
>> minutes to run. Thanks.
>>
>> Secondly, are you using the local_lat and remote_lat programs that are
>> included with zeromq or did you write your own? If you wrote your own,
>> please share the code.
>>
>> Thirdly, a pastie containing the code for both tests so others could
>> independently reproduce your results would be very handy.
>>
>> cr
>>
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>