[zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)

Robert G. Jakabosky bobby at sharedrealm.com
Thu Aug 30 04:37:30 CEST 2012

On Wednesday 29, Stuart Brandt wrote:
> Not sure I want to step into the middle of this, but here we go. I'd be
> really hesitant to base any evaluation of ZMQ's suitability for a highly
> scalable low latency application on local_lat/remote_lat. They appear to
> be single threaded synchronous tests which seems very unlike the kinds
> of applications being discussed (esp. if you're using NIO). More
> realistic is a network connection getting slammed with lots of
> concurrent sends and recvs....which is where lots of mistakes can be
> made if you roll your own.

local_lat and remote_lat each have two threads (one for the application and one 
for IO).  So each request message goes through these steps:
1. local_lat to IO thread
2. IO thread send to tcp socket
----- network stack.
3. recv from tcp socket in remote_lat's IO thread
4. from IO thread to remote_lat
5. remote_lat back to IO thread
6. IO thread send to tcp socket
----- network stack.
7. recv from tcp socket in local_lat's IO thread
8. IO thread to local_lat.

So each message has to pass between threads 4 times (1,4,5,8) and go across 
the tcp socket 2 times (2->3, 6->7).
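
To make that count concrete, here is a small sketch in Python that tallies the
steps listed above.  The names (STEPS, estimate_round_trip) and the additive
cost model are my own assumptions for illustration, not anything taken from
the ZeroMQ sources; a real round trip will not decompose this cleanly because
of scheduling and batching effects.

```python
# The eight steps of one request/reply round trip, as listed above.
# "thread" = hand-off between an application thread and an IO thread,
# "send"/"recv" = the two ends of one trip across the TCP socket.
STEPS = [
    (1, "thread", "local_lat -> IO thread"),
    (2, "send", "IO thread -> TCP socket"),
    (3, "recv", "TCP socket -> remote_lat's IO thread"),
    (4, "thread", "IO thread -> remote_lat"),
    (5, "thread", "remote_lat -> IO thread"),
    (6, "send", "IO thread -> TCP socket"),
    (7, "recv", "TCP socket -> local_lat's IO thread"),
    (8, "thread", "IO thread -> local_lat"),
]

# Step numbers where the message changes threads.
THREAD_HOPS = [n for n, kind, _ in STEPS if kind == "thread"]
# (send step, recv step) pairs, one per trip across the TCP socket.
TCP_TRIPS = [(n, n + 1) for n, kind, _ in STEPS if kind == "send"]


def estimate_round_trip(thread_hop_cost, tcp_trip_cost):
    """Hypothetical additive model: hand-off costs plus TCP trip costs."""
    return len(THREAD_HOPS) * thread_hop_cost + len(TCP_TRIPS) * tcp_trip_cost


if __name__ == "__main__":
    print(THREAD_HOPS)  # [1, 4, 5, 8]
    print(TCP_TRIPS)    # [(2, 3), (6, 7)]
```

Under this simplified model, a raw-socket echo test only pays for the two TCP
trips, while local_lat/remote_lat also pay for the four thread hand-offs.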

I think it would be interesting to see how latency is affected when there are 
many clients sending requests to a server (with one or more worker threads).  
With ZeroMQ it is very easy to create a server with one or many worker threads 
and handle many thousands of clients.  Doing the same without ZeroMQ is 
possible, but requires writing a lot more code.  On the other hand, writing it 
yourself lets you optimize it for your needs (latency vs throughput).
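
To give a feel for the hand-rolled side of that trade-off, here is a minimal
sketch using only Python's standard library (no ZeroMQ).  One IO loop accepts
clients and reads requests, and a pool of worker threads produces the replies.
All names (handle, worker, serve) are hypothetical, and the sketch leaves out
things a real server needs: message framing, per-client reply ordering,
backpressure and error handling.

```python
import queue
import selectors
import socket
import threading


def handle(request):
    """Hypothetical application logic: echo the request back unchanged."""
    return request


def worker(jobs):
    """Pull (connection, request) pairs off the queue and send the reply."""
    while True:
        job = jobs.get()
        if job is None:  # shutdown sentinel
            return
        conn, request = job
        try:
            conn.sendall(handle(request))
        except OSError:
            pass  # client went away; nothing to reply to


def serve(listener, stop, num_workers=4):
    """IO loop: accept clients, read requests, hand them to worker threads."""
    jobs = queue.Queue()
    workers = [threading.Thread(target=worker, args=(jobs,))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ)
    while not stop.is_set():
        for key, _ in sel.select(timeout=0.1):
            if key.fileobj is listener:
                conn, _ = listener.accept()
                sel.register(conn, selectors.EVENT_READ)
            else:
                conn = key.fileobj
                data = conn.recv(4096)  # treats each read as one request
                if data:
                    jobs.put((conn, data))
                else:  # client closed its side
                    sel.unregister(conn)
                    conn.close()
    for _ in workers:
        jobs.put(None)
    for w in workers:
        w.join()
    for key in list(sel.get_map().values()):
        if key.fileobj is not listener:
            key.fileobj.close()
    sel.close()
```

Even this toy version has to make choices that affect the latency/throughput
balance, such as the number of workers and how requests are queued, which is
exactly the kind of tuning you get to control when you write it yourself.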

> As a purely academic discussion, though, I've uploaded raw C socket
> versions of a client and server that can be used to mimic local_lat and
> remote_lat -- at least for TCP sockets. On my MacBook, I get ~18
> microseconds per 40 byte packet across a test of 1000000 packets on
> local loopback. This is indeed about half of what I get with
> local_lat/remote_lat on tcp://
>    http://pastebin.com/4SSKbAgx   (echoloopcli.c)
>    http://pastebin.com/rkc6itTg  (echoloopsrv.c)
> There's probably some amount of slop/unfairness in there since I cut a
> lot of corners, so if folks want to pursue the comparison further, I'm
> more than willing to bring it closer to apples-to-apples.

echoloop*.c is testing throughput, not latency, since it sends all messages at 
once instead of sending one message and waiting for it to return before 
sending the next message.  Try comparing it with local_thr/remote_thr instead.
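
The difference between the two measurement styles can be sketched with a plain
TCP echo over loopback.  This is a Python illustration of the general idea,
not a port of echoloop*.c or of the ZeroMQ perf tools; the function names and
the "pingpong"/"streaming" mode labels are made up for this example, and
interpreter overhead means the absolute numbers are not comparable with C.

```python
import socket
import threading
import time

MSG_SIZE = 40  # bytes per message, matching the 40 byte packets quoted above


def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(min(65536, n - len(buf)))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_server(listener, total_bytes):
    """Accept one connection and echo back total_bytes bytes."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        remaining = total_bytes
        while remaining:
            data = conn.recv(min(65536, remaining))
            if not data:
                break
            conn.sendall(data)
            remaining -= len(data)


def run(mode, count):
    """Return elapsed seconds per message for `count` echoed messages."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    server = threading.Thread(target=echo_server,
                              args=(listener, MSG_SIZE * count))
    server.start()
    client = socket.create_connection(listener.getsockname())
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    msg = b"x" * MSG_SIZE
    start = time.perf_counter()
    if mode == "pingpong":
        # Latency style: wait for each echo before sending the next message.
        for _ in range(count):
            client.sendall(msg)
            recv_exact(client, MSG_SIZE)
    else:
        # Throughput style: push everything out while echoes stream back.
        sender = threading.Thread(target=client.sendall, args=(msg * count,))
        sender.start()
        recv_exact(client, MSG_SIZE * count)
        sender.join()
    elapsed = time.perf_counter() - start
    client.close()
    server.join()
    listener.close()
    return elapsed / count


if __name__ == "__main__":
    for mode in ("pingpong", "streaming"):
        print(mode, run(mode, 10000) * 1e6, "microseconds per message")
```

In "pingpong" mode the per-message figure is a round-trip latency.  In
"streaming" mode many messages are in flight at once, so the same division
gives the inverse of throughput and should not be compared with a latency
number.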

Robert G. Jakabosky
