[zeromq-dev] "Threadless" version

Erik Rigtorp erik at rigtorp.com
Tue Jan 12 22:01:18 CET 2010


I have some more comments regarding zeromq2. The code seems to be
highly optimized for message throughput. As I understand it, the
application thread basically puts messages in the I/O thread's queue
and vice versa. This is excellent for throughput, as the I/O thread
can keep preparing messages for delivery to the application thread
while the application thread is working, and the I/O thread can wait
for slow TCP receivers without blocking the application.
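The handoff pattern described above can be sketched roughly as
follows. This is only the shape of the pattern, not zeromq2's actual
implementation (which uses lock-free pipes rather than a locked
queue); the names are made up for illustration:

```python
import queue
import threading

# Hypothetical sketch: the application thread only enqueues; a
# background "io thread" drains the queue and performs the (possibly
# slow) send, so a slow receiver never blocks the application thread.
outbound = queue.Queue()
sent = []

def io_thread():
    while True:
        msg = outbound.get()
        if msg is None:          # shutdown sentinel
            break
        sent.append(msg)         # stand-in for a slow TCP send

t = threading.Thread(target=io_thread)
t.start()

for i in range(3):
    outbound.put("msg-%d" % i)   # app thread returns immediately
outbound.put(None)
t.join()

print(sent)                      # ['msg-0', 'msg-1', 'msg-2']
```

The cost of this decoupling is the signalling between the two
threads, which is exactly the latency overhead discussed below.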

The problem is that thread context switching is expensive. I did some
testing on Solaris running the latency benchmark from the zeromq2 git
repo and got latencies around 35-40µs between two processes on the
same machine. The loopback TCP latency on this machine is 10µs, so
zeromq adds significant overhead here. It's not hard to see why: the
application-to-I/O-thread signalling alone will add 5-10µs of
latency. Maybe it's possible to add support for having the
application thread itself write to the sockets and read from them.
This would reduce latency at the cost of throughput. For some
applications that trade-off could be important. It could even be fine
to disconnect any socket that returns EAGAIN, i.e. slow receivers
should pull their orders from the market anyway since they have stale
data.
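The "disconnect on EAGAIN" idea might look something like the sketch
below: the sending thread writes to a non-blocking socket directly,
and when the kernel buffer fills up (EAGAIN, surfaced in Python as
BlockingIOError) the peer is treated as too slow and dropped rather
than queued for. A local socketpair stands in for the TCP connection
so the example is self-contained:

```python
import socket

# One end sends, the other never reads, simulating a slow receiver.
a, b = socket.socketpair()
a.setblocking(False)            # never block the sending thread

dropped = False
try:
    while True:
        a.send(b"x" * 4096)     # b never reads, so the buffer fills
except BlockingIOError:         # EAGAIN/EWOULDBLOCK on POSIX
    a.close()                   # disconnect the slow receiver
    dropped = True

print(dropped)                  # True
b.close()
```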

29West's solution is similar to zeromq: they seem to run a
configurable number of I/O threads in the background. But judging
from some of the latency data they present (20µs end to end using
OpenOnload), it seems they also allow the application thread to write
to the sockets directly. They also have a drop-slow-TCP-receivers
mode, which seems to corroborate this.

One reason I'm bringing this up is also our discussion on shared
memory. Shared memory won't yield much gain except for very large
messages, unless it's possible to bypass the I/O thread.
