[zeromq-dev] Completion events rather than ready events

Steven McCoy steven.mccoy at miru.hk
Thu Feb 4 09:31:26 CET 2010


On 29 January 2010 15:22, Steven McCoy <steven.mccoy at miru.hk> wrote:

> Off to research what Google has to say on Linux IOCP.
>
>
The research on Linux IOCP is interesting, but it abruptly halted with a push
for POSIX AIO and edge-triggered notification in epoll to solve C10K.  AIO only
covers file access; network sockets are simply passed through.

AIO is basically a thread pool waiting for IO commands to run in parallel with
the application; overlapped IO is when multiple operations can be in flight on
the same socket.  IOCP is defined as AIO with notification on IO operation
completion.  POSIX AIO can notify via POSIX signals or a callback; Windows IOCP
posts completions to a completion-port handle that integrates directly with an
event loop.
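
For reference, a minimal sketch of POSIX AIO with callback notification
(SIGEV_THREAD), assuming a regular file named data.bin and linking with -lrt
on Linux; this only illustrates the notification model, it is not anything
ZeroMQ does:

/* Sketch: POSIX AIO read with completion notification via a callback
 * thread (SIGEV_THREAD).  Assumes data.bin exists; link with -lrt. */
#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static char buf[4096];

/* Invoked by the AIO implementation when the read completes. */
static void on_complete (union sigval sv)
{
    struct aiocb *cb = sv.sival_ptr;
    if (aio_error (cb) == 0)
        printf ("read completed: %zd bytes\n", aio_return (cb));
}

int main (void)
{
    int fd = open ("data.bin", O_RDONLY);
    if (fd == -1) { perror ("open"); return 1; }

    struct aiocb cb;
    memset (&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_THREAD;
    cb.aio_sigevent.sigev_notify_function = on_complete;
    cb.aio_sigevent.sigev_value.sival_ptr = &cb;

    if (aio_read (&cb) == -1) { perror ("aio_read"); return 1; }

    /* The application keeps working; the AIO thread pool performs the
     * read and fires on_complete when it is done. */
    sleep (1);
    close (fd);
    return 0;
}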

The major difference is that Windows AIO supports WinSock handles, while Linux
AIO doesn't handle network descriptors.  Discussion is still ongoing about the
ideal direction for Linux:

http://people.redhat.com/drepper/newni-slides.pdf

I read this as: the Linux developers are saying the network stack is already
excellent and the only problem is file handling, while Microsoft is putting a
big spin on IOCP, presenting it as a fancy new technology that makes everything
faster.

So for ZeroMQ, the send side is pretty much a re-implementation of txqueue,
using high and low watermarks for flow control over a Nagle-disabled TCP
connection.  The API provides datagram semantics and allows TCP_CORK-style
batching via zmq_flush.  There is no asynchronous completion notification of
when a message has been removed from the queue and put onto the wire.  An
immediate question is why re-implement something that already works well: what
does ZeroMQ provide above the kernel's queuing?

http://www.zeromq.org/whitepapers:design-v01#toc6
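
For comparison, a rough sketch of what TCP_CORK-style batching looks like at
the raw socket level on Linux; this is not ZeroMQ code, and the connected
descriptor and message buffers are stand-ins:

/* Sketch only: batching several small writes into one TCP segment with
 * TCP_CORK on an already-connected socket fd (Linux-specific option). */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

static void send_batch (int fd, const char *msgs[], size_t n)
{
    int on = 1, off = 0;

    /* Cork: hold partial segments in the kernel until uncorked. */
    setsockopt (fd, IPPROTO_TCP, TCP_CORK, &on, sizeof on);

    for (size_t i = 0; i < n; i++)
        send (fd, msgs[i], strlen (msgs[i]), 0);

    /* Uncork: flush whatever has accumulated onto the wire. */
    setsockopt (fd, IPPROTO_TCP, TCP_CORK, &off, sizeof off);
}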

The answer is that ZeroMQ isn't queuing for the sake of queuing; it's keeping
a backlog of messages that the kernel would otherwise block on.  When the
backlog builds up, those messages can be transferred to the kernel in a larger
send call, reducing the kernel-switching overhead when the system is busy.
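
A sketch of that kind of batched hand-off, pushing a backlog to the kernel in
a single writev() call instead of one send() per message; the queue structure
here is made up, only the batching idea is the point:

/* Sketch: flush a backlog of queued messages to the kernel with one
 * writev() call.  The backlog array is a stand-in for ZeroMQ's
 * internal pipe, not its actual implementation. */
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_BATCH 64

struct msg { const void *data; size_t size; };

static ssize_t flush_backlog (int fd, const struct msg *backlog, size_t count)
{
    struct iovec iov[MAX_BATCH];
    if (count > MAX_BATCH)
        count = MAX_BATCH;

    for (size_t i = 0; i < count; i++) {
        iov[i].iov_base = (void *) backlog[i].data;
        iov[i].iov_len  = backlog[i].size;
    }

    /* One kernel crossing for the whole batch. */
    return writev (fd, iov, (int) count);
}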

So does ZeroMQ optimize for the case that the kernel won't block on send and
there is no current backlog?  i.e. is there always the overhead of the
application thread notifying an IO thread to send to the kernel?

Compared with AIO and IOCP, ZeroMQ provides a faster send call at the expense
of not knowing when the packet hit the wire.  Note that technically, with AIO
and IOCP you only know when the data enters the kernel; you don't know when it
hits the wire, or whether the packet gets discarded due to overrun.

On the receive side, ZeroMQ has an IO thread pulling messages as fast as
possible from the kernel in order not to drop packets, pretty standard fare
for messaging middleware.  Using zmq_poll is in effect a late AIO recv: the
data is ready and waiting to be read, with the underlying system automagically
selecting the best event-notification method for the platform.
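
Something like this from the application side, using the 2.x-era C API
(socket setup is omitted, and exact signatures may differ between releases):

/* Sketch: wait until a message is ready, then receive it.  By the time
 * zmq_poll returns, the IO thread has already pulled the data off the
 * kernel socket into the local queue. */
#include <zmq.h>

static void read_one (void *sock)
{
    zmq_pollitem_t items [1];
    items [0].socket = sock;
    items [0].fd = 0;
    items [0].events = ZMQ_POLLIN;

    zmq_poll (items, 1, -1);            /* block until readable */

    if (items [0].revents & ZMQ_POLLIN) {
        zmq_msg_t msg;
        zmq_msg_init (&msg);
        zmq_recv (sock, &msg, 0);       /* already queued locally */
        /* ... process zmq_msg_data (&msg), zmq_msg_size (&msg) ... */
        zmq_msg_close (&msg);
    }
}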

So fast sends and fast receives, at what cost?  The cost is dedicated IO
threads running the milk round between the queues and the kernel.  That would
likely make running 16 high-speed applications on a 16-core box less than
ideal.  To counter this, ZeroMQ provides very high-speed inter-process and
inter-thread communication transports using the same basic send and receive
API, allowing developers to build more advanced applications that reduce the
system-administration overhead of running many separate processes.
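
For example, the transport is chosen purely by the endpoint string, so the
same code can publish over TCP, IPC or in-process pipes; the endpoint names
below are made up:

/* Sketch: the transport is selected by the endpoint string alone; the
 * send/receive code stays identical for tcp://, ipc:// and inproc://.
 * ctx is an already-initialised context. */
#include <zmq.h>

void *make_publisher (void *ctx, const char *endpoint)
{
    void *pub = zmq_socket (ctx, ZMQ_PUB);
    zmq_bind (pub, endpoint);
    return pub;
}

/* e.g. make_publisher (ctx, "tcp://*:5555");
 *      make_publisher (ctx, "ipc:///tmp/feed");
 *      make_publisher (ctx, "inproc://feed");   */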

<advert>
In certain architectures, such as when all the applications are subscribing
to the same low-latency source, it will still be preferable to use the
underlying transport such as OpenPGM directly, as no additional threads are
required and no inter-thread communication is necessary.
</advert>

-- 
Steve-o

