[zeromq-dev] Re: IPC (again)

Erik Rigtorp erik at rigtorp.com
Mon Jan 4 10:43:20 CET 2010


On Mon, Jan 4, 2010 at 09:43, Martin Sustrik <sustrik at 250bpm.com> wrote:
> Hi Erik, John,
>
>>> I've read the two discussions on using ZeroMQ for IPC. I think ZeroMQ
>>> should support IPC and in-process communication.
>>>
>> I think we all agree on this.
>>
>>> TCP is nice to work with, but it has one problem: on Linux (and others)
>>> TCP over loopback doesn't bypass the TCP stack, which makes the latency
>>> several times higher than using pipes or Unix domain sockets. I know
>>> that on Solaris this is optimized so that a loopback TCP connection
>>
>> Is that since a particular Solaris release (8, 9, 10)?
>> I haven't got my Solaris internals book to hand right now ;-)
>>
>>> becomes more or less a pipe. For low-latency IPC on Linux, ZeroMQ needs
>>> pipes or Unix domain sockets.
>>>
>> Just before xmas I exchanged an email with Martin about providing a
>> fifo/pipe interface. (I wasn't concerned about performance, but wanted a
>> zmq socket connection that could only be accessed from the same machine
>> and not via loopback.) Subsequently I think that providing AF_LOCAL
>> (AF_UNIX) sockets would be a good idea.
>>
>>> For ultra low latency IPC there is only one way to go and that is to
>>> use shared memory. I took a look at yqueue.hpp in zeromq2 and it's a
>>> good start. We only need to add a lock free memory allocator (which
>>
>> I'm glad someone else has looked at this, because a while back I wondered
>> whether yqueue.hpp could use shared memory.
>>
>>
>>> can be implemented using a lock free queue) or implement a lock free
>>
>> ypipe.hpp for example?
>>
>>> ringbuffer that would hold a fixed number of messages and block the
>>> writer when it's full. For signaling I suggest implementing two
>>> different approaches: one using pthread condition variables and one
>>> using busy waiting. From my own testing I've seen that the pthreads
>>> implementation would have latency similar to pipes/Unix domain sockets,
>>> and a busy-waiting solution would achieve latencies <1µs.
>
> Great that there's an interest in IPC out there! Few comments follow:
>
> 1. pipes: Using pipes instead of TCP connections makes sense. It
> requires no changes to the codebase starting from the point where the
> connection is established. Still, we should think of a mechanism to be
> used to pass the file descriptor of the pipe from the connecting
> application to the binding application. (Maybe this way: open a TCP
> connection, pass the fd as a message, close the TCP connection, use the
> pipe instead?)

Or use named pipes? They have the added benefit of being supported on
Windows too.
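
For the POSIX side, roughly what I have in mind; this is only a sketch
(the path is a placeholder and error handling is mostly omitted), and
Windows would need the CreateNamedPipe API rather than mkfifo:

#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main ()
{
    //  Illustrative endpoint only; a real transport would derive this
    //  from the zmq_bind/zmq_connect address.
    const char *path = "/tmp/zmq-demo-fifo";
    mkfifo (path, 0600);                 //  EEXIST ignored for brevity

    //  Writer side; open blocks until a reader has the FIFO open.
    int fd = open (path, O_WRONLY);
    if (fd == -1) {
        perror ("open");
        return 1;
    }
    const char *msg = "hello";
    write (fd, msg, strlen (msg));       //  the fd is poll()-able like any other
    close (fd);
    return 0;
}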

> 2. Yes, yqueue_t could use shared memory. It uses malloc for each N
> elements (currently 256) in the queue and the size of the block
> allocated is constant. As for multithreading there are two threads
> accessing the yqueue, one of them writing to the queue (thus allocating
> the chunks when needed) other one reading from the queue (thus
> deallocating the chunks).
>
> 3. The above would work OK for VSMs (very small messages). Still, larger
> message contents are allocated via malloc (see zmq_msg_init_size
> implementation) and these would require allocating shmem for each
> message. While doable, it would make sense only for very large messages,
> and only those very large messages that are known in advance to be sent
> via shmem transport. It's kind of complex.

That would be a neat optimization, but complex. I think as a start we
should implement a ringbuffer with byte elements and use it as a
shared memory pipe. Basically you would write() and read() from the
buffer just like a socket, but without the overhead. If you know the
maximum message size you could optimize this further and implement a
ringbuffer where each element is a message, letting the user program
work directly on shared memory, but that would be hard to integrate
with ZeroMQ's API.
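
To make that concrete, here is a rough sketch of the kind of
single-producer/single-consumer byte ringbuffer I mean. It is not tied
to the 0MQ codebase; the class name and layout are my own assumptions,
and before placing it in a shm_open()/mmap() segment one would have to
check that the atomics are actually lock-free on the target platform:

#include <atomic>
#include <cstddef>

//  Single-producer/single-consumer byte ringbuffer.  N must be a power
//  of two so the free-running indices can be wrapped with a mask.
template <size_t N>
class spsc_byte_ring
{
public:
    spsc_byte_ring () : head (0), tail (0) {}

    //  Producer side: returns the number of bytes actually written
    //  (less than 'size' if the buffer is nearly full).
    size_t write (const void *data, size_t size)
    {
        size_t h = head.load (std::memory_order_relaxed);
        size_t t = tail.load (std::memory_order_acquire);
        size_t free_space = N - (h - t);
        if (size > free_space)
            size = free_space;
        for (size_t i = 0; i != size; i++)
            buf [(h + i) & (N - 1)] = static_cast<const char*> (data) [i];
        head.store (h + size, std::memory_order_release);
        return size;
    }

    //  Consumer side: returns the number of bytes actually read.
    size_t read (void *data, size_t size)
    {
        size_t t = tail.load (std::memory_order_relaxed);
        size_t h = head.load (std::memory_order_acquire);
        size_t available = h - t;
        if (size > available)
            size = available;
        for (size_t i = 0; i != size; i++)
            static_cast<char*> (data) [i] = buf [(t + i) & (N - 1)];
        tail.store (t + size, std::memory_order_release);
        return size;
    }

private:
    std::atomic<size_t> head;   //  advanced by the producer only
    std::atomic<size_t> tail;   //  advanced by the consumer only
    char buf [N];
};

Blocking when full and the message-per-element variant would sit on top
of this; the writer either spins or falls back to the signaling
mechanism discussed below.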

>
> 4. Signaling: Note that the receiver of the signals polls for
> incoming signals using file descriptors. Thus the condition variable
> won't do. Creating a fake file descriptor (always signaling) to
> implement busy-loop style polling is viable; however, using 100% CPU
> isn't exactly green. On Linux, eventfd can be used to implement signaling
> in an efficient manner. Not sure about other OSes.

Yeah, it won't be green; that's why two options should be provided: one
with kernel signaling and one with busy looping. For some applications
that require extremely low latency, the interrupt and scheduling
latency is too high. In my application I can afford to let cores burn
in order to get low latency. User-space interrupts might be
interesting.
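
For the kernel-signaled option, a minimal sketch of the eventfd
mechanism Martin mentions (Linux only, error handling omitted). The
point is that the consumer can block in poll() on the eventfd together
with whatever other descriptors it is already watching:

#include <cstdint>
#include <poll.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main ()
{
    int efd = eventfd (0, 0);

    //  Producer side: bump the counter after publishing data.
    uint64_t one = 1;
    write (efd, &one, sizeof one);

    //  Consumer side: wait until signaled, then drain the counter.
    pollfd pfd = {efd, POLLIN, 0};
    poll (&pfd, 1, -1);
    uint64_t count;
    read (efd, &count, sizeof count);    //  count == number of signals

    close (efd);
    return 0;
}

The busy-waiting variant would simply spin on the ringbuffer indices
and never touch the kernel at all.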

> As a summary, I would start with implementing the pipe transport, and
> move to shmem when that part is done. Anyone interested in the task?

I'll try to find/write a good C++ lock-free ringbuffer template.


