[zeromq-dev] Re: IPC (again)

Martin Sustrik sustrik at 250bpm.com
Mon Jan 4 12:48:37 CET 2010


Erik Rigtorp wrote:
> On Mon, Jan 4, 2010 at 09:43, Martin Sustrik <sustrik at 250bpm.com> wrote:
>> Hi Erik, John,
>>
>>>> I've read the two discussions on using ZeroMQ for IPC. I think ZeroMQ
>>>> should support IPC and in-process communication.
>>>>
>>> I think we all agree on this.
>>>
>>>> TCP is nice to work with but it has one problem: On linux (and others)
>>>> TCP over loopback doesn't bypass the TCP stack which makes the latency
>>>> several times higher than using pipes or unix domain sockets. I know
>>>> that on Solaris this is optimized so that a loopback TCP connection
>>> has that been the case since a particular Solaris release (8, 9, 10)?
>>> I haven't got my Solaris internals book to hand right now ;-)
>>>
>>>> becomes more or less a pipe. For low latency IPC on Linux ZeroMQ needs
>>>> pipes or unix domain sockets.
>>>>
>>> just before xmas I exchanged an email with Martin about providing a fifo/pipe
>>> interface. (I wasn't concerned about performance, but wanted a zmq socket
>>> connection that could only be accessed via the same machine and not via
>>> loopback.) Subsequently I think that providing AF_LOCAL (AF_UNIX) sockets
>>> would be a good idea.
>>>
>>>> For ultra low latency IPC there is only one way to go and that is to
>>>> use shared memory. I took a look at yqueue.hpp in zeromq2 and it's a
>>>> good start. We only need to add a lock free memory allocator (which
>>> I'm glad some one else has looked at this because a while back I wondered
>>> whether the yqueue.hpp could use shared memory.
>>>
>>>
>>>> can be implemented using a lock free queue) or implement a lock free
>>> ypipe.hpp for example?
>>>
>>>> ringbuffer that would hold a fixed number of messages and block the
>>>> writer when it's full. For signaling I suggest implementing two
>>>> different approaches: one using pthreads conditions and one using busy
>>>> waiting. From my own testing I've seen that the pthreads
>>>> implementation would have similar latency as pipes/unix domain sockets
>>>> and a busy waiting solution would achieve latencies <1µs.
>> Great that there's an interest in IPC out there! A few comments follow:
>>
>> 1. pipes: Using pipes instead of TCP connections makes sense. It
>> requires no changes to the codebase starting from the point where the
>> connection is established. Still, we should think of a mechanism to be
>> used to pass the file descriptor of the pipe from connecting application
>> to the binding application. (Maybe this way: open a TCP connection, pass
>> the fd as a message, close the TCP connection, use the pipe instead?)
> 
> Or use named pipes? They have the added benefit of being supported on
> windows too.

Yes. However, the main problem remains: passing the identity between 
processes. Consider a one-to-many scenario, such as client/server. Each 
client has to have a separate pipe. The clients are anonymous, thus the 
name of the pipe has to be generated and passed to the peer somehow. 
This can be done either via a TCP connection or via a shared named pipe, 
with the restriction that the message notifying the other party about 
the creation of a new pipe is at most PIPE_BUF bytes long, so that the 
write is atomic.

> 
>> 2. Yes, yqueue_t could use shared memory. It uses malloc for each N
>> elements (currently 256) in the queue and the size of the block
>> allocated is constant. As for multithreading there are two threads
>> accessing the yqueue, one of them writing to the queue (thus allocating
>> the chunks when needed) other one reading from the queue (thus
>> deallocating the chunks).
>>
>> 3. The above would work OK for VSMs (very small messages). Still, larger
>> message contents are allocated via malloc (see zmq_msg_init_size
>> implementation) and these would require allocating shmem for each
>> message. While doable, it would make sense only for very large messages,
>> and only those very large messages that are known in advance to be sent
>> via shmem transport. It's kind of complex.
> 
> That would be a neat optimization, but complex. I think as a start we
> should implement a ringbuffer with byte elements and use it as a
> shared memory pipe. Basically you would write() and read() from the
> buffer just like a socket but without the overhead. If you know the
> max message size you could optimize this and implement a ringbuffer
> where each element is a message and let the user program work directly
> on shared memory. That would be hard to integrate with ZeroMQ's API.

What about passing just VSMs via the ringbuffer? You can increase 
MAX_VSM_SIZE when compiling 0MQ so that all the messages fit into the 
ringbuffer.

>> 4. Signaling: Note that the receiver of the signals polls for
>> incoming signals using file descriptors. Thus a condition variable
>> won't do. Creating a fake file descriptor (always signaling) to
>> implement busy-loop-style polling is viable; however, using 100% CPU
>> isn't exactly green. On Linux, eventfd can be used to implement
>> signaling in an efficient manner. Not sure about other OSes.
> 
> Yeah, it won't be green; that's why two options should be provided: one
> with kernel signaling and one with busy looping. For some applications
> where extremely low latency is required, the interrupt and scheduling
> latency is too high. In my application I can afford to let cores burn
> in order to get low latency. User-space interrupts might be
> interesting.

Ack. There can be a new option for zmq_init that would force busy loop 
instead of polling.

>> As a summary, I would start with implementing the pipe transport, and
>> move to shmem when that part is done. Anyone interested in the task?
> 
> I'll try to find/write a good c++ lock-free ringbuffer template.

I would start with yqueue_t and ypipe_t. We've spent a lot of time 
making them as efficient as possible. The only thing needed is to split 
each of them into a read part and a write part. This shouldn't be that 
complex: both classes have variables accessed exclusively by the reader 
and variables accessed exclusively by the writer. Then there are shared 
variables manipulated by atomic operations; those should reside in the 
shared memory.

Martin


