[zeromq-dev] Async::Worker, C++ task offloading.

Martin Sustrik sustrik at 250bpm.com
Sat Jul 24 09:38:26 CEST 2010


>>> It combines concepts from OpenMP and Intel's TBB.
>> I am not the expert on either of the two, but the idea of making 0MQ 
>> infrastructure look more friendly to OpenMP/TBB developers looks like an 
>> interesting idea. If you are interested in discussing it, comparing the 
> I'm also far from an expert on either, but we have been researching ways 
> to leverage multi-cores for our players out of a 10+ year old codebase 
> so over the last year I've been trying to squeeze in time to find a way 
> to migrate from monolithic single threading to parallelism and looked at 
> several approaches. OpenMP is considered "lightweight" because you 
> simply start by marking-up your code with #pragmas, making it a good 
> stepping stone.
> Unfortunately, the parallelism is achieved by creating threads when you 
> enter a parallel region of code, which then spinlock/futex. So 
> ultimately, for a longer-term parallelization you're going to want to do 
> more.
> Intel TBB has lots of pros and cons. It provides various useful templates, 
> classes and algorithms. See 
> http://www.threadingbuildingblocks.org/files/documentation/index.html. 
> There are both Open Source and Commercial licenses for Intel TBB 
> (http://www.threadingbuildingblocks.org/).
> Ultimately, you wind up dealing with the minutiae of parallelism, which 
> you can avoid by message passing. ZeroMQ's advantage is going to be 
> scalability and the comparative simplicity of encapsulated message 
> passing, at a slight cost in performance: I would /strongly/ encourage 
> you to even minimally flesh out the zmq_queue, zmq_forwarder and 
> zmq_streamer documentation :)
>> OpenMP/TBB/0MQ approaches, benchmarks etc. you can possibly write a blog 
>> about it to post on zeromq.org.
> This reminded me of a discussion I started on TBB's forums, you might 
> like my last two posts on the thread: 
> http://software.intel.com/en-us/forums/showthread.php?t=73155&p=2&o=d&s=lr 
> Ha - reviewing my original post there, I can see the seeds of 
> Async::Worker :)

The above comparison is interesting. It would be good to have it 
accessible somewhere on the website. I'll give it some thought.

>>> This is a somewhat weak example because the work being done by the 
>>> worker is so trivial, but even so on a virtual quad-core machine 
>>> building with -O0 I see a 35-40% reduction in processing time.
>> The worker being trivial, the large reduction in processing time is even 
>> more impressive.
> The great shame is that - by passing pointers - this first version would 
> /seem/ to preclude scalability across machines, but the very first thing 
> I wanted to pass was a { std::string ; std::vector ; }.

Actually, when using the inproc:// transport, 0MQ passes pointers between 
the threads under the hood. Yet you can trivially change it to tcp:// when 
scaling to multiple boxes. The only added overhead is serialising your 
structures into a binary BLOB on one side and deserialising them on the 
other.
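
To make the serialisation step concrete, here is a minimal sketch of 
packing a { std::string ; std::vector ; } pair into a flat byte buffer 
and unpacking it again. It is plain C++ and deliberately leaves 0MQ out. 
The Payload struct, the element type (32-bit integers), the 
length-prefixed layout and the helper names are all assumptions made for 
illustration, and it assumes both ends share the same byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical payload modelled on the { std::string ; std::vector ; } above.
struct Payload {
    std::string name;
    std::vector<std::int32_t> values;
};

// Append a 32-bit value in host byte order (assumption: same endianness on both ends).
static void put_u32(std::vector<unsigned char>& out, std::uint32_t v) {
    unsigned char raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    out.insert(out.end(), raw, raw + sizeof v);
}

// Read a 32-bit value at pos and advance pos; throws if the blob is too short.
static std::uint32_t get_u32(const std::vector<unsigned char>& in, std::size_t& pos) {
    if (pos > in.size() || in.size() - pos < sizeof(std::uint32_t))
        throw std::runtime_error("truncated blob");
    std::uint32_t v;
    std::memcpy(&v, in.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// Layout: [name length][name bytes][element count][elements...]
std::vector<unsigned char> serialize(const Payload& p) {
    assert(p.name.size() <= UINT32_MAX && p.values.size() <= UINT32_MAX);
    std::vector<unsigned char> out;
    put_u32(out, static_cast<std::uint32_t>(p.name.size()));
    out.insert(out.end(), p.name.begin(), p.name.end());
    put_u32(out, static_cast<std::uint32_t>(p.values.size()));
    for (std::int32_t v : p.values)
        put_u32(out, static_cast<std::uint32_t>(v));
    return out;
}

Payload deserialize(const std::vector<unsigned char>& in) {
    Payload p;
    std::size_t pos = 0;
    const std::uint32_t name_len = get_u32(in, pos);
    if (in.size() - pos < name_len)
        throw std::runtime_error("truncated blob");
    p.name.assign(reinterpret_cast<const char*>(in.data()) + pos, name_len);
    pos += name_len;
    const std::uint32_t count = get_u32(in, pos);
    if ((in.size() - pos) / sizeof(std::uint32_t) < count)
        throw std::runtime_error("truncated blob");
    p.values.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i)
        p.values.push_back(static_cast<std::int32_t>(get_u32(in, pos)));
    return p;
}
```

The resulting buffer is what you would hand to the messaging layer as 
the message body when moving off inproc://. With inproc:// you can skip 
both functions and pass the pointer instead.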

> The workload I was going to perform on them wasn't very hefty, and I 
> thought "by the time I'm done creating a Worker and a message and 
> serializing the string and vector ... I've lost any gains". It's almost 
> like I need to offload the work of serializing the data to another local 
> thread...
> I suspect you see where I'm going with that :)


> The most obvious weak point in my current implementation is that I 
> failed to do zero-copy on the pointer itself! I need to figure out what 
> stupid thing I did wrong there because eliminating that extra allocation 
> would significantly improve throughput.

I'm a bit lost here. What extra allocation? If you are passing just the 
pointer, it's 8 bytes (on 64-bit microarchs). Messages shorter than 30 
bytes are called VSMs (very small messages) in 0MQ and are passed 
*without* any extra memory allocations.
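
As a rough illustration of the idea (not 0MQ's actual message 
structure), a message type can keep short payloads in a buffer embedded 
in the message object itself and fall back to the heap only for larger 
ones. Everything in this sketch is hypothetical: the SmallMessage class, 
the wrap_pointer / unwrap_pointer helpers, and the exact cutoff, which 
simply reuses the 30-byte figure from above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Cutoff borrowed from the 30-byte figure above; the exact boundary is an assumption.
constexpr std::size_t kMaxInline = 30;

// Hypothetical message type: small payloads live inside the object itself,
// larger ones get a separate heap buffer.
class SmallMessage {
public:
    SmallMessage(const void* data, std::size_t size) : size_(size) {
        assert(data != nullptr || size == 0);
        if (size_ > kMaxInline)
            heap_.reset(new unsigned char[size_]);  // the only allocation
        if (size_ > 0)
            std::memcpy(bytes(), data, size_);
    }

    bool on_heap() const { return heap_ != nullptr; }
    std::size_t size() const { return size_; }
    const unsigned char* data() const { return heap_ ? heap_.get() : inline_; }

private:
    unsigned char* bytes() { return heap_ ? heap_.get() : inline_; }

    std::size_t size_;
    unsigned char inline_[kMaxInline];
    std::unique_ptr<unsigned char[]> heap_;
};

// Copy the pointer value itself (8 bytes on a 64-bit platform) into a message.
template <typename T>
SmallMessage wrap_pointer(T* ptr) {
    return SmallMessage(&ptr, sizeof ptr);
}

// Recover the pointer on the receiving side; returns nullptr on a size mismatch.
template <typename T>
T* unwrap_pointer(const SmallMessage& msg) {
    T* ptr = nullptr;
    if (msg.size() == sizeof ptr)
        std::memcpy(&ptr, msg.data(), sizeof ptr);
    return ptr;
}
```

A pointer-sized payload fits in the inline buffer, so wrapping and 
unwrapping it never touches the allocator. Only payloads above the 
cutoff take the heap path.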

