[zeromq-dev] Async::Worker, C++ task offloading.

Oliver Smith oliver at kfs.org
Thu Jul 22 18:31:41 CEST 2010


On 7/22/2010 4:10 AM, Martin Sustrik wrote:
>> It combines concepts from OpenMP and Intel's TBB.
>>      
> I am not the expert on either of the two, but the idea of making 0MQ
> infrastructure look more friendly to OpenMP/TBB developers looks like an
> interesting idea. If you are interested in discussing it, comparing the
>    
I'm also far from an expert on either, but we have been researching ways 
to leverage multiple cores for our players in a 10+ year old codebase. 
Over the last year I've been trying to squeeze in time to find a way to 
migrate from monolithic single threading to parallelism, and I have 
looked at several approaches. OpenMP is considered "lightweight" because 
you simply start by marking up your code with #pragmas, making it a good 
stepping stone.

Unfortunately, the parallelism is achieved by creating threads when you 
enter a parallel region of code, and those threads then spinlock/futex. 
So ultimately, for longer-term parallelization you're going to want to 
do more.
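
To illustrate the "mark up your code with #pragmas" point, here is a 
minimal sketch of an OpenMP-annotated loop. The function sum_squares is 
a hypothetical workload invented for this example, not code from our 
codebase or from Async::Worker. It assumes a compiler with OpenMP 
support (for GCC, building with -fopenmp); without it the pragma is 
ignored and the loop simply runs serially.

```cpp
// Sketch only: sum_squares is a hypothetical workload, not code from the thread.
#include <cstddef>
#include <vector>

// Sums the squares of the input values.
// With OpenMP enabled, the iterations are divided among a team of threads
// and the per-thread partial sums are combined by the reduction clause.
// Without OpenMP the pragma is ignored and the result is the same.
long long sum_squares(const std::vector<int>& values) {
    long long total = 0;
    const long long n = static_cast<long long>(values.size());
    #pragma omp parallel for reduction(+:total)
    for (long long i = 0; i < n; ++i) {
        const long long v = values[static_cast<std::size_t>(i)];
        total += v * v;
    }
    return total;
}
```

The serial code is untouched apart from the one pragma line, which is 
what makes this a low-effort first step.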

Intel TBB has lots of pros and cons. It provides various useful 
templates, classes and algorithms; see 
http://www.threadingbuildingblocks.org/files/documentation/index.html. 
There are both Open Source and Commercial licenses for Intel TBB 
(http://www.threadingbuildingblocks.org/).

Ultimately, you wind up dealing with the minutiae of parallelism, which 
you can avoid by message passing. ZeroMQ's advantage is going to be 
scalability and the comparative simplicity of encapsulated message 
passing, at a slight cost in performance. I would /strongly/ encourage 
you to even minimally flesh out the zmq_queue, zmq_forwarder and 
zmq_streamer documentation :)

> OpenMP/TBB/0MQ approaches, benchmarks etc. you can possibly write a blog
> about it to post on zeromq.org.
>    
This reminded me of a discussion I started on TBB's forums; you might 
like my last two posts on the thread: 
http://software.intel.com/en-us/forums/showthread.php?t=73155&p=2&o=d&s=lr 

Ha - reviewing my original post there, I can see the seeds of 
Async::Worker :)

>> This is a somewhat weak example because the work being done by the
>> worker is so trivial, but even so on a virtual quad-core machine
>> building with -O0 I see a 35-40% reduction in processing time.
>>      
> Worker being trivial, the large reduction in processing time is even more
> impressive.
>    
The great shame is that - by passing pointers - this first version would 
/seem/ to preclude scalability across machines, but the very first thing 
I wanted to pass was a { std::string ; std::vector ; }.

The workload I was going to perform on them wasn't very hefty, and I 
thought "by the time I'm done creating a Worker and a message and 
serializing the string and vector ... I've lost any gains". It's almost 
like I need to offload the work of serializing the data to another local 
thread...

I suspect you see where I'm going with that :)
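
To make the trade-off concrete, here is a rough sketch contrasting the 
two ways of getting such a structure into a message. It is not 
Async::Worker's actual code and it uses no ZeroMQ calls: Payload, 
pack_pointer, unpack_pointer, serialize and deserialize are all 
hypothetical names, the vector's element type (std::int32_t) is an 
assumption, and the wire layout (length-prefixed, host byte order, no 
bounds checking) is just one simple choice.

```cpp
// Sketch only: all names and the wire layout are assumptions for illustration.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical payload; the element type of the vector is assumed.
struct Payload {
    std::string name;
    std::vector<std::int32_t> values;
};

// Option A: the message body is just the pointer value.
// Cheap, but only meaningful inside the same process.
std::vector<char> pack_pointer(Payload* p) {
    std::vector<char> msg(sizeof p);
    std::memcpy(msg.data(), &p, sizeof p);
    return msg;
}

Payload* unpack_pointer(const std::vector<char>& msg) {
    Payload* p = nullptr;
    std::memcpy(&p, msg.data(), sizeof p);
    return p;
}

// Option B: copy the contents into a flat buffer that could cross machines.
// Layout: [u32 name length][name bytes][u32 element count][elements].
std::vector<char> serialize(const Payload& p) {
    const std::uint32_t name_len = static_cast<std::uint32_t>(p.name.size());
    const std::uint32_t count = static_cast<std::uint32_t>(p.values.size());
    const std::size_t data_bytes = count * sizeof(std::int32_t);
    std::vector<char> msg(sizeof name_len + name_len + sizeof count + data_bytes);
    char* out = msg.data();
    std::memcpy(out, &name_len, sizeof name_len); out += sizeof name_len;
    if (name_len) std::memcpy(out, p.name.data(), name_len);
    out += name_len;
    std::memcpy(out, &count, sizeof count); out += sizeof count;
    if (count) std::memcpy(out, p.values.data(), data_bytes);
    return msg;
}

// Inverse of serialize(); assumes a well-formed buffer from the same host.
Payload deserialize(const std::vector<char>& msg) {
    const char* in = msg.data();
    std::uint32_t name_len = 0;
    std::memcpy(&name_len, in, sizeof name_len); in += sizeof name_len;
    Payload p;
    p.name.assign(in, name_len); in += name_len;
    std::uint32_t count = 0;
    std::memcpy(&count, in, sizeof count); in += sizeof count;
    p.values.resize(count);
    if (count) std::memcpy(p.values.data(), in, count * sizeof(std::int32_t));
    return p;
}
```

Option A costs one pointer-sized copy regardless of payload size; 
Option B costs two full copies of the data (one on each side), which is 
the overhead that can swallow the gains when the per-item work is small.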

The most obvious weak point in my current implementation is that I 
failed to do zero-copy on the pointer itself! I need to figure out what 
stupid thing I did wrong there because eliminating that extra allocation 
would significantly improve throughput.

- Oliver
