[zeromq-dev] Async::Worker, C++ task offloading.
Oliver Smith
oliver at kfs.org
Thu Jul 22 18:31:41 CEST 2010
On 7/22/2010 4:10 AM, Martin Sustrik wrote:
>> It combines concepts from OpenMP and Intel's TBB.
>>
> I am not the expert on either of the two, but the idea of making 0MQ
> infrastructure look more friendly to OpenMP/TBB developers looks like an
> interesting idea. If you are interested in discussing it, comparing the
>
I'm also far from an expert on either, but we have been researching ways
to leverage multiple cores for our players out of a 10+ year old
codebase. Over the last year I've been trying to squeeze in time to find
a way to migrate from monolithic single threading to parallelism, and
have looked at several approaches. OpenMP is considered "lightweight"
because you simply start by marking up your code with #pragmas, making
it a good stepping stone.
Unfortunately, the parallelism is achieved by creating threads when you
enter a parallel region of code, which then spinlock/futex. So
ultimately, for a longer-term parallelization you're going to want to do
more.
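For anyone who hasn't seen the #pragma style, here is a minimal sketch of
what "marking up" a loop looks like. The function name sum_of_squares is
made up for illustration and is not from our codebase; the pragma is the
only OpenMP-specific line.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): sums the squares of the input.
long long sum_of_squares(const std::vector<int>& values) {
    long long total = 0;
    // Signed loop index, which older OpenMP versions require.
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(values.size());

    // With OpenMP enabled (e.g. -fopenmp on GCC/Clang) the iterations are
    // split across threads and the per-thread totals are combined at the
    // end. Without it the pragma is ignored and the loop runs serially,
    // giving the same result.
    #pragma omp parallel for reduction(+ : total)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        total += static_cast<long long>(values[i]) * values[i];
    }
    return total;
}
```

Calling sum_of_squares({1, 2, 3}) gives 14 whether or not OpenMP is
enabled, which is what makes it a low-risk first step.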
Intel TBB has lots of pros and cons. It provides various useful templates,
classes and algorithms. See
http://www.threadingbuildingblocks.org/files/documentation/index.html.
There are both Open Source and Commercial licenses for Intel TBB
(http://www.threadingbuildingblocks.org/).
Ultimately, you wind up dealing with the minutiae of parallelism, which
you can avoid by message passing. ZeroMQ's advantage is going to be
scalability and the comparative simplicity of encapsulated message
passing, at a slight cost in performance: I would /strongly/ encourage
you to even minimally flesh out the zmq_queue, zmq_forwarder and
zmq_streamer documentation :)
> OpenMP/TBB/0MQ approaches, benchmarks etc. you can possibly write a blog
> about it to post on zeromq.org.
>
This reminded me of a discussion I started on TBB's forums; you might
like my last two posts on the thread:
http://software.intel.com/en-us/forums/showthread.php?t=73155&p=2&o=d&s=lr
Ha - reviewing my original post there, I can see the seeds of
Async::Worker :)
>> This is a somewhat weak example because the work being done by the
>> worker is so trivial, but even so on a virtual quad-core machine
>> building with -O0 I see a 35-40% reduction in processing time.
>>
> Worker being trivial, the large reduction in processing time is even more
> impressive.
>
The great shame is that - by passing pointers - this first version would
/seem/ to preclude scalability across machines, but the very first thing
I wanted to pass was a { std::string ; std::vector ; }.
The workload I was going to perform on them wasn't very hefty, and I
thought "by the time I'm done creating a Worker and a message and
serializing the string and vector ... I've lost any gains". It's almost
like I need to offload the work of serializing the data to another local
thread...
I suspect you see where I'm going with that :)
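To make the pointer-passing trade-off concrete, here is a rough sketch of
the idea. It is not the actual Async::Worker code: Job, Payload,
pack_pointer and unpack_pointer are all hypothetical names, and a plain
byte vector stands in for a message body. Only the pointer's bytes
travel, so nothing gets serialized, but the payload is meaningless
outside the process that packed it.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical job type, modelled on the { std::string ; std::vector ; }
// mentioned above.
struct Job {
    std::string name;
    std::vector<int> values;
};

// Stand-in for a message body (assumption: a real transport would carry
// these bytes unchanged between threads of one process).
using Payload = std::vector<unsigned char>;

// Sender side: give up ownership and copy only the pointer's bytes.
Payload pack_pointer(std::unique_ptr<Job> job) {
    Job* raw = job.release();
    Payload payload(sizeof raw);
    std::memcpy(payload.data(), &raw, sizeof raw);
    return payload;
}

// Receiver side: recover the pointer and take ownership back.
// Only valid in the same process that called pack_pointer.
std::unique_ptr<Job> unpack_pointer(const Payload& payload) {
    assert(payload.size() == sizeof(Job*));
    Job* raw = nullptr;
    std::memcpy(&raw, payload.data(), sizeof raw);
    return std::unique_ptr<Job>(raw);
}
```

The string and vector are never copied; the receiver ends up owning the
very same object the sender built, which is exactly why this form can't
cross a machine boundary.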
The most obvious weak point in my current implementation is that I
failed to do zero-copy on the pointer itself! I need to figure out what
stupid thing I did wrong there because eliminating that extra allocation
would significantly improve throughput.
- Oliver