[zeromq-dev] Async::Worker, C++ task offloading.

Oliver Smith oliver at kfs.org
Sat Jul 24 09:51:45 CEST 2010

On 7/24/2010 2:38 AM, Martin Sustrik wrote:
>>> OpenMP/TBB/0MQ approaches, benchmarks etc. you can possibly write a blog
>>> about it to post on zeromq.org.
>> This reminded me of a discussion I started on TBBs forums, you might
>> like my last two posts on the thread:
>> http://software.intel.com/en-us/forums/showthread.php?t=73155&p=2&o=d&s=lr
>> Ha - reviewing my original post there, I can see the seeds of
>> Async::Worker :)
> The above comparison is interesting. It would be good to have it
> accessible somewhere on the website. I'll give it a thought.
I'll put some thought into a better write-up for you, something a little 
less "wary" of getting a TBB-forum thread deleted or something :)

>>>> This is a somewhat weak example because the work being done by the
>>>> worker is so trivial, but even so on a virtual quad-core machine
>>>> building with -O0 I see a 35-40% reduction in processing time.
>>> Worker being trivial, the large reduction in processing time is even more
>>> impressive.
>> The great shame is that - by passing pointers - this first version would
>> /seem/ to preclude scalability across machines, but the very first thing
>> I wanted to pass was a { std::string ; std::vector ; }.
> Actually, when using inproc:// transport 0MQ passes pointers between the
> threads under the hood. Yet you can trivially change it to tcp:// when
> scaling to multiple boxes. The only overhead is serialisation /
> deserialisation of your structures into the binary BLOB.
I suspected as much but my instinct these days is usually to get 
something working before spending time under the hood. I'll probably do 
some reworking on Async::Worker over the weekend.
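
For illustration, here is a rough sketch of what serialising a
{ std::string ; std::vector ; } into a binary BLOB could look like. It does
not touch 0MQ at all. The Job struct, the int32 element type, the
length-prefixed layout and the function names are all assumptions made up
for this example, not anything from Async::Worker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical payload; the vector's element type is an assumption.
struct Job {
    std::string name;
    std::vector<std::int32_t> values;
};

using Blob = std::vector<unsigned char>;

// Append the raw bytes [src, src + len) to the blob.
static void put(Blob& out, const void* src, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(src);
    out.insert(out.end(), p, p + len);
}

// Assumed layout: [u32 name length][name bytes][u32 value count][values].
// Host byte order throughout, so both ends must share endianness.
Blob serialise(const Job& job) {
    Blob out;
    const std::uint32_t nameLen = static_cast<std::uint32_t>(job.name.size());
    const std::uint32_t count = static_cast<std::uint32_t>(job.values.size());
    put(out, &nameLen, sizeof(nameLen));
    put(out, job.name.data(), nameLen);
    put(out, &count, sizeof(count));
    put(out, job.values.data(), count * sizeof(std::int32_t));
    return out;
}

// Inverse of serialise(); only asserts guard against a truncated blob.
Job deserialise(const Blob& in) {
    Job job;
    std::size_t pos = 0;
    std::uint32_t nameLen = 0;
    std::uint32_t count = 0;

    assert(in.size() >= pos + sizeof(nameLen));
    std::memcpy(&nameLen, in.data() + pos, sizeof(nameLen));
    pos += sizeof(nameLen);

    assert(in.size() >= pos + nameLen);
    job.name.assign(reinterpret_cast<const char*>(in.data() + pos), nameLen);
    pos += nameLen;

    assert(in.size() >= pos + sizeof(count));
    std::memcpy(&count, in.data() + pos, sizeof(count));
    pos += sizeof(count);

    assert(in.size() >= pos + count * sizeof(std::int32_t));
    job.values.resize(count);
    if (count != 0) {
        std::memcpy(job.values.data(), in.data() + pos,
                    count * sizeof(std::int32_t));
    }
    return job;
}
```

The resulting Blob is what would go into the message body in place of the
raw pointer once the endpoint moves from inproc:// to tcp://.
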
>> The most obvious weak point in my current implementation is that I
>> failed to do zero-copy on the pointer itself! I need to figure out what
>> stupid thing I did wrong there because eliminating that extra allocation
>> would significantly improve throughput.
> I'm a bit lost here, what extra allocation? If you are passing just the
> pointer, it's 8 bytes (on 64-bit microarchs). Messages below 30 bytes of
> length are called VSMs (very small messages) in 0MQ and are passed
> *without* any extra memory allocations.
Ahh! This I did not know.

In short: what I wanted to achieve was:

void sendMe(void* ptr)
{
   // the message body is the pointer value itself, hence &ptr
   message_t msg(&ptr, sizeof(ptr)) ;
   outSocket.send(msg, 0) ;
}

but for some reason this failed horribly. I didn't take the time to 
investigate because, again, I wanted POC before going under the hood :) 
I had three overdue deadlines and needed parallelism a week ago and on 
top of this MySQL's Prepared Statement system was kicking me in the behind.

In hindsight, I suspect that I forgot to do the '&' in '&ptr' :)
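
To spell out the difference the '&' makes, here is a minimal sketch with no
0MQ involved: a plain byte vector stands in for the message body, and the
names packPointer, packPointee and unpackPointer are made up for this
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for a message body: a plain byte buffer (no 0MQ involved).
using Bytes = std::vector<unsigned char>;

// Intended behaviour: copy the pointer VALUE, i.e. the sizeof(ptr)
// bytes stored at &ptr.
Bytes packPointer(void* ptr) {
    Bytes body(sizeof(ptr));
    std::memcpy(body.data(), &ptr, sizeof(ptr));
    return body;
}

// Suspected bug: without the '&' the copy reads the first sizeof(ptr)
// bytes of the object being pointed AT, not the address itself.
// (The pointee must be at least sizeof(ptr) bytes long for this to be
// a valid read at all.)
Bytes packPointee(void* ptr) {
    Bytes body(sizeof(ptr));
    std::memcpy(body.data(), ptr, sizeof(ptr));
    return body;
}

// Receiving side: rebuild the pointer from the message body.
void* unpackPointer(const Bytes& body) {
    assert(body.size() == sizeof(void*));
    void* ptr = nullptr;
    std::memcpy(&ptr, body.data(), sizeof(ptr));
    return ptr;
}
```

With packPointer the receiver gets the original address back. With
packPointee it gets whatever the first few bytes of the object happen to
be, reinterpreted as an address, which would fit "failed horribly". Either
way the body is sizeof(void*) bytes, well under the 30-byte VSM limit
mentioned above.
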

- Oliver
