[zeromq-dev] Async::Worker, C++ task offloading.

Oliver Smith oliver at kfs.org
Mon Jul 26 07:51:06 CEST 2010


On 7/22/2010 4:10 AM, Martin Sustrik wrote:
>> This is a somewhat weak example because the work being done by the
>> worker is so trivial, but even so on a virtual quad-core machine
>> building with -O0 I see a 35-40% reduction in processing time.
>>      
> Worker being trivial, the large reduction in processing time is even more
> impressive.
>    
Just to follow up on that, I thought I'd post the results of my
benchmark comparisons of GCC vs. the Intel C++ Compiler; they're kinda
impressive:

Virtual Ubuntu 10.04 guest machine running under VMware 7.0 on an i7
host running Windows 7, with 2 virtual CPUs of 2 cores each:

Async-Worker tests with GCC v4.4.3 with -O3 -msse -msse2 -msse3 -mssse3 
-msse4 -msse4.1 -msse4.2 -mfpmath=sse -mtune=core2 -march=core2:
(NOTE: I used Acovea to find these optimal settings; I wouldn't
ordinarily use -mtune/-march because I always find they make things worse :)

     ~3580ms for serial RunAndReturn, ~3580ms for serial RunAndReturnLocal,
     ~930ms for parallel RunAndReturn, ~940ms for parallel RunAndReturnLocal

Async-Worker tests with Intel C++ compiler 11.1.072 with -O3 -xHOST -ipo:

     ~2590ms for serial RunAndReturn, ~2580ms for serial RunAndReturnLocal (27% gain over GCC)
     ~700ms for parallel RunAndReturn, ~700ms for parallel RunAndReturnLocal (25% gain over GCC)

Building ZeroMQ with "icpc -O3 -ipo -xHOST" instead of GCC shaved an
extra 4-10ms off the parallel results.

Building both the Async::Worker examples and ZeroMQ with "icpc -O3 -ipo
-xHOST -fbuiltin" reduces benchmark times by up to 50ms.

Async-Worker tests with Intel C++ compiler 11.1.072 with -O3 -xHOST -ipo
-fbuiltin, and ZeroMQ compiled with the same flags:

     ~2510ms for serial RunAndReturn, ~2510ms for serial RunAndReturnLocal (30% gain over GCC)
     ~640ms for parallel RunAndReturn, ~650ms for parallel RunAndReturnLocal (32% gain over GCC)

Given the trivial workloads, these are fairly impressive benchmarks.

The Intel C++ compiler is dual-licensed; you can download the Linux
version for free from:

http://software.intel.com/en-us/intel-compilers/

Compared to the Microsoft Visual C++ 2008 compiler, we found performance
improvements of 15-50%. The Visual C++ 2010 compiler is significantly
improved, but Intel's compiler still produces 10-30% improvements.

You may be aware that there was some controversy over the Intel compiler
generating code that didn't perform as well on AMD chips. This only
occurred when you built "alternate code paths" for SSE instructions etc.,
and the 9.x versions of the compiler tended not to use the alternate
code paths unless you had an Intel CPU.

That option is now called "Build Intel specific optimizations", and the
alternate code paths are now applied fairly to any CPU that claims to
support the feature set you are targeting.

- Oliver
