[zeromq-dev] Multicore Magic

Brian Granger ellisonbg at gmail.com
Tue Apr 27 19:39:52 CEST 2010

>> Seems so unfair that processes are penalised for not being threads... :-)
> 1. That's the whole point of processes. They are isolated, and can't stomp
> on each other's memory space, intentionally or accidentally.

Yes, definitely.  The only problem with Python is that we can't use
threads for parallelism even if we want to, because of the GIL.

> 2. There's a big difference between latency and throughput. Many apps aren't
> particularly sensitive to the former.
> 3. For those apps which do care about a few extra microseconds latency, I
> suggest you're not going to want to write them in Python anyway :-)

Not quite.  Many "Python" applications in the HPC context are written
largely in C/C++/Fortran and wrapped into Python.  Thus, they can be
*extremely* fast and microseconds can be a large timescale.

On the other hand, I do agree with you that in many cases the latency
won't be a problem.

> Standard advice: build system first, measure performance, then if required
> identify hotspots and tune.  In many cases you'll discover the hotspots are
> not where you first thought they might be.

Unfortunately, for me this advice doesn't work.  Rather than trying to
solve a specific problem (where your logic would apply), I am involved
in the design and building of general approaches to parallelism in
Python.  Thus the "problem" I am working on is the set of all problems
that users will want to parallelize.  In this large set, there are
*many* latency-sensitive cases.  The way I look at it is that the
latency sets the minimum granularity you can have and still get good
parallel speedup.  Many algorithms require fine granularity and are
thus sensitive to latency.
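
To make the granularity point concrete, here is a rough sketch of a
very simple cost model.  The model is an assumption for illustration,
not a measurement: every task does task_seconds of useful work and
pays a fixed latency_seconds of messaging overhead, and the work
divides evenly across the workers.  The function names are
hypothetical.

```python
def parallel_efficiency(task_seconds, latency_seconds):
    """Fraction of the ideal speedup kept when every task pays a fixed latency."""
    return task_seconds / (task_seconds + latency_seconds)


def speedup(task_seconds, latency_seconds, workers):
    """Speedup over a serial run under the simple model described above."""
    return workers * parallel_efficiency(task_seconds, latency_seconds)


def min_granularity(latency_seconds, target_efficiency):
    """Smallest task duration that still reaches target_efficiency.

    Solves e = g / (g + L) for g, which gives g = L * e / (1 - e).
    """
    return latency_seconds * target_efficiency / (1.0 - target_efficiency)
```

Under this model a task that runs exactly as long as the latency keeps
only half of the ideal speedup, and reaching 90% efficiency needs
tasks about nine times longer than the latency.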

BUT, the other part is scalability and this is where I am still
hopeful.  Maybe we will be stuck with horrible latency in Python
because of this issue.  But if that latency scales well (is constant)
with the total number of processes, we will be OK.  Traditional
threading with shared memory and locks, etc. is super fast... until
you have hundreds or thousands of threads all chomping on the locks.
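
A small sketch of why constant latency matters, using the same kind of
toy model: speedup = workers * g / (g + latency(workers)).  Both
latency functions below are hypothetical.  In particular, the linear
growth under lock contention is an assumption chosen only to show the
shape of the curve, not a measured property of any threading library.

```python
def speedup(workers, task_seconds, latency_fn):
    """Speedup when each task pays latency_fn(workers) seconds of overhead."""
    latency = latency_fn(workers)
    return workers * task_seconds / (task_seconds + latency)


def constant_latency(base_seconds):
    """Latency that does not depend on the number of workers."""
    return lambda workers: base_seconds


def contended_latency(base_seconds):
    """Assumed model: latency grows linearly as more workers fight over a lock."""
    return lambda workers: base_seconds * workers


if __name__ == "__main__":
    task = 1e-3  # 1 ms of useful work per task (made-up value)
    for n in (10, 100, 1000):
        slow_but_flat = speedup(n, task, constant_latency(1e-3))
        fast_but_contended = speedup(n, task, contended_latency(1e-5))
        print(n, round(slow_but_flat, 1), round(fast_but_contended, 1))
```

With constant latency the speedup keeps growing in proportion to the
number of workers, even if the constant is large.  With the contended
model the speedup can never exceed task_seconds / base_seconds, no
matter how many workers are added.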

Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu
ellisonbg at gmail.com
