[zeromq-dev] Parallel sort
Oliver Smith
oliver at kfs.org
Tue Aug 3 01:59:43 CEST 2010
On 8/2/2010 5:17 PM, Pieter Hintjens wrote:
>> Not infinite :) I think the issue is one of collection; step X creates N
>> units of work that require processing and then resumption...
>>
> You can plug pipelines into themselves.
>
> I'm wondering where you store the data while it's being sorted. I'm
> also wondering why you need to parallelize this. Is the comparison
> particularly expensive?
>
> Before discussing solutions, could you explain more about the use case?
>
It's in the Subject line :)
I've been thinking about this a lot today (how you might sub-divide
parallel work into sub-tasks and then perform a subsequent
reunification). Fundamentally, it is the same principle as packet
fragmentation and reassembly.
In this case, it is a message (containing a unit of work) being divided
into two or more sub-units, each of which can potentially be executed by
a separate thread before an "ack" is finally sent to the joiner process.
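To make that concrete, here is one way the threads could be wired up (a
rough pyzmq sketch; the inproc endpoint names and the PUSH/PULL choices
are just illustrative, nothing here is settled):

    import zmq

    ctx = zmq.Context.instance()

    # Hypothetical endpoint names; nothing in the scheme mandates these.
    INIT_WORK  = "inproc://init-work"      # initiator -> workers
    COLL_WORK  = "inproc://collator-work"  # collator  -> workers
    COLLATE_IN = "inproc://collate"        # workers   -> collator (sub-units, ACKs)
    RESULT     = "inproc://result"         # collator  -> initiator (final answer)

    # Note: with inproc:// the bind() side has to come up before connect(),
    # so start the initiator first, then the collator, then the workers.

    def initiator_sockets():
        work_out = ctx.socket(zmq.PUSH)    # fans the first workload out
        work_out.bind(INIT_WORK)
        result_in = ctx.socket(zmq.PULL)   # where the final result lands
        result_in.bind(RESULT)
        return work_out, result_in

    def collator_sockets():
        from_workers = ctx.socket(zmq.PULL)  # sub-units and ACKs, fair-queued
        from_workers.bind(COLLATE_IN)
        to_workers = ctx.socket(zmq.PUSH)    # re-issued sub-units, completions
        to_workers.bind(COLL_WORK)
        to_initiator = ctx.socket(zmq.PUSH)
        to_initiator.connect(RESULT)
        return from_workers, to_workers, to_initiator

    def worker_sockets():
        work_in = ctx.socket(zmq.PULL)       # one PULL, fed by both producers
        work_in.connect(INIT_WORK)
        work_in.connect(COLL_WORK)
        to_collator = ctx.socket(zmq.PUSH)
        to_collator.connect(COLLATE_IN)
        return work_in, to_collator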
Here's my theoretical model:
INITIATOR:
The initiator dispatches the meta-workload to the worker pool with a
static ID/token that is recognized as meaning "first workload, singular".
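In pyzmq terms, roughly (reusing the initiator sockets from the wiring
sketch above; the literal "ROOT:0" marker and the two-frame layout are
just placeholders):

    ROOT_TOKEN = b"ROOT:0"   # the static "first workload, singular" marker

    def dispatch(work_out, payload):
        # One multi-part message: [token][payload to be sorted]
        work_out.send_multipart([ROOT_TOKEN, payload])

    def wait_for_result(result_in):
        token, result = result_in.recv_multipart()
        assert token == ROOT_TOKEN
        return result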
WORKERS:
The first worker receives this load, performs any pre-processing, and
then divides it into discrete sub-units.
It forwards the sub-units via multi-part messages to a third thread (the
COLLATOR). The last part of the multi-part message is mostly a copy of
the source message that triggered this step (for callback purposes). It
can't be an exact copy, otherwise you'd just create a loop :) Perhaps the
"token" contains a "stage" field so that this could also serve as a
message-passing FSM.
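A rough sketch of that worker step (the JSON payloads and the trailing
parent descriptor with a "stage" field are just one guess at the wire
format):

    import json

    def split_and_forward(parent_token, payload, to_collator):
        # Divide the workload and hand the pieces, plus a "callback" copy
        # of the parent, to the collator in one multi-part message:
        #   [sub-unit 1] ... [sub-unit N] [parent descriptor]
        # The descriptor carries the parent token and a bumped "stage" so
        # the eventual completion isn't mistaken for fresh first-stage work.
        data = json.loads(payload)             # e.g. the list of keys to sort
        mid = len(data) // 2
        halves = [data[:mid], data[mid:]]

        parent = json.dumps({"token": parent_token.decode(),
                             "stage": "split"}).encode()
        frames = [json.dumps(h).encode() for h in halves] + [parent]
        to_collator.send_multipart(frames)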
COLLATOR:
The collection thread rejoins the multi-part message and assigns each
piece of work a token (akin to a segment id). It uses the last part of
the message as a "completion" reference for where to send the completed
results, and to obtain "depth" information on the recursion.
It then forwards the sub-units back to the worker pool.
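Continuing the sketch, the collator side might look something like this
(the token scheme, a parent token plus a fresh segment id so that
recursion depth falls out of the number of separators, is just
illustrative):

    import itertools, json

    pending = {}             # parent token -> set of outstanding child tokens
    completion_of = {}       # parent token -> serialized parent descriptor
    seq = itertools.count()  # source of fresh segment ids

    def collate_subunits(frames, to_workers):
        # [sub-unit ...][parent descriptor] in from a worker: tag each
        # sub-unit with its own token, remember what the parent is waiting
        # on, and push the sub-units back out to the pool.
        *subunits, parent_desc = frames
        parent_token = json.loads(parent_desc)["token"]

        children = set()
        for payload in subunits:
            child_token = "%s.%d" % (parent_token, next(seq))
            children.add(child_token)
            to_workers.send_multipart([child_token.encode(), payload])

        pending[parent_token] = children
        completion_of[parent_token] = parent_desc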
WORKERS:
The workers repeat the above process until they actually complete a
piece of work. When this occurs, they send what amounts to an "ACK" for
the source token to the COLLATOR.
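For a sort, the terminal case could be as simple as this (the
["ACK", token, result] frame layout is again just a placeholder, with
the sorted run riding along with the ACK):

    import json

    def finish_leaf(token, payload, to_collator):
        # The unit is small enough to finish in place: do the work,
        # then ACK its token (with the result attached) to the collator.
        result = json.dumps(sorted(json.loads(payload))).encode()
        to_collator.send_multipart([b"ACK", token, result])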
COLLATOR:
On receiving an ACK for a token, it removes that token from the
in-flight queue. If this results in no ACKs pending for a given piece of
work, the "completion" message is forwarded to the worker pool (again,
this contains the token from which the sub-units were created).
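Continuing the collator sketch (pending and completion_of are the dicts
from the sub-unit handler above; stashing the child results alongside
the bookkeeping is one possible answer to the "where is the data stored"
question, and the ROOT special case is the one described a couple of
paragraphs further down):

    results = {}   # parent token -> child results received so far

    def handle_ack(frames, to_workers, to_initiator):
        # Retire an ACKed token. When a parent has no children left
        # outstanding, forward its completion message (parent descriptor
        # plus the children's results) back to the worker pool. A ROOT
        # ACK means the whole job is done and goes to the originator.
        _tag, token_b, result = frames
        token = token_b.decode()

        if token == "ROOT:0":                   # the hypothetical root marker
            to_initiator.send_multipart([token_b, result])
            return

        parent_token = token.rsplit(".", 1)[0]  # "ROOT:0.3.7" -> "ROOT:0.3"
        pending[parent_token].discard(token)
        results.setdefault(parent_token, []).append(result)

        if not pending[parent_token]:           # every child has ACKed
            del pending[parent_token]
            parent_desc = completion_of.pop(parent_token)
            child_results = results.pop(parent_token)
            to_workers.send_multipart([b"COMPLETE", parent_desc] + child_results)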
WORKERS:
A worker receives the completion message for a group of sub-units and
performs the necessary post-processing. It then ACKs the token in the
completion message (i.e. the work unit a level above).
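For the sort, that post-processing is just the merge of the children's
sorted runs, e.g. (frame layout matching the sketches above):

    import heapq, json

    def handle_completion(frames, to_collator):
        # Post-process a finished group of sub-units (for a sort, merge
        # the children's sorted runs), then ACK the parent's own token,
        # i.e. the work unit one level up, back to the collator.
        _tag, parent_desc, *child_results = frames
        parent = json.loads(parent_desc)

        runs = [json.loads(r) for r in child_results]
        merged = list(heapq.merge(*runs))       # the "reunification" proper

        to_collator.send_multipart([b"ACK",
                                    parent["token"].encode(),
                                    json.dumps(merged).encode()])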
COLLATOR:
When an ACK is received for the parent work, it is forwarded to the
originating thread, which is presumably sitting in a recv().
This /could/ have some embellishments, such as the 'stage' component of
the token descriptor.
Also, the COLLATOR could set limits on the number of tokens in flight
(to increase efficiency and cache hotness, etc.). Hence the need to track
depth; if you set a limit of, say, 4 tokens, then you would become
"stuck" at the 3rd level of recursion due to exhausting the number of
tokens available.
Again this is akin to a networking concept: bandwidth. You would
presumably want to limit the number of tokens in-flight at any given
depth to, say, twice the number of worker threads available (to ensure
that there is always work ready when a thread calls recv()).
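One way the collator could enforce that, per depth (the cap, the backlog
deques, and deriving depth from the token separators are all just
illustrative):

    import collections

    N_WORKERS = 4
    MAX_IN_FLIGHT = 2 * N_WORKERS   # the "twice the worker count" heuristic

    in_flight = collections.Counter()                     # depth -> tokens out
    backlog = collections.defaultdict(collections.deque)  # depth -> parked work

    def depth_of(token):
        return token.count(".")    # with tokens like "ROOT:0.3.7"

    def issue(token, payload, to_workers):
        # Release a sub-unit only while its depth is under the cap;
        # otherwise park it until an ACK at that depth frees a slot.
        d = depth_of(token)
        if in_flight[d] < MAX_IN_FLIGHT:
            in_flight[d] += 1
            to_workers.send_multipart([token.encode(), payload])
        else:
            backlog[d].append((token, payload))

    def on_retired(token, to_workers):
        # Called when a token is ACKed: free its slot, drain the backlog.
        d = depth_of(token)
        in_flight[d] -= 1
        if backlog[d]:
            issue(*backlog[d].popleft(), to_workers=to_workers)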
I think this should probably be controlled at the COLLATOR level,
though. Consider: worker #1 determines that it has 64 sub-units to
dispatch. If it blocks on a send(), then one of your workers is out of
action. Alternatively, if worker #1 can fire all 64 sub-units off into
the COLLATOR's input, it frees itself to return to the worker pool.
I guess the COLLATOR would have to use poll() so that it can receive
ACKs independently of work issues. That would interfere with the depth
handling I mentioned unless it used distinct sockets per depth level...
which sounds excessive, but you are already using the code/memory
involved, so it has a huge head start over any additional second-tier
queueing system inside the collator itself...
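i.e. something along these lines, assuming the worker-to-collator
traffic is split over two sockets, one for fresh sub-units and one for
ACKs, with handle_ack and collate_subunits being the handlers sketched
earlier (the per-depth-socket variant would just register more sockets
with the same poller):

    import zmq

    def collator_loop(subunits_in, acks_in, to_workers, to_initiator):
        # Poll both inputs so ACKs keep draining even while sub-unit
        # submissions are arriving, and vice versa.
        poller = zmq.Poller()
        poller.register(subunits_in, zmq.POLLIN)
        poller.register(acks_in, zmq.POLLIN)

        while True:
            events = dict(poller.poll())
            if acks_in in events:          # retire finished work first
                handle_ack(acks_in.recv_multipart(), to_workers, to_initiator)
            if subunits_in in events:
                collate_subunits(subunits_in.recv_multipart(), to_workers)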
- Oliver