[zeromq-dev] 0MQ/2.0-alpha3 available
Martin Sustrik
sustrik at fastmq.com
Wed Sep 30 17:05:31 CEST 2009
Hi Aamir,
Thanks for the analysis. You've raised a couple of interesting points.
See my comments inlined.
>> 1. Do we want to route the replies back to the sender or do we just
>> want to propagate them further on? In the former case some kind of
>> route-back mechanism has to be involved.
>>
>
> I would say that we do want the ability to route back replies ... I
> imagine that many people are forced to "reinvent" this wheel right
> now.
For starters I would ignore the replies. Let's not scratch what does not
itch. The streamlining is complex enough by itself anyway.
> In practice, an application is probably not going to be as simple as
> sending replies back to the sender. In a typical grid application,
> there is a cluster of "worker nodes" that are "always on" and are
> always listening for requests. On the other hand, the client program
> (which sends requests to the worker nodes) is not "always on" ... it
> can connect, disconnect, and reconnect at different times. So we have
> a problem with the application start-up sequence ... the workers
> must know where the client is, but the client may not be online when
> the worker nodes are being started up. So I think an intermediate
> "forwarder" is always required. The worker nodes connect to the
> forwarder. The worker nodes receive requests from the forwarder and
> send replies to the forwarder. The forwarder must collect requests
> from a client, load-balance those requests, and send replies back to
> the client.
Right. That's the idea with 0MQ "devices". It seems there's a kind of
symmetry here. Each scenario requires a specific device. Pub/sub
requires a "forwarder" that would get the data from a remote network and
redistribute it on the local network. Simple req/rep requires a
"shared queue" that stores and load-balances the requests and routes
replies back to the requester. Streamlining requires the device you call
"forwarder" - it's actually a demux/mux device - we should find some
suitable name for it.
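A minimal sketch of the "shared queue" idea, as an in-memory model rather than real 0MQ code: requests are load-balanced round-robin and each reply is routed back to whoever sent the request. The class name, method names and the integer request id are all hypothetical, chosen only for illustration; nothing here is an existing 0MQ API, and the sketch does not store requests while workers are busy.

```python
from itertools import cycle


class SharedQueue:
    """Toy model of a req/rep device (hypothetical, not a 0MQ API)."""

    def __init__(self, workers):
        self._workers = cycle(workers)   # round-robin load balancing
        self._origin = {}                # request id -> client that sent it
        self._next_id = 0

    def dispatch(self, client, request):
        """Pick a worker for the request and remember who asked."""
        req_id = self._next_id
        self._next_id += 1
        self._origin[req_id] = client
        return next(self._workers), req_id, request

    def route_back(self, req_id, reply):
        """Return (client, reply) for the client that issued req_id."""
        return self._origin.pop(req_id), reply


q = SharedQueue(["worker1", "worker2"])
w_a, id_a, _ = q.dispatch("clientA", "REQ1")
w_b, id_b, _ = q.dispatch("clientB", "REQ2")
back = q.route_back(id_b, "REP2")   # goes to clientB, not clientA
```

The route-back table is the part a plain forwarder does not need, which is one way to see why each scenario ends up with its own device.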
> The forwarder device must be able to deal with the situation where a
> client aborts in the middle of a computation ... typically some kind
> of clean-up is required, so that the client can cleanly reconnect and
> resume where it left off. This adds some complexity to the forwarder
> device. If the client aborts a calculation and then reconnects, then
> the forwarder has to have some way of preventing "stale" messages from
> being sent to the client and the worker nodes ... here a "stale"
> message is defined as any request or reply that was sent before the
> calculation was aborted. Furthermore, if we allow for more than one
> client operating at any given time then the forwarder becomes a more
> complex device. Now the forwarder must sort out which replies go to
> which client, while keeping in mind that clients can appear and
> disappear from anywhere at any time.
I'll draw a picture of such a system so that we can agree on a
terminology. At the moment I'm a bit confused (client/worker etc.)
> But I think that if we had a way to asynchronously route-back replies
> then that would be sufficient to implement a forwarder device by
> simply combining two "async requesters" (one in the forwarder device and
> one in the client program).
Yup. I would say so.
>> 2. Do we want to reorder replies so that sequence of requests is
>> matched? If so, message numbering should be added.
>>
>
> Ideally, the replies would be reordered. In fact, in my own code I
> currently have to add message numbers for exactly this reason.
Reordering seems to be a good thing. One of the benefits is that it fixes
the ordering issue caused by failed worker components:
1. Send REQ1 to worker 1
2. Send REQ2 to worker 2
3. Send REQ3 to worker 1
4. worker 2 fails
5. REQ2 is sent to worker 1 (failover)
6. Sequence of responses generated by worker 1 is: REP1, REP3, REP2 :(
On the other hand, reordering causes latency. When message X is missing
all the subsequent messages cannot move downstream till X arrives.
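The trade-off above can be sketched with a small reorder buffer. This is an illustration only, assuming each reply carries the sequence number of its request; the class and method names are hypothetical and not part of 0MQ.

```python
class ReorderBuffer:
    """Releases replies in request order, holding back early arrivals."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq   # sequence number we are waiting for
        self.pending = {}           # early arrivals, keyed by sequence number

    def push(self, seq, reply):
        """Accept one reply; return the replies now deliverable in order."""
        self.pending[seq] = reply
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


# Replay of the failover scenario: worker 1 produces REP1, REP3, REP2.
buf = ReorderBuffer(first_seq=1)
out1 = buf.push(1, "REP1")   # delivered immediately
out3 = buf.push(3, "REP3")   # held back: REP2 is still missing
out2 = buf.push(2, "REP2")   # releases REP2 and the waiting REP3
```

The empty result for REP3 is the latency cost: nothing moves downstream until REP2 shows up.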
> How would the 0MQ library add message numbering? I guess message
> headers have to be used?
Yes. 4 initial bytes of the message with RFC1982 wrap-over or something
like that.
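A rough sketch of what such a header could look like, assuming a 32-bit unsigned sequence number in network byte order (the byte order and the helper names are assumptions, not anything decided here). The comparison follows RFC 1982 serial number arithmetic, so ordering survives the wrap from 2^32 - 1 back to 0.

```python
import struct

SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS
HALF = 1 << (SERIAL_BITS - 1)


def serial_add(s, n=1):
    """Advance a sequence number, wrapping around at 2**32."""
    return (s + n) % MOD


def serial_lt(a, b):
    """True if a precedes b under RFC 1982 comparison.

    When a and b are exactly 2**31 apart the RFC leaves the result
    undefined; this sketch returns False in both directions.
    """
    return a != b and ((b - a) % MOD) < HALF


def frame(seq, payload):
    """Prepend the 4-byte sequence number to the message body."""
    return struct.pack("!I", seq) + payload


def unframe(message):
    """Split a framed message into (sequence number, body)."""
    (seq,) = struct.unpack_from("!I", message)
    return seq, message[4:]


last = 0xFFFFFFFF
wrapped = serial_add(last)            # wraps around to 0
seq, body = unframe(frame(wrapped, b"REQ"))
```

With plain integer comparison, 0 would sort before 0xFFFFFFFF; with `serial_lt` the wrapped number still counts as the later one.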
> Besides the sequence of messages, there is also the issue of
> the number of messages. Simply put, if we send 100 requests and
> receive back 101 replies then we have a serious problem. I don't know
> whether the library should check this or the user code.
Hm. Not sure about this. Worker component would probably require two
distinct sockets (in & out) and there's no way to associate the two as
of now.
>> My intuition is that we probably have two distinct use
>> cases here, one for streamlining messages forth, the other one for
>> routing the replies back to the requester ("async requester").
>>
> I tend to agree, although it's possible that someone using the
> pipelining model would also have a need for eventually having
> replies routed back to the requester once the request has been
> processed through the pipeline. I would suggest that the two use
> cases can be treated separately and then later made compatible with
> each other if we want a fully generalized "butterfly" solution.
Right.
Martin