[zeromq-dev] RPC design pattern

Martin Sustrik sustrik at 250bpm.com
Sun Apr 25 15:50:38 CEST 2010


It's great you've started to research the problem! It's a complex one 
and will take a while to get right but I think the direction is 
generally right.

See my comments inline.

>> If we knew the weights in advance (as specified with RFC 2782) we can in
>> theory write a load-balancing that would hit the queue limits less
>> often. However:
>> 1. Do we even care?
>> 2. User has to manually set the weights = work = increased cost.
>> 3. There's no guarantee that user will set the weights so that they
>> reflect the reality.
> I agree with your points and the authors of that RFC even noted that
> setting those parameters requires a bit of a gentlemen's contract.
> I'm asking here mainly so if I publish my investigation in this space
> I can articulate what is or isn't possible.  Here I would just be able
> to say RFC 2782 is potentially a nice solution for doing service
> discovery but you'd have to ignore load balancing hints in the DNS
> records to get the most out of 0MQ.

It's definitely possible. Of course, it needs some implementation work 
that may not be worth doing. Thus, if I were you, I would say that 
honouring the weights is possible (actually, it's pretty trivial); 
however, the assumption of a competent and gentlemanly user may be 
wishful thinking rather than reality. Using TCP pushback to stop sending 
messages to services that are busy would solve the problem without 
relying on the user's skills or conduct.
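
To make the two options concrete, here is a rough sketch in Python. It is
not 0MQ code and not the full RFC 2782 selection algorithm (priorities are
left out). The peer fields "weight", "queued" and "limit" and the function
name pick_peer are made up for illustration. Weights are honoured, but a
peer whose queue is full is skipped regardless of its weight, which is the
pushback part.

```python
import random


def pick_peer(peers, rng=random):
    """Pick a peer by weight, skipping peers that are pushing back.

    peers: list of dicts with hypothetical keys
           'name', 'weight', 'queued' and 'limit'.
    Returns the chosen peer dict, or None if every peer is busy.
    """
    # Pushback: a peer whose queue has reached its limit is not eligible,
    # no matter what weight the user configured for it.
    ready = [p for p in peers if p["queued"] < p["limit"]]
    if not ready:
        return None  # everyone is busy; the caller has to wait

    total = sum(p["weight"] for p in ready)
    if total == 0:
        # No usable weights at all: fall back to a uniform choice.
        return rng.choice(ready)

    # Weighted choice: walk the running sum until it covers the random point.
    point = rng.uniform(0, total)
    running = 0
    for p in ready:
        running += p["weight"]
        if point <= running:
            return p
    return ready[-1]  # guard against floating-point rounding
```

Note that a badly chosen weight only skews the distribution here; it cannot
cause messages to pile up on a busy peer, because the queue check comes
first.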

>>> The part that confuses me about this is in a traditional RPC scenario
>>> you may be calling services that have side effects or that lack
>>> idempotence, per se.  You'd want the message to be handled by the
>>> first reachable service provider and no others.  Unless I
>>> misunderstand what you're saying, it seems counter-intuitive that
>>> RPC-style messages would queue up rather than failover to alternate
>>> instances or fail fast when there are no reachable instances?
>> The question is: How do you know the instance is reachable. You can
>> never tell before you actually get a reply from it. To get a reply you
>> have to send a request. That means that at least 1 message (the request)
>> is "queued" for delivery to the service with unknown availability.
>> There's no way to avoid that.
> I suppose you have the same problem if you are trying to implement
> RPC-style transactions on top of any queuing technology?  I think I
> still get confused because the API appears to be socket-oriented as if
> you are making a direct socket connection, but then there's really a
> queuing layer in the middle.

The problem exists in any RPC solution, whether there's an explicit queue or 
not.

The issue is that you never know whether a service is available until it 
responds to your request. Thus there's always at least one request "in 
the air". The same would happen with a simple TCP connection.

 > As you noted earlier, a timeout would be
 > a good solution as long as there's no ambiguity as to whether or not
 > the message was processed.

Actually, there's no way to detect that the peer is unavailable other 
than timing out. Berkeley sockets try to hide that fact from the user, 
but that's really the case.

As for knowing whether a message was processed, you can never be 100% 
sure. The highest reliability guarantee (a.k.a. guaranteed delivery) 
does provide something similar, but it's not completely bullet-proof. It 
can say either that the message was processed or that there was a 
problem. If there was a problem, it means we don't know whether the 
message was processed or not.

You can't get a system more reliable than that.
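
A small Python sketch of what this looks like from the caller's side. It
uses plain Berkeley sockets rather than 0MQ, and the function name call and
the outcome labels "processed" and "unknown" are my own. The point is that
there are only two outcomes: a reply arrived, or we timed out (or hit an
error) and cannot tell whether the request was handled.

```python
import socket


def call(sock, request, timeout_s):
    """Send one request and wait for one reply.

    Returns ("processed", reply_bytes) if a reply arrived in time,
    or ("unknown", None) on timeout, error, or a closed connection.
    """
    sock.settimeout(timeout_s)
    try:
        sock.sendall(request)      # the request is now "in the air"
        reply = sock.recv(4096)    # blocks for at most timeout_s seconds
    except OSError:                # includes socket timeouts
        return ("unknown", None)
    if not reply:
        # The peer closed the connection without answering.
        return ("unknown", None)
    return ("processed", reply)


if __name__ == "__main__":
    # A connected pair of sockets; the "server" end never answers.
    client, server = socket.socketpair()
    print(call(client, b"ping", 0.1))
    client.close()
    server.close()
```

In the "unknown" case the request may or may not have been executed, so
retrying is only safe if the operation is idempotent.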

>> Please, keep the list posted about your experience with 0MQ and RFC
>> 2782. The matter is of critical importance for further development of
>> 0MQ and the more experience we as a community can gather about
>> addressing and naming issues, the better.
> Specifically as it relates to RFC 2782, there are a couple items off
> the top of my head that could be addressed (just brainstorming):
> 1) A translation scheme to/from RFC 2782 style names (e.g.
> _zeromq._tcp.example.com 5555) into 0MQ style address strings  (e.g.
> tcp://example.com:5555)
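
One possible shape for such a translation, as a Python sketch. The function
names srv_to_zmq and zmq_to_srv are hypothetical, and the "zeromq" service
label is simply taken from the example above, not a registered service
name. No DNS lookup is done here; the SRV owner name, target and port are
assumed to have been resolved already.

```python
def srv_to_zmq(owner_name, target, port):
    """Turn SRV record fields into a 0MQ-style address string.

    owner_name: e.g. "_zeromq._tcp.example.com" (_service._proto.domain)
    target:     host name from the SRV record, e.g. "example.com."
    port:       port number from the SRV record
    """
    labels = owner_name.rstrip(".").split(".")
    if (len(labels) < 3
            or not labels[0].startswith("_")
            or not labels[1].startswith("_")):
        raise ValueError("not an RFC 2782 style name: %r" % owner_name)
    proto = labels[1][1:]        # "_tcp" -> "tcp"
    host = target.rstrip(".")    # drop the trailing root dot, if any
    return "%s://%s:%d" % (proto, host, port)


def zmq_to_srv(address, service="zeromq"):
    """Turn "tcp://example.com:5555" into (SRV owner name, port)."""
    proto, rest = address.split("://", 1)
    host, port = rest.rsplit(":", 1)
    return ("_%s._%s.%s" % (service, proto, host), int(port))
```

With the values from the example, srv_to_zmq("_zeromq._tcp.example.com",
"example.com.", 5555) gives "tcp://example.com:5555". Priority and weight
fields are ignored by this sketch.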


> 2) Pros and cons of static registration (RFC 2782) vs. dynamic
> registration (DNS-SD).

I would say static registration should be strongly preferred. The 
rationale is that you want to know the location of an entity even though 
it may not be running/online at the moment. In such a case, messages can 
be queued and sent once it becomes available.

> 3) Can a client detect that a service is local and switch over to
> inproc/ipc transport for optimization (or does the 0MQ kernel already
> attempt this?)

That's an interesting question. No, 0MQ does not do that at the moment. 
But it would be nice if it could. How should it be done? Once again, 
more research is needed.

> Others?
> As you can probably tell from the above discussion, I'm coming at this
> mainly from the perspective of RPC over TCP style messaging since
> that's what I'm most familiar with.  I haven't really given much
> thought to other messaging patterns or broadcast protocols.

Yup. However, even RPC encounters problems similar to those of other 
messaging patterns. They are just not as visible with RPC as with, say, 
data distribution.

