[zeromq-dev] RPC design pattern

Joe Holloway jholloway7 at gmail.com
Sun Apr 18 19:50:19 CEST 2010

On Sun, Apr 18, 2010 at 5:54 AM, Martin Sustrik <sustrik at 250bpm.com> wrote:
> Brian Candler wrote:
>> On Sun, Apr 18, 2010 at 02:28:04AM -0500, Joe Holloway wrote:
>>> I guess this is an age old problem of fault tolerant service
>>> discovery.   The service registry can't know whether or not a service
>>> is reachable until a client has tried to connect.
>> So maybe the solution is for the socket to have a list of endpoints to try,
>> instead of a single one?  It could just walk around the list until one
>> connects.  If you randomize this list first, then you get load-balancing
>> too.
>> The API docs don't make it clear what happens if you call zmq_connect
>> multiple times on the same socket.
> Yes, you're pretty much right.
> You can connect your client to multiple instances of a service:
> zmq_connect (c, "tcp://svr001.example.com:5555");
> zmq_connect (c, "tcp://svr002.example.com:5555");
> zmq_connect (c, "tcp://svr003.example.com:5555");
> What happens is that the requests are load balanced among the servers.

OK, that part is starting to make more sense now.  I'm playing around
with putting my service definitions in DNS SRV records (actually
DNS-SD, but same concept).  The RFC [1] spells out a load-balancing
algorithm for client programs based on the weight and priority of each
record.  It's not round robin, but similar.  The idea is that the
person registering the service knows the attributes of the hardware
it's running on and can hint to the client agent that node X has
more capacity than node Y, although they are both offering the same
service.  Just thinking out loud, but is there a way to provide those
hints to 0MQ, or should we just trust that 0MQ is better at dynamic
load balancing than we could hope to be using deployment-time hints?

> When one of them fails, up to ZMQ_HWM requests are queued for it, and
> once the limit (HWM) is reached no more requests will be sent to that
> server. Once it gets back online, the queued requests will be sent to
> it and load balancing starts dispatching new requests to it
> automatically.

The part that confuses me about this is that in a traditional RPC
scenario you may be calling services that have side effects or that are
not idempotent.  You'd want the message to be handled by the first
reachable service provider and no others.  Unless I misunderstand what
you're saying, it seems counter-intuitive that RPC-style messages would
queue up rather than fail over to alternate instances, or fail fast
when no instances are reachable?

I probably need to go back and read the whitepapers again now that I'm
becoming more familiar with the concepts.


[1] http://tools.ietf.org/html/rfc2782
