[zeromq-dev] Load distribution vs fail over

Martin Sustrik sustrik at 250bpm.com
Fri Mar 5 09:38:06 CET 2010


gonzalo diethelm wrote:
> The butterfly tutorial is a good example of using 0MQ's capabilities for
> load balancing using UPSTREAM / DOWNSTREAM sockets: you have one process
> S that binds on a socket and N identical processes Ri that connect to
> that socket; then the S process starts sending messages to the Ri
> processes, distributing them in a round-robin fashion among all Ri. You
> might call this "active-active load distribution" (because all the Ri
> processes are actively processing messages simultaneously).
Hi Gonzalo,

> I am wondering how one would build a variation of this design, where the
> S process sends ALL messages to a single Ri (R0, let's say). All the
> other Ri processes are there just in case R0 fails; when S detects that
> R0 is not receiving any messages, it switches to sending them to R1; and
> so on. You might call this "active-passive load distribution" (because
> only one Ri process is actively processing messages, while all the other
> processes are passively waiting to become the active one). One reason it
> might be advisable to process all messages with just one process is
> concurrency problems; if all Ri processes access the same database,
> running them simultaneously might lead to contention, locking and even
> deadlocking.
> 
> How would you approach this second scenario? One way I thought of was
> using a PUB socket in S and SUB sockets in each Ri, so that all of them
> receive all messages. The Ri processes also send a heart-beat message
> between themselves, and have a way of deciding which one is the current
> active process (such as voting). However, this is complex and forces
> each Ri to know about all other instances.
> 
> Would it be possible to have a socket option (or type) that tells 0MQ
> not to round-robin among all Ri processes, but stick to one process
> only, and make the switch to the next one on a failure condition? Of
> course, you would have to define a policy for declaring such a failure.
> 
> The failure condition also made me wonder: is there any way to control
> the queueing policy in 0MQ? Can I know how many messages are waiting in
> a queue to be delivered? Or how much "space" the queue has left? Can I
> control these values somehow?

Yes, you are right. The feature you are describing is absolutely needed. 
It's on the roadmap, but as always, the sheer amount of stuff on the 
roadmap... Anyway, here are my comments:

Failover as described in your scenario (though needed in other scenarios as 
well - just think of multihoming) should be available for both 
UPSTREAM/DOWNSTREAM and REQ/REP sockets.

It should kick in if the primary service is disconnected, dead or busy.

Once the primary service is ready to process messages the system should 
fail back automatically.

The user should be able to combine failover with load balancing, e.g. 
load balance among instances R1..R10; when none of them is available, fail 
over to R11.

There should be multiple failover levels: If R1 is not available fail 
over to R2. If even R2 is not available fail over to R3.

The current idea is to introduce ZMQ_PRIORITY socket option:

socket_t s (ctx, ZMQ_DOWNSTREAM);
s.setsockopt (ZMQ_PRIORITY, 1);
s.bind ("tcp://eth0:5555");
s.setsockopt (ZMQ_PRIORITY, 2);
s.bind ("tcp://eth0:5556");

This way, load balancing would be done among all the peers connected to 
port 5555. If none of them is available, the system would fail over to the 
peers connected to port 5556 and load balance among them.

One of the important applications of such a mechanism is using local 
instances (say, those located in the same datacenter) when possible and 
failing over to remote instances (say, those located in a distant 
datacenter connected by a slow satellite link) when no local services 
are available.

As for the condition for fail-over, queue limits (ZMQ_HWM) should be 
used. These haven't yet been ported from 0MQ/1.0, but as I said, the code 
is ready and will be committed shortly.

Martin
