[zeromq-dev] Load balancing and fault tolerance

Martin Sustrik sustrik at 250bpm.com
Tue Jul 13 15:06:59 CEST 2010


Brian,

> 1.  The load balancing algorithm (round robin) of the XREQ socket is
> not efficient for certain types of loads.  I think the recent
> discussions on other load balancing algorithms could help in this
> regard.

Yes. I think the roadmap is quite clear here.

> 2.  Fault tolerance.  If a worker goes down, I need to know about it
> and be able to requeue any tasks that went down with the worker.  To
> monitor the health of workers, we have implemented a heartbeat
> mechanism that works great.  BUT, with the current interface, I don't
> have any way of discovering which tasks were on the worker that went
> down.  This is because the routing information (which client gets a
> message) is not exposed in the API.

I would say the whole retransmission part has to be inside of 0MQ rather 
than being implemented on top of it.

I like to think about XREQ/XREP as an equivalent of IP in the Internet 
stack. You send data. Presumably it gets to the destination; however, 
it may occasionally get lost.

Therefore, there's a reliability layer on top of it: REQ/REP (or TCP in 
the case of the Internet stack), which monitors the replies (ACKs) and, 
in case of a problem, manages the retransmission.
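
To make the layering concrete, here is a minimal sketch in plain Python. 
It is not 0MQ code and does not reflect how REQ/REP is implemented; 
LossyLink and reliable_request are hypothetical names made up for 
illustration. The link plays the role of the unreliable lower layer, and 
the wrapper on top watches for a reply and resends when none arrives.

```python
class LossyLink:
    """Hypothetical stand-in for the unreliable layer (the "IP" role).

    Drops the first `drop_first` requests, then delivers normally.
    A dropped request is modelled as a reply of None; in a real
    system this would show up as a timeout instead.
    """

    def __init__(self, handler, drop_first=0):
        self.handler = handler
        self.remaining_drops = drop_first

    def send(self, request):
        if self.remaining_drops > 0:
            self.remaining_drops -= 1
            return None  # request was lost on the way
        return self.handler(request)


def reliable_request(link, request, max_attempts=5):
    """Hypothetical reliability layer (the "TCP" role).

    Resends the request until a reply arrives or the attempts run out.
    Returns the reply and the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        reply = link.send(request)
        if reply is not None:
            return reply, attempt
    raise TimeoutError(f"no reply after {max_attempts} attempts")


if __name__ == "__main__":
    link = LossyLink(lambda r: r.upper(), drop_first=2)
    print(reliable_request(link, "ping"))
```

The application only ever calls reliable_request; the resending is 
hidden below it. Note that this sketch only loses requests. If replies 
can be lost too, a resend may cause the same request to be processed 
twice, so the handler would have to tolerate duplicates.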

Martin


