[zeromq-dev] Load balancing and fault tolerance
Martin Sustrik
sustrik at 250bpm.com
Tue Jul 13 15:06:59 CEST 2010
Brian,
> 1. The load balancing algorithm (round robin) of the XREQ socket is
> not efficient for certain types of loads. I think the recent
> discussions on other load balancing algorithms could help in this
> regard.
Yes. I think the roadmap is quite clear here.
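To see why strict round robin can be inefficient, here is a small simulation comparing it with a least-loaded policy when task costs are uneven. It is a toy model, not 0MQ code. The names round_robin and least_loaded are hypothetical, and the least-loaded policy assumes the dispatcher knows each task's cost up front. Real systems usually approximate that by letting workers signal when they are ready.

```python
from itertools import cycle


def round_robin(tasks, n_workers):
    """Assign tasks to workers in strict rotation (XREQ-style).
    Returns the total cost assigned to each worker."""
    load = [0] * n_workers
    for worker, cost in zip(cycle(range(n_workers)), tasks):
        load[worker] += cost
    return load


def least_loaded(tasks, n_workers):
    """Hypothetical alternative: send each task to the worker with the
    smallest accumulated cost so far (ties go to the lowest index)."""
    load = [0] * n_workers
    for cost in tasks:
        worker = min(range(n_workers), key=lambda w: load[w])
        load[worker] += cost
    return load


tasks = [10, 1, 10, 1, 10, 1]   # alternating heavy and light tasks
print(round_robin(tasks, 2))    # [30, 3]
print(least_loaded(tasks, 2))   # [21, 12]
```

With alternating heavy and light tasks, round robin sends every heavy task to the same worker. The busiest worker therefore finishes at 30 cost units instead of 21.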
> 2. Fault tolerance. If a worker goes down, I need to know about it
> and be able to requeue any tasks that went down with the worker. To
> monitor the health of workers, we have implemented a heartbeat
> mechanism that works great. BUT, with the current interface, I don't
> have any way of discovering which tasks were on the worker that went
> down. This is because the routing information (which client gets a
> message) is not exposed in the API.
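For illustration, the application-level bookkeeping Brian describes could look roughly like the sketch below. The dispatcher records which worker received each task, which is the routing information the XREQ socket keeps internally. When a worker's heartbeat goes stale, the dispatcher requeues that worker's tasks. Everything here (the Dispatcher class, heartbeat_timeout, explicit "now" timestamps) is hypothetical and is not part of the 0MQ API.

```python
from collections import deque


class Dispatcher:
    """Hypothetical application-level dispatcher (not a 0MQ API) that
    remembers which worker each task was sent to."""

    def __init__(self, heartbeat_timeout):
        self.heartbeat_timeout = heartbeat_timeout
        self.queue = deque()   # task ids waiting to be sent
        self.in_flight = {}    # task id -> worker id currently holding it
        self.last_seen = {}    # worker id -> time of its last heartbeat

    def submit(self, task_id):
        self.queue.append(task_id)

    def heartbeat(self, worker_id, now):
        self.last_seen[worker_id] = now

    def dispatch(self, worker_id):
        # Record the routing decision, i.e. the information that
        # XREQ keeps internally and does not expose.
        task_id = self.queue.popleft()
        self.in_flight[task_id] = worker_id
        return task_id

    def complete(self, task_id):
        self.in_flight.pop(task_id, None)

    def reap_dead_workers(self, now):
        # A worker is presumed dead if its last heartbeat is too old.
        dead = {w for w, t in self.last_seen.items()
                if now - t > self.heartbeat_timeout}
        lost = [t for t, w in self.in_flight.items() if w in dead]
        for task_id in lost:
            del self.in_flight[task_id]
        # Put lost tasks back at the front, preserving their original order.
        self.queue.extendleft(reversed(lost))
        for w in dead:
            del self.last_seen[w]
        return dead


d = Dispatcher(heartbeat_timeout=3.0)
for task in ["a", "b", "c"]:
    d.submit(task)
d.heartbeat("w1", now=0.0)
d.heartbeat("w2", now=0.0)
d.dispatch("w1")                      # "a" goes to w1
d.dispatch("w2")                      # "b" goes to w2
d.heartbeat("w2", now=2.0)            # w1 stays silent
print(d.reap_dead_workers(now=4.0))   # {'w1'}
print(list(d.queue))                  # ['a', 'c']
```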
I would say the whole retransmission part has to be inside 0MQ rather
than implemented on top of it.
I like to think of XREQ/XREP as the equivalent of IP in the Internet
stack. You send data and presumably it gets to the destination; however,
it may occasionally get lost.
Therefore, there's a reliability layer on top of it: REQ/REP (or TCP in
the case of the Internet stack), which monitors the replies (ACKs) and,
in case of a problem, manages the retransmission.
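The layering can be sketched as a toy model. A lossy channel stands in for XREQ/XREP (or IP). On top of it, a requester treats the reply as the acknowledgement and retransmits until one arrives, as TCP does with ACKs. This illustrates the idea only and does not describe how 0MQ implements REQ/REP. LossyChannel, drop_pattern, reliable_request and max_attempts are hypothetical names, and a None result stands in for a reply timeout expiring.

```python
class LossyChannel:
    """Stand-in for XREQ/XREP (or IP): each message is either delivered or
    silently dropped. drop_pattern is a hypothetical, deterministic list of
    booleans (True = drop) so the example is reproducible."""

    def __init__(self, drop_pattern):
        self.drop_pattern = list(drop_pattern)

    def send(self, handler, request):
        dropped = self.drop_pattern.pop(0) if self.drop_pattern else False
        # None models "no reply before the timeout expired".
        return None if dropped else handler(request)


def reliable_request(channel, handler, request, max_attempts=3):
    """REQ/REP-style (or TCP-style) reliability layer: the reply doubles as
    the ACK, and the request is retransmitted until a reply arrives or the
    attempt budget is exhausted."""
    for attempt in range(1, max_attempts + 1):
        reply = channel.send(handler, request)
        if reply is not None:
            return reply, attempt
    raise TimeoutError(f"no reply after {max_attempts} attempts")


def echo_upper(request):
    # Trivial service used as the remote handler in this example.
    return request.upper()


print(reliable_request(LossyChannel([True, True, False]), echo_upper, "ping"))
# ('PING', 3)
```

In practice a lost reply is indistinguishable from a lost request, so a retransmitted request may be processed more than once. Requests retried this way should therefore be safe to repeat.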
Martin