[zeromq-dev] Load balancing and fault tolerance

Brian Granger ellisonbg at gmail.com
Tue Jul 13 18:40:10 CEST 2010


Matt,

On Tue, Jul 13, 2010 at 6:06 AM, Matt Weinstein
<matt_weinstein at yahoo.com> wrote:
> (( Sorry, I was typing on an iPad. In a moving car. Heading for a train. :-)
> ))
>
> Originally, I figured it would be easy to just use an XREP directly, and
> interpret and dispatch, but that's no fun.
>
> Another way to do it might be to cheat, and use two XREPs back to back, i.e.
> SERVICE THREADS emit a REQUEST when they start up.  This identifies them to
> your specially crafted DEVICE, which places them in a FIFO queue.
>
> When a real CLIENT REQUEST comes in from the other side, you pull the next
> available SERVICE THREAD uuid off the FIFO, and feed the CLIENT REQUEST to
> the SERVICE THREAD (as a response!).  Now you know who is handling the
> request, and this information can be stashed away.
>
> When the response comes back from the SERVICE THREAD, the response is
> forwarded to the CLIENT, and the SERVICE THREAD's UUID is put back in the
> FIFO.

This is exactly how our current non-0MQ approach works.  The
problem we have with it is that the unhandled CLIENT REQUESTS all
sit in the central "device".  Some of our CLIENT REQUESTS can be
huge (500 MB) and there can be many of them.  Then there is the
latency: because the main queue of CLIENT REQUESTS lives in the
device, you pay a full round trip for a SERVICE THREAD to get a
new one to work on.

One of the really nice things about 0MQ and the traditional XREQ/XREP
setup is that the device keeps forwarding the CLIENT REQUESTS to the
XREQ socket as fast as it can.  0MQ transfers the messages to the
XREP, where they sit in the 0MQ queue of the XREP socket.  The result
is that when the SERVICE THREAD is ready to work on a new REQUEST, the
REQUEST is already in its local queue and can be retrieved with 0
latency.  Second, all of the REQUESTS are now queued on the workers
(in each worker's XREP 0MQ queue), so the central device doesn't
suffer under their weight.  The difficulty is that the unhandled
requests sit in a location I can't inspect (the XREP queues on the
workers), and I don't know of any way of getting at this
information.
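
To make that concrete, here is a minimal sketch of the traditional
setup in Python with pyzmq (the endpoints and the do_work() helper
are made up for illustration, and this is untested):

    import zmq

    ctx = zmq.Context()

    # --- device process (runs separately) ---
    # frontend = ctx.socket(zmq.XREP); frontend.bind("tcp://*:5555")
    # backend = ctx.socket(zmq.XREQ); backend.bind("tcp://*:5556")
    # zmq.device(zmq.QUEUE, frontend, backend)  # forwards as fast as it can

    # --- worker process ---
    def do_work(request):
        return request               # placeholder "work": echo

    worker = ctx.socket(zmq.REP)
    worker.connect("tcp://localhost:5556")

    while True:
        request = worker.recv()      # the next REQUEST is usually already
                                     # in the local 0MQ queue, so this
                                     # returns with ~zero latency
        worker.send(do_work(request))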

Your idea should work for situations where latency and device load
are less of an issue.
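
If I'm reading your scheme right, the special device would look
something like this in Python with pyzmq (the endpoints, the "READY"
token, and the framing are my assumptions; untested):

    import zmq

    ctx = zmq.Context()

    clients = ctx.socket(zmq.XREP)   # CLIENT REQUESTS arrive here
    clients.bind("tcp://*:5555")
    workers = ctx.socket(zmq.XREP)   # SERVICE THREADS (REQ sockets) here
    workers.bind("tcp://*:5556")

    available = []                   # FIFO of ready SERVICE THREAD uuids

    poll_workers = zmq.Poller()
    poll_workers.register(workers, zmq.POLLIN)
    poll_both = zmq.Poller()
    poll_both.register(workers, zmq.POLLIN)
    poll_both.register(clients, zmq.POLLIN)

    while True:
        # Only watch the client side while a worker is in the FIFO.
        poller = poll_both if available else poll_workers
        events = dict(poller.poll())

        if workers in events:
            # Either the startup "READY" REQUEST, framed as
            # [worker_id, "", "READY"], or a reply coming back as
            # [worker_id, "", client_id, "", reply].
            msg = workers.recv_multipart()
            worker_id, rest = msg[0], msg[2:]
            available.append(worker_id)         # worker is free again
            if rest != [b"READY"]:
                clients.send_multipart(rest)    # route reply to CLIENT

        if clients in events:
            # [client_id, "", request] from a REQ client
            msg = clients.recv_multipart()
            worker_id = available.pop(0)
            # Feed the CLIENT REQUEST to the SERVICE THREAD "as a
            # response", keeping the client envelope for the way back.
            workers.send_multipart([worker_id, b""] + msg)

Each SERVICE THREAD would then be a plain REQ socket that sends
"READY" once at startup and treats every "response" it receives as a
CLIENT REQUEST to handle.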

> If there is a TIMEOUT (use the 3rd socket approach, per prior email threads)
> another SERVICE THREAD is pulled off the FIFO, and the CLIENT REQUEST is
> re-sent to the new SERVICE THREAD.  State is recorded to make sure the
> request is only responded to once.

Yes, this would cover the fault tolerance.
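
Roughly, the extra bookkeeping on top of the device sketch above
might look like this (TIMEOUT, pending, answered, and request_id are
all made-up names, and workers/clients/available refer to the
previous sketch; untested):

    import time

    TIMEOUT = 5.0    # seconds before a SERVICE THREAD is presumed dead
    pending = {}     # worker_id -> (client_msg, deadline)
    answered = set() # request ids that have already been replied to

    def dispatch(worker_id, client_msg):
        # Hand a CLIENT REQUEST to a worker and remember who has it.
        workers.send_multipart([worker_id, b""] + client_msg)
        pending[worker_id] = (client_msg, time.time() + TIMEOUT)

    def deliver(worker_id, reply_msg, request_id):
        # Forward a reply to the CLIENT, but only the first one per
        # request; late duplicates from a slow worker are dropped.
        pending.pop(worker_id, None)
        if request_id not in answered:
            answered.add(request_id)
            clients.send_multipart(reply_msg)

    def requeue_expired():
        # Call this periodically, e.g. whenever poller.poll(timeout)
        # returns empty-handed.
        now = time.time()
        for worker_id, (client_msg, deadline) in list(pending.items()):
            if now > deadline and available:
                del pending[worker_id]        # presume that worker dead
                dispatch(available.pop(0), client_msg)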

> You can only poll on the inbound socket (CLIENT REQUEST side) if you
> actually have SERVICE THREADS in the FIFO.  Fortunately, you are single
> threaded, so you will hear about that from the other socket shortly :-)
>
> If you need some type of buffering, you can add another buffering DEVICE
> (ZMQ_QUEUE) cross connected with your special device and two ZMQ_STREAMs.

Is this between the client and the device, or between the device and
the workers?  I'm not quite clear on this part.
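
Whichever side it lands on, the buffering stage itself would just be
a stock queue device, something like (endpoints made up; untested):

    import zmq

    ctx = zmq.Context()
    frontend = ctx.socket(zmq.XREP)
    frontend.bind("tcp://*:5557")
    backend = ctx.socket(zmq.XREQ)
    backend.bind("tcp://*:5558")
    zmq.device(zmq.QUEUE, frontend, backend)  # blocks, shuttling messages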

> As an added plus, I believe the request UIDs can be assigned at this layer,
> so you don't even need to track them at the CLIENT side.

Yes, I think so, but even if we had to track them in the client it
wouldn't be a big deal.

> I believe this maintains all of the socket state correctly, but this should
> be walked through carefully.

I think it would work.  We may try this even though the latency and
device memory footprint issues are still there.

> That's the sketch anyway.
>
> If this works, I'll probably use it too :-)

Thanks for the ideas...

Cheers,

Brian


> Comments welcome ...
>
> Best,
>
> Matt
>
>
> On Jul 13, 2010, at 7:28 AM, Brian Granger wrote:
>
>> On Tue, Jul 13, 2010 at 3:39 AM, Matt Weinstein
>> <matt_weinstein at yahoo.com> wrote:
>>>
>>> The XREQ / XREP protocol carries the GUID of the requester in the header
>>> part.
>>>
>>> client --> [ REQ ] --- [ XREP ] === (GUID of client available here)
>>>        === [ XREQ ] --- [ REP ]  <-- your workers here
>>>
>>> I'm using it to track timeouts.
>>>
>>> Does that help?
>>
>> Not really.  We need the ability to track the location of each message
>> in the worker pool.  The GUIDs track which client submitted the
>> message.
>>
>> Brian
>>
>>> On Jul 13, 2010, at 3:41 AM, Brian Granger wrote:
>>>
>>>> Hi,
>>>>
>>>> One of the core parts of a system I am building with zeromq is a task
>>>> distribution system.  The system has 1 master process and N workers.
>>>> I would like to use XREP/XREQ sockets for this, but am running into
>>>> two problems:
>>>>
>>>> 1.  The load balancing algorithm (round robin) of the XREQ socket is
>>>> not efficient for certain types of loads.  I think the recent
>>>> discussions on other load balancing algorithms could help in this
>>>> regard.
>>>>
>>>> 2.  Fault tolerance.  If a worker goes down, I need to know about it
>>>> and be able to requeue any tasks that went down with the worker.  To
>>>> monitor the health of workers, we have implemented a heartbeat
>>>> mechanism that works great.  BUT, with the current interface, I don't
>>>> have any way of discovering which tasks were on the worker that went
>>>> down.  This is because the routing information (which client gets a
>>>> message) is not exposed in the API.
>>>>
>>>> Because of this second limitation, we are having to look at crazy
>>>> things like using a PAIR socket for each worker.  This is quite a pain
>>>> because I can't use the 0MQ load balancing and I have to manage all of
>>>> those PAIR sockets (1 for each worker).  With lots of workers this
>>>> doesn't scale very well.
>>>>
>>>> Any thoughts on how to make XREQ/XREP more fault tolerant?
>>>>
>>>> Cheers,
>>>>
>>>> Brian
>>>>
>>>> --
>>>> Brian E. Granger, Ph.D.
>>>> Assistant Professor of Physics
>>>> Cal Poly State University, San Luis Obispo
>>>> bgranger at calpoly.edu
>>>> ellisonbg at gmail.com
>>>> _______________________________________________
>>>> zeromq-dev mailing list
>>>> zeromq-dev at lists.zeromq.org
>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>
>>>
>>
>>
>>
>> --
>> Brian E. Granger, Ph.D.
>> Assistant Professor of Physics
>> Cal Poly State University, San Luis Obispo
>> bgranger at calpoly.edu
>> ellisonbg at gmail.com
>
>



-- 
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu
ellisonbg at gmail.com


