[zeromq-dev] Load balancing and fault tolerance

Matt Weinstein matt_weinstein at yahoo.com
Tue Jul 13 20:27:10 CEST 2010


On Jul 13, 2010, at 12:40 PM, Brian Granger wrote:

> Matt,
>
> On Tue, Jul 13, 2010 at 6:06 AM, Matt Weinstein
> <matt_weinstein at yahoo.com> wrote:
>> (( Sorry, I was typing on an iPad. In a moving car. Heading for a  
>> train. :-)
>> ))
>>
>> Originally, I figured it would be easy to just use an XREP directly, and
>> interpret and dispatch, but that's no fun.
>>
>> Another way to do it might be to cheat, and use two XREPs back to back,
>> i.e. SERVICE THREADS emit a REQUEST when they start up.  This identifies
>> them to your specially crafted DEVICE, which places them in a FIFO queue.
>>
>> When a real CLIENT REQUEST comes in from the other side, you pull the next
>> available SERVICE THREAD UUID off the FIFO, and feed the CLIENT REQUEST to
>> the SERVICE THREAD (as a response!).  Now you know who is handling the
>> request; this information can be stashed away.
>>
>> When the response comes back from the SERVICE THREAD, the response is
>> forwarded to the CLIENT, and the SERVICE THREAD's UUID is put back in the
>> FIFO.
>
> This is exactly how our current non-0MQ approach works.  The
> problem we have with it is that the un-handled CLIENT REQUESTS are all
> stuck in the central "device".  Some of our CLIENT REQUESTS can be
> huge (500 MB) and there can be many of them.  Then there is the
> latency.  Because the main queue of CLIENT REQUESTS is in the device,
> you have to pay the round trip latency time for a SERVICE THREAD to
> get a new one to work on.
>

Yes, I guess this would be a "pessimistic" approach.  I need a bit  
more info about your topology to optimize this -- do you have a  
drawing? ;-)
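
In the meantime, here is a very rough sketch of that dispatcher in Python with
pyzmq.  The endpoints, the "READY" startup message, and the assumption that
both CLIENTS and SERVICE THREADS are plain REQ sockets are my own inventions,
so treat this as a sketch of the idea, not a spec:

import zmq
from collections import deque

ctx = zmq.Context()

frontend = ctx.socket(zmq.XREP)   # faces the CLIENTS (REQ sockets)
frontend.bind("tcp://*:5555")
backend = ctx.socket(zmq.XREP)    # faces the SERVICE THREADS (REQ sockets)
backend.bind("tcp://*:5556")

idle = deque()                    # FIFO of idle SERVICE THREAD identities

# Two poll sets: only watch the CLIENT side while a SERVICE THREAD is idle.
poll_workers = zmq.Poller()
poll_workers.register(backend, zmq.POLLIN)
poll_both = zmq.Poller()
poll_both.register(frontend, zmq.POLLIN)
poll_both.register(backend, zmq.POLLIN)

while True:
    poller = poll_both if idle else poll_workers
    events = dict(poller.poll())

    if backend in events:
        # Either a startup "READY" REQUEST or a real response.
        msg = backend.recv_multipart()
        worker_id, payload = msg[0], msg[2:]   # msg[1] is the empty delimiter
        idle.append(worker_id)                 # this SERVICE THREAD is free again
        if payload != [b"READY"]:
            # payload is [client_id, '', reply]: forward it to the CLIENT
            frontend.send_multipart(payload)

    if frontend in events and idle:
        # [client_id, '', request] arriving from a CLIENT
        request = frontend.recv_multipart()
        worker_id = idle.popleft()
        # Feed the CLIENT REQUEST to the SERVICE THREAD *as a response*
        backend.send_multipart([worker_id, b""] + request)

A SERVICE THREAD would just be a REQ socket that does send(b"READY") once at
startup, then loops on recv_multipart() / send_multipart([client_id, b"",
reply]).  The who-is-handling-what bookkeeping and the TIMEOUT handling would
hang off the two send/recv points in the middle.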

> One of the really nice things about 0MQ and the traditional XREQ/XREP
> setup is that the device keeps forwarding the CLIENT REQUESTS to the
> XREQ socket as fast as it can.  0MQ transfers the messages to the
> XREP, where they sit in the 0MQ queue of the XREP socket.  The result
> is that when the SERVICE THREAD is ready to work on a new REQUEST, the
> REQUEST is already in its local queue and can be retrieved with 0
> latency.  Second, all of the REQUESTS are now queued on the workers
> (in the XREP 0MQ queue), so the central device doesn't suffer under
> the weight of all of these.  The difficulty is that the unhandled
> requests are sitting in a location that I can't discover (the XREP
> queues on the workers) and I don't know of any way of getting this
> information.
>

Well, you could always continue to read the requests out of the  
inbound pipe and put them in a queue, but I thought ypipes were pretty  
optimal for this...?
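
If you did decide the device should hold them, something like this (my own
sketch, nothing 0MQ gives you for free) would drain the CLIENT side with
non-blocking reads into a plain deque, so the device can see and count every
unhandled REQUEST:

import zmq
from collections import deque

pending = deque()     # unhandled CLIENT REQUESTS, now visible to the device

def drain(frontend):
    """Pull everything currently readable off the frontend XREP socket."""
    while True:
        try:
            pending.append(frontend.recv_multipart(zmq.NOBLOCK))
        except zmq.ZMQError as e:
            if e.errno == zmq.EAGAIN:
                break          # inbound pipe is empty for now
            raise

The obvious trade-off is the one you already raised: those 500 MB REQUESTS
are now sitting in the device's memory instead of in the workers' XREP queues.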

> Your idea should work for situations where latency and device load are
> not as big of an issue.
>
>> If there is a TIMEOUT (use the 3rd socket approach, per prior email
>> threads) another SERVICE THREAD is pulled off the FIFO, and the CLIENT
>> REQUEST is re-sent to the new SERVICE THREAD.  State is recorded to make
>> sure the request is only responded to once.
>
> Yes, this would cover the fault tolerance.
>
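
For the record, the bookkeeping I have in mind looks roughly like this.  The
5-second deadline, the request_id (a UUID the device assigns when the REQUEST
first arrives), and all the names are invented for illustration; the 3rd
socket is just whatever wakes the loop up often enough to call
check_timeouts():

import time

TIMEOUT = 5.0      # seconds; purely illustrative

in_flight = {}     # request_id -> (envelope, request, deadline)
answered = set()   # request_ids already responded to, so we reply only once

def dispatch(backend, idle, request_id, envelope, request):
    """Hand a CLIENT REQUEST to the next idle SERVICE THREAD in the FIFO.

    envelope is [client_id, b""]; request is the list of body frames.
    """
    worker_id = idle.popleft()
    backend.send_multipart([worker_id, b""] + envelope + request)
    in_flight[request_id] = (envelope, request, time.time() + TIMEOUT)

def check_timeouts(backend, idle):
    """Re-send any REQUEST whose SERVICE THREAD has gone quiet for too long."""
    now = time.time()
    for request_id, (envelope, request, deadline) in list(in_flight.items()):
        if deadline < now and request_id not in answered and idle:
            dispatch(backend, idle, request_id, envelope, request)

def on_response(frontend, request_id, reply):
    """Forward a SERVICE THREAD's reply, but never answer a CLIENT twice."""
    if request_id in answered:
        return                      # late duplicate from a slow worker; drop it
    answered.add(request_id)
    in_flight.pop(request_id, None)
    frontend.send_multipart(reply)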
>> You can only poll on the inbound socket (the CLIENT REQUEST side) if you
>> actually have SERVICE THREADS in the FIFO.  Fortunately, you are
>> single-threaded, so you will hear about newly idle SERVICE THREADS on the
>> other socket shortly :-)
>>
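
Concretely, gating the poll set can be as small as this, a single-Poller
variant of the two-poller trick in the sketch further up; the flag just
tracks whether the frontend is currently registered:

import zmq

def update_poll_set(poller, frontend, idle, watched):
    """Poll the CLIENT side only while at least one SERVICE THREAD is idle.

    Returns the new 'watched' flag, i.e. whether the frontend is currently
    registered with the poller.
    """
    if idle and not watched:
        poller.register(frontend, zmq.POLLIN)
        return True
    if not idle and watched:
        poller.unregister(frontend)
        return False
    return watched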
>> If you need some type of buffering, you can add another buffering DEVICE
>> (ZMQ_QUEUE) cross-connected with your special device and two ZMQ_STREAMER
>> devices.
>
> Is this between the client and device or the device and the workers.
> Not quite clear on this part of it.
>

I was thinking that the DISPATCHER would be inproc: with the service
threads; I forgot about ZMQ_PAIR, which might work fine:

	CLIENTS ]]] [ XREP ] [ ZMQ_STREAMER ] [ ZMQ_PAIR ] <<= [ your network here ] =>>
	[ ZMQ_PAIR ] [ DISPATCHER ] [ XREP ] [[[ SERVICE THREADS
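
The CLIENT-side half of that picture could be as simple as the following.
The endpoints are invented, and I've used a single bidirectional ZMQ_QUEUE
device in place of the ZMQ_STREAMER piece just to keep it to one call; the
routing envelopes ride across the PAIR link untouched either way:

import zmq

ctx = zmq.Context()

frontend = ctx.socket(zmq.XREP)        # CLIENTS connect here with REQ sockets
frontend.bind("tcp://*:5555")

bridge = ctx.socket(zmq.PAIR)          # one end of the network link
bridge.connect("tcp://dispatcher-host:5566")   # DISPATCHER's PAIR binds there

# Shovel messages both ways between the CLIENT-facing XREP and the PAIR link.
zmq.device(zmq.QUEUE, frontend, bridge)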

>> As an added plus, I believe the request UIDs can be assigned at  
>> this layer,
>> so you don't even need to track them at the CLIENT side.
>
> Yes, I think so, but even if we had to track them in the client it
> wouldn't be that big of a deal.
>
>> I believe this maintains all of the socket state correctly, but  
>> this should
>> be walked through carefully.
>
> I think it would work.  We may try this even though the latency and
> device memory footprint issues are still there.
>
>> That's the sketch anyway.
>>
>> If this works, I'll probably use it too :-)
>
> Thanks for the ideas...
>
> Cheers,
>
> Brian
>
>
>> Comments welcome ...
>>
>> Best,
>>
>> Matt
>>
>>
>> On Jul 13, 2010, at 7:28 AM, Brian Granger wrote:
>>
>>> On Tue, Jul 13, 2010 at 3:39 AM, Matt Weinstein
>>> <matt_weinstein at yahoo.com> wrote:
>>>>
>>>> The XREQ / XREP protocol carries the GUID of the requester in the
>>>> header part.
>>>>
>>>> client --> [ REQ ] --- [ XREP ] === (GUID of client available here) ==
>>>>            [ XREQ ] [ REP ]  your workers here
>>>>
>>>> I'm using it to track timeouts.
>>>>
>>>> Does that help?
>>>
>>> Not really.  We need the ability to track the location of each  
>>> message
>>> in the worker pool.  The GUIDs track which client submitted the
>>> message.
>>>
>>> Brian
>>>
>>>> On Jul 13, 2010, at 3:41 AM, Brian Granger wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> One of the core parts of a system I am building with zeromq is a  
>>>>> task
>>>>> distribution system.  The system has 1 master process and N  
>>>>> workers.
>>>>> I would like to use XREP/XREQ sockets for this, but am running  
>>>>> into
>>>>> two problems:
>>>>>
>>>>> 1.  The load balancing algorithm (round robin) of the XREQ  
>>>>> socket is
>>>>> not efficient for certain types of loads.  I think the recent
>>>>> discussions on other load balancing algorithms could help in this
>>>>> regard.
>>>>>
>>>>> 2.  Fault tolerance.  If a worker goes down, I need to know  
>>>>> about it
>>>>> and be able to requeue any tasks that went down with the  
>>>>> worker.  To
>>>>> monitor the health of workers, we have implemented a heartbeat
>>>>> mechanism that works great.  BUT, with the current interface, I  
>>>>> don't
>>>>> have any way of discovering which tasks were on the worker that  
>>>>> went
>>>>> down.  This is because the routing information (which client  
>>>>> gets a
>>>>> message) is not exposed in the API.
>>>>>
>>>>> Because of this second limitation, we are having to look at crazy
>>>>> things like using a PAIR socket for each worker.  This is quite  
>>>>> a pain
>>>>> because I can't use the 0MQ load balancing and I have to manage  
>>>>> all of
>>>>> those PAIR sockets (1 for each worker).  With lots of workers this
>>>>> doesn't scale very well.
>>>>>
>>>>> Any thoughts on how to make XREQ/XREP more fault-tolerant?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Brian
>>>>>
>>>>> --
>>>>> Brian E. Granger, Ph.D.
>>>>> Assistant Professor of Physics
>>>>> Cal Poly State University, San Luis Obispo
>>>>> bgranger at calpoly.edu
>>>>> ellisonbg at gmail.com
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Brian E. Granger, Ph.D.
>>> Assistant Professor of Physics
>>> Cal Poly State University, San Luis Obispo
>>> bgranger at calpoly.edu
>>> ellisonbg at gmail.com
>>
>>
>
>
>
> -- 
> Brian E. Granger, Ph.D.
> Assistant Professor of Physics
> Cal Poly State University, San Luis Obispo
> bgranger at calpoly.edu
> ellisonbg at gmail.com



