[zeromq-dev] Implementing failover in client side

Pieter Hintjens ph at imatix.com
Sat Oct 30 00:48:39 CEST 2010


Sorry, the perils of emailing from a mobile phone. Rather too brief.

We're building a reliability layer, for a client who's atypically
smart and wants it to be open sourced.  This is for reliable RPC, not
pubsub or any other pattern.

I'll explain this in more detail in Ch4 of the Guide when I get around
to that, but the first version used a client, a queue, and a server.
The queue implements server presence and availability detection
(Iñaki's design with some more sauce), the server implements a mama
worker with heartbeating, and the client retries more or less at
random until it gives up.

It works, combining load balancing with failover and recovery.
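
To give a flavour of that first version, here's a rough sketch of the
kind of retry loop the client runs, written against the plain 0MQ 2.x C
API; the endpoint, timeout and retry count are placeholders I made up,
not what the real client uses:

    //  Sketch of a client-side retry loop, 0MQ 2.x C API.
    //  Endpoint, timeout and retry count are placeholders.
    #include <zmq.h>
    #include <string.h>
    #include <stdio.h>

    #define REQUEST_TIMEOUT  2500000    //  usecs (0MQ/2.x zmq_poll), 2.5 secs
    #define REQUEST_RETRIES  3

    static void *
    s_client_socket (void *context)
    {
        void *client = zmq_socket (context, ZMQ_REQ);
        zmq_connect (client, "tcp://192.168.0.55:6061");
        //  Don't let pending requests block shutdown
        int linger = 0;
        zmq_setsockopt (client, ZMQ_LINGER, &linger, sizeof (linger));
        return client;
    }

    int main (void)
    {
        void *context = zmq_init (1);
        void *client = s_client_socket (context);

        int retries_left = REQUEST_RETRIES;
        while (retries_left) {
            //  Send the request
            zmq_msg_t request;
            zmq_msg_init_size (&request, 5);
            memcpy (zmq_msg_data (&request), "HELLO", 5);
            zmq_send (client, &request, 0);
            zmq_msg_close (&request);

            //  Wait for a reply, but not forever
            zmq_pollitem_t items [] = { { client, 0, ZMQ_POLLIN, 0 } };
            zmq_poll (items, 1, REQUEST_TIMEOUT);

            if (items [0].revents & ZMQ_POLLIN) {
                zmq_msg_t reply;
                zmq_msg_init (&reply);
                zmq_recv (client, &reply, 0);
                zmq_msg_close (&reply);
                break;              //  Got an answer, we're done
            }
            //  No reply: abandon the stuck REQ socket, retry on a fresh one
            zmq_close (client);
            client = s_client_socket (context);
            retries_left--;
        }
        if (retries_left == 0)
            printf ("No service available\n");

        zmq_close (client);
        zmq_term (context);
        return 0;
    }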

We then broke the queue into two pieces, part in the client, part in
the server, and implemented that as classes in ZFL, which is the C
class system[1] we're building on top of 0MQ (we being iMatix and
contributors, though it's a separate project from 0MQ core).

With no queue, clients and servers interconnect N to N and message
queuing in fact moves to the servers.  It's complex enough not to be
fun to re-implement in each application, so wrapping it in classes
makes it pleasant to use.  The class creates an internal thread and
talks to it over inproc:; the thread then does the real work.

Here's a mockup of client application code:

    //  Create thread and bind to inproc endpoint
    zfl_rpc_client_t *rpc_client;
    rpc_client = zfl_rpc_client_new ("inproc://client");

    //  Connect to three servers
    zfl_rpc_client_connect (rpc_client, "tcp://192.168.0.55:6061");
    zfl_rpc_client_connect (rpc_client, "tcp://192.168.0.56:6061");
    zfl_rpc_client_connect (rpc_client, "tcp://192.168.0.57:6061");

    //  Format request message as zfl_msg object
    zfl_msg_t *request = zfl_msg_new ();
    ...

    //  Send message (destroy after sending) and return reply
    zfl_msg_t *reply = zfl_rpc_client_send (rpc_client, &request);
    if (!reply)
        printf ("No service available\n");

    //  End RPC client thread
    zfl_rpc_client_destroy (&rpc_client);
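
For those who haven't seen the pattern, here's a bare-bones sketch of
the internal-thread-over-inproc: idea.  This is not the ZFL code, just
the general shape, using pthreads, a PAIR pipe and the 0MQ 2.x C API;
the names are invented:

    //  Sketch only: an API object hiding a worker thread behind inproc:.
    //  Not the ZFL implementation; pthreads + PAIR pipe, 0MQ 2.x C API.
    #include <zmq.h>
    #include <pthread.h>
    #include <string.h>
    #include <stdio.h>

    static void *
    worker_routine (void *context)
    {
        //  Each thread creates its own socket from the shared context
        void *pipe = zmq_socket (context, ZMQ_PAIR);
        zmq_connect (pipe, "inproc://client");

        zmq_msg_t request;
        zmq_msg_init (&request);
        zmq_recv (pipe, &request, 0);   //  Request from the API side
        zmq_msg_close (&request);
        //  ... the real work (talking to remote servers) happens here ...

        zmq_msg_t reply;
        zmq_msg_init_size (&reply, 4);
        memcpy (zmq_msg_data (&reply), "DONE", 4);
        zmq_send (pipe, &reply, 0);     //  Hand the result back over inproc
        zmq_msg_close (&reply);

        zmq_close (pipe);
        return NULL;
    }

    int main (void)
    {
        void *context = zmq_init (1);

        //  Bind before the thread connects: inproc needs bind-first
        void *pipe = zmq_socket (context, ZMQ_PAIR);
        zmq_bind (pipe, "inproc://client");

        pthread_t thread;
        pthread_create (&thread, NULL, worker_routine, context);

        //  The "API call": push a request down the pipe, wait for the answer
        zmq_msg_t request;
        zmq_msg_init_size (&request, 4);
        memcpy (zmq_msg_data (&request), "WORK", 4);
        zmq_send (pipe, &request, 0);
        zmq_msg_close (&request);

        zmq_msg_t reply;
        zmq_msg_init (&reply);
        zmq_recv (pipe, &reply, 0);
        zmq_msg_close (&reply);

        pthread_join (thread, NULL);
        zmq_close (pipe);
        zmq_term (context);
        return 0;
    }

In the real class the server connections, retries and heartbeating
presumably live inside that background thread, so the application only
ever sees the simple blocking calls in the mockup above.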

There's still some work to do, e.g. adding UUIDs to requests and
discarding duplicate replies: if a server replies very slowly, the
request will already have been resent to another server, so the client
can end up receiving two replies for one request.  The retry
parameters are also still hard-coded; they'll need to be made
configurable.
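
If anyone wants to experiment, the duplicate-reply filtering could look
roughly like this.  It's just a sketch using libuuid, not necessarily
what ZFL will do:

    //  Sketch: tag each request with a UUID and drop replies that answer
    //  an older request.  Uses libuuid (link with -luuid); not ZFL code.
    #include <uuid/uuid.h>
    #include <stdio.h>

    int main (void)
    {
        //  Generate a fresh identity when the request is sent
        uuid_t request_id;
        uuid_generate (request_id);

        char text [37];
        uuid_unparse (request_id, text);
        printf ("Tagging request with %s\n", text);

        //  A reply would carry back the UUID of the request it answers
        //  (here we just copy it to fake a matching reply)
        uuid_t reply_id;
        uuid_copy (reply_id, request_id);

        if (uuid_compare (reply_id, request_id) == 0)
            printf ("Reply matches the outstanding request: accept\n");
        else
            printf ("Stale reply from a slow server: discard\n");
        return 0;
    }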

I'm sure it would be a small thing to reimplement the classes in C++
or other languages.  ZFL is aimed at C development because we find
that language most accessible to 'normal' systems developers, and I
like it personally.  The ZFL class model is a simplified version of
one we developed for OpenAMQ.  It's neat and thin and already familiar
if you've played with the zmsg class from the Guide.

Regards
Pieter

[1] http://github.com/zeromq/zfl

On Sat, Oct 30, 2010 at 12:13 AM, Pieter Hintjens <ph at imatix.com> wrote:
> Take a look at the ZFL classes zfl_rpc_server and zfl_rpc_client at
> github.com/zeromq/zfl. They are a work in progress by Martin Hurton and
> myself and intended as material for chapter 4 of the guide. But already
> usable imo.
>
> -Pieter
>
> On 29 Oct 2010 21:17, "Jaime Fernández" <jjjaime at gmail.com> wrote:
>> This message is related to the thread:
>> http://lists.zeromq.org/pipermail/zeromq-dev/2010-October/006799.html
>> but I'm afraid that I cannot reply to that message.
>>
>> I found the same problem when trying to use REQ/REP with several servers
>> and one of the servers crashed. I think that the analysis proposed by Iñaki
>> (in http://lists.zeromq.org/pipermail/zeromq-dev/2010-October/006799.html)
>> is exactly the behaviour that I would expect. Or at least there should be
>> some mechanism to set a timeout on a socket so that I can remove the
>> non-responding server from the list.
>>
>> Pieter has proposed a very interesting design based on an intermediary (a
>> queue). However, from my point of view, this may sometimes be a drawback
>> because the queue is a single point of failure: if the queue is
>> unavailable, the whole system is down.
>>
>> I've seen that Mongrel2 (
>> http://mongrel2.org/doc/tip/docs/manual/book.wiki#x1-640005.2) is using
>> PUSH/PULL + PUB/SUB. However, it makes the system more complex.
>>
>> However, if zmq supported the failover mechanism natively, the REQ/REP
>> pattern would be much more powerful (the deadlock is unacceptable in
>> production systems), and REQ/REP would probably satisfy a large share of
>> projects.
>



-- 
-
Pieter Hintjens
iMatix - www.imatix.com
