[zeromq-dev] Implementing failover in client side

Jaime Fernández jjjaime at gmail.com
Sat Oct 30 12:22:44 CEST 2010

Hi Pieter,

Thank you very much for your detailed reply (and also for the nice zeromq

I'm using the Python binding, so the ZFL classes might be hard to use.
However I will have a look at them.

Would it be possible to specify a timeout in zmq (to avoid a socket that
blocks forever)? For example, zmq_setsockopt could take the number of
seconds the socket will wait in zmq_send/zmq_recv. It could be an
intermediate approach.
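
To make the idea concrete, here is a minimal sketch of what I mean by an
intermediate approach: every request gets a timeout, and a server that does
not answer in time is dropped from the list. The transport is deliberately
left out. The `send` callable, the function name and the endpoints below are
all hypothetical, not part of zmq or ZFL.

```python
def request_with_failover(servers, payload, send, timeout=2.0):
    """Try each server in order and drop the ones that time out.

    `send(endpoint, payload, timeout)` is a hypothetical transport
    callable (an assumption of this sketch): it returns the reply or
    raises TimeoutError when no reply arrives within `timeout` seconds.
    Returns (reply, alive); reply is None if no server answered.
    """
    alive = list(servers)
    for endpoint in list(alive):
        try:
            reply = send(endpoint, payload, timeout)
        except TimeoutError:
            # Stop using the non-responding server.
            alive.remove(endpoint)
            continue
        return reply, alive
    return None, alive


def fake_send(endpoint, payload, timeout):
    """Stand-in transport for the demo: server-a never answers."""
    if endpoint == "tcp://server-a:5555":
        raise TimeoutError(endpoint)
    return b"reply to " + payload


reply, alive = request_with_failover(
    ["tcp://server-a:5555", "tcp://server-b:5555"], b"ping", fake_send
)
```

With pyzmq, one way to get such a timeout today would be to poll the socket
(zmq.Poller, timeout in milliseconds) before calling recv. A REQ socket that
never receives its reply cannot send again, so it would probably have to be
closed and reopened before trying the next server.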


On Sat, Oct 30, 2010 at 12:48 AM, Pieter Hintjens <ph at imatix.com> wrote:

> Sorry, the perils of emailing from a mobile phone. Rather too brief.
> We're building a reliability layer, for a client who's atypically
> smart and wants it to be open sourced.  This is for reliable RPC, not
> pubsub or any other pattern.
> I'll explain this in more detail in Ch4 of the Guide when I get around
> to that, but the first version used a client, a queue, and a server.
> The queue implements the server presence and availability detection
> (Iñaki's design with some more sauce), the worker implements a mama
> worker with heartbeating, and the client kind of randomly retries
> until it gives up.
> It works, combining load balancing with failover and recovery.
> We then broke the queue into two pieces, part in the client, part in
> the server, and implemented that as classes in ZFL, which is the C
> class system[1] we're building on top of 0MQ (we being iMatix and
> contributors though it's a separate project from 0MQ core).
> With no queue, clients and servers interconnect N to N and message
> queuing in fact moves to the servers.  It's complex enough to not be
> fun to re-implement in each application.  Using classes makes it nice
> to use.  The class creates an internal thread and talks to it over
> inproc:, the thread then does the real work.
> Here's a mockup of client application code:
>    //  Create thread and bind to inproc endpoint
>    zfl_rpc_client_t *rpc_client;
>    rpc_client = zfl_rpc_client_new ("inproc://client");
>    //  Connect to three servers
>    zfl_rpc_client_connect (rpc_client, "tcp://");
>    zfl_rpc_client_connect (rpc_client, "tcp://");
>    zfl_rpc_client_connect (rpc_client, "tcp://");
>    //  Format request message as zfl_msg object
>    zfl_msg_t *request = zfl_msg_new ();
>    ...
>    //  Send message (destroy after sending) and return reply
>    zfl_msg_t *reply = zfl_rpc_client_send (rpc_client, &request);
>    if (!reply)
>        printf ("No service available\n");
>    //  End RPC client thread
>    zfl_rpc_client_destroy (&rpc_client);
> There's still some work to do, e.g. to add UUIDs to requests and
> discard duplicate replies, because if servers reply very slowly,
> requests will get sent to other servers.  The retry parameters are
> also still hard coded, they'll need to be made configurable.
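
On the UUID point, a rough sketch of how I understand it, in Python: tag each
request with a UUID, and accept only the first reply that carries the ID of a
request that is still pending. The class and method names are hypothetical,
and this is not how ZFL does it (that part is still to be written).

```python
import uuid


class ReplyFilter:
    """Accept the first reply per request ID and discard the rest."""

    def __init__(self):
        self._pending = set()

    def new_request(self, body):
        # Tag the request with a fresh UUID so replies can be matched.
        request_id = uuid.uuid4().hex
        self._pending.add(request_id)
        return request_id, body

    def accept(self, request_id, reply_body):
        # Only the first reply for a pending request gets through.
        if request_id in self._pending:
            self._pending.discard(request_id)
            return reply_body
        # Duplicate (a slow server answering late) or unknown ID.
        return None


replies = ReplyFilter()
request_id, body = replies.new_request(b"ping")
first = replies.accept(request_id, b"pong from server 1")
second = replies.accept(request_id, b"pong from server 2")
```

Here `first` holds the reply and `second` is None, because the request was
already answered when the slower server replied.
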
> I'm sure it would be a small thing to reimplement the classes in C++
> or other languages.  ZFL is aimed at C development because we find
> that language most accessible to 'normal' systems developers, and I
> like it personally.  The ZFL class model is a simplified version of
> one we developed for OpenAMQ.  It's neat and thin and already familiar
> if you've played with the zmsg class from the Guide.
> Regards
> Pieter
> [1] http://github.com/zeromq/zfl.
> On Sat, Oct 30, 2010 at 12:13 AM, Pieter Hintjens <ph at imatix.com> wrote:
> > Take a look at the ZFL classes zfl_rpc_server and zfl_rpc_client at
> > github.com/zeromq/zfl. They are a work in progress by Martin Hurton and
> > myself and intended as material for chapter 4 of the guide. But already
> > usable imo.
> >
> > -Pieter
> >
> > On 29 Oct 2010 21:17, "Jaime Fernández" <jjjaime at gmail.com> wrote:
> >> This message is related to the thread:
> >> http://lists.zeromq.org/pipermail/zeromq-dev/2010-October/006799.html
> >> but I'm afraid that I cannot reply to that message.
> >>
> >> I found the same problem when trying to use REQ/REP with several servers,
> >> but one of the servers crashed. I think that the analysis proposed by Iñaki
> >> (in http://lists.zeromq.org/pipermail/zeromq-dev/2010-October/006799.html)
> >> is exactly the behaviour that I would expect. Or at least some mechanism
> >> to set up a timeout on a socket so that I can remove the non-responding
> >> server from the list.
> >>
> >> Pieter has proposed a very interesting design based on an intermediary (a
> >> queue). However, from my point of view, this may sometimes be a drawback
> >> because the queue is a single point of failure: if the queue is
> >> unavailable, the whole system is down.
> >>
> >> I've seen that Mongrel2 (
> >> http://mongrel2.org/doc/tip/docs/manual/book.wiki#x1-640005.2) is using
> >> PUSH/PULL + PUB/SUB. However, it makes the system more complex.
> >>
> >> If zmq supported the failover mechanism natively, the REQ/REP pattern
> >> would be much more powerful (the deadlock is unacceptable in production
> >> systems). REQ/REP would then probably satisfy a large share of projects.
> >
> --
> -
> Pieter Hintjens
> iMatix - www.imatix.com
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev