[zeromq-dev] Java API is not notified of C++ assert failures.
Pieter Hintjens
ph at imatix.com
Thu Mar 12 18:45:55 CET 2009
On Thu, Mar 12, 2009 at 6:12 PM, Martin Sustrik <sustrik at fastmq.com> wrote:
> In short, automatic selection of primary is a complex problem studied
> extensively by distributed algorithm scientists and as far as I am aware
> it has no generic solution. My advice would be to select primary by hand
> if at all possible.
For what it's worth, we spent several years refining and simplifying
the failover in OpenAMQ, and part of this was determining which process
was primary and what the rules for failover should be.
What we concluded was that there are two viable architectures. The
first has N nodes, none of which is primary, and any of which can be
added or removed. This requires many interconnections and some form of
discovery, but it is the most robust and scalable design. The second
has one primary node that automatically fails over to a secondary,
with both roles defined explicitly so that applications know the
difference. Vital in this scheme is that it is not symmetric, and
that recovery is done by external decision (by stopping the secondary
once the primary is alive again).
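To make the asymmetry of the second architecture concrete, here is a
minimal Python sketch of the role transitions for a primary/secondary
pair. It is a hypothetical model (the FailoverPair class, the Status
values and all method names are illustrative), not the OpenAMQ
high-availability algorithm, and it leaves out heartbeating, failure
detection and split-brain handling entirely.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    ACTIVE = "active"    # serving clients
    PASSIVE = "passive"  # alive, standing by
    DOWN = "down"        # not running


class FailoverPair:
    """Hypothetical model of an asymmetric primary/secondary pair.

    Roles are fixed by configuration. Failover to the secondary is
    automatic; failing back to the primary only happens when the
    secondary is stopped from outside (stop_secondary).
    """

    def __init__(self) -> None:
        self.primary = Status.ACTIVE
        self.secondary = Status.PASSIVE

    def active_node(self) -> Optional[str]:
        """Return which node is serving clients, or None if neither is."""
        if self.primary is Status.ACTIVE:
            return "primary"
        if self.secondary is Status.ACTIVE:
            return "secondary"
        return None

    def primary_failed(self) -> None:
        # Automatic failover: a standing-by secondary takes over.
        self.primary = Status.DOWN
        if self.secondary is Status.PASSIVE:
            self.secondary = Status.ACTIVE

    def primary_restarted(self) -> None:
        # Not symmetric: a restarted primary does not take control back
        # while the secondary is serving; it stands by instead.
        if self.secondary is Status.ACTIVE:
            self.primary = Status.PASSIVE
        else:
            self.primary = Status.ACTIVE

    def stop_secondary(self) -> None:
        # External recovery decision: stopping the secondary hands
        # control back to a live primary.
        self.secondary = Status.DOWN
        if self.primary is Status.PASSIVE:
            self.primary = Status.ACTIVE

    def start_secondary(self) -> None:
        # A restarted secondary stands by behind an active primary;
        # if no primary is serving, it takes over (an assumption).
        if self.primary is Status.ACTIVE:
            self.secondary = Status.PASSIVE
        else:
            self.secondary = Status.ACTIVE


pair = FailoverPair()
pair.primary_failed()
print(pair.active_node())     # secondary
pair.primary_restarted()
print(pair.active_node())     # secondary (no automatic fail-back)
pair.stop_secondary()
print(pair.active_node())     # primary
```

The key design choice is that primary_restarted() never demotes the
secondary: only the external stop_secondary() call moves service back,
which keeps the two nodes from fighting over which one is in charge.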
If you need an algorithm for the second architecture, Martin can
provide it; he wrote the high-availability engine in OpenAMQ.
-Pieter