[zeromq-dev] Java API is not notified of C++ assert failures.

Martin Sustrik sustrik at fastmq.com
Fri Mar 13 11:45:48 CET 2009


Hi Vladimir,

> The problem that I am trying to solve is to distribute a large amount of
> computation that is inherently parallelizable amongst a (possibly large) set
> of machines. Scalability is the main goal. Fail-over would be a side effect.
> The application would not be geographically distributed and therefore would
> not get dissynchronized.
> The primary machine would be responsible for decision making, and as a
> result, there should always be only one primary. If there's none or more
> than one then there's trouble. Small intervals when there's no primary are
> acceptable.

My feeling is that you are mixing two different dataflows. One of them
is load-balancing, where multiple services share the load; the other is
a high-availability cluster.

Load-balancing is simpler. For more info have a look here:

http://www.zeromq.org/tutorials:butterfly

High-availability is the tricky part. For a discussion of how we've
implemented high-availability in OpenAMQ, have a look here:

http://www.openamq.org/doc:user-3-advanced#toc39

In any case, the point is that these two pretty different architectures 
seem to be mixed in your design.

I'm attaching a diagram of how I would implement the thing. On the left,
there's the load-balancing mechanism. On the right, there's a
high-availability cluster. Note that nodes in the HA cluster have to
speak directly to each other and take any precaution needed to ensure
that the communication is not broken, including direct (non-switched)
cables, dedicated NICs, even several dedicated NICs with multihoming
behaviour, etc.

> Once I have a set of machines up they should not require any manual
> intervention. Any new machine that joins the collective would lighten the
> load on the other machines. Any machine that leaves the collective would
> increase the load.

This is achieved via load-balancing.
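
To make the idea concrete, here is a minimal sketch in Python of a
round-robin balancer. It is not 0MQ code; the class and method names
(RoundRobinBalancer, join, leave, dispatch) are made up for
illustration, and a real deployment would also need failure detection.

```python
from collections import Counter


class RoundRobinBalancer:
    """Hypothetical balancer: hands tasks to workers in turn."""

    def __init__(self):
        self._workers = []
        self._next = 0

    def join(self, worker):
        # A new machine joins the collective.
        self._workers.append(worker)

    def leave(self, worker):
        # A machine leaves; the remaining ones pick up its share.
        self._workers.remove(worker)

    def dispatch(self, task):
        # Return the worker that should handle this task.
        if not self._workers:
            raise RuntimeError("no workers available")
        worker = self._workers[self._next % len(self._workers)]
        self._next += 1
        return worker


def share(balancer, n_tasks):
    """Dispatch n_tasks dummy tasks and count how many each worker got."""
    return Counter(balancer.dispatch(i) for i in range(n_tasks))


if __name__ == "__main__":
    lb = RoundRobinBalancer()
    lb.join("a")
    lb.join("b")
    print(share(lb, 6))   # two workers share the tasks
    lb.join("c")
    print(share(lb, 6))   # a third worker lightens the load per machine
```

Each machine that joins reduces the number of tasks the others receive,
and each one that leaves increases it, without manual intervention.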

> The first machine that comes up is the primary. When the
> primary goes down another one assumes the role of primary.

The hard bit. If you want to do this in a dynamic fashion, I would
suggest having a look at research papers in distributed computing that
discuss decision making in decentralised peer-to-peer systems. There's
no generic 100% safe algorithm to do this, but you will at least be able
to calculate the chance of getting duplicate primaries based on the
failure probabilities of individual components in the network.
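
As a rough illustration of such a calculation, here is a minimal sketch
in Python. It rests on simplifying assumptions of mine: two HA nodes are
joined by several dedicated links, every link and node fails
independently, and duplicate primaries appear exactly when both nodes
are alive but all links between them are down. The function name and
the example probabilities are hypothetical.

```python
def split_brain_probability(link_failure_prob, num_links, node_failure_prob):
    """Chance that both HA nodes are up but cannot see each other.

    Assumes all failures are independent, which real networks only
    approximate (shared switches or power break this assumption).
    """
    all_links_down = link_failure_prob ** num_links
    both_nodes_up = (1.0 - node_failure_prob) ** 2
    return all_links_down * both_nodes_up


if __name__ == "__main__":
    # Made-up figures: 1% link failure, 0.1% node failure.
    for links in (1, 2, 3):
        p = split_brain_probability(0.01, links, 0.001)
        print(f"{links} dedicated link(s): {p:.3e}")
```

Under these assumptions each extra dedicated link multiplies the risk
by the link failure probability, which is why redundant NICs and cables
matter so much in the HA part.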

> I believe this
> mechanism is implementable if instead of killing the process that tries to
> create a global object that already exists, the 0mq code would throw an
> exception instead and let the application decide how to handle the error
> condition. Is the decision to assert rather than throw exceptions final?

Yes. The idea is that having two instances of the same global service is
a split-brain situation caused by bad design/configuration. The
offending component should fail immediately to alert the
administrator/developer to a lethal bug in the program/configuration.

> Another option to implement such a mechanism would be to use the reliable
> multicast features of 0mq - any documentation and ETA on that?

0MQ supports the PGM protocol. Just build it with the --with-openpgm
option and specify the right protocol when defining the global exchange
(see the zmq(7) manpage).

Still, multicast won't help you in any way to solve the problem of 
choosing a primary.

> 
> Vladimir
> 
> P.S. Here's a high level description of how scalability would be achieved.
> 
> All machines start with an exact copy of the data and the work is carved
> among themselves. Each machine processes its chunk and then multicasts the
> deltas to the other machines. When a machine finishes its work it applies
> the received deltas on the data such that at the end of this step the data
> is again in sync. The primary machine is thus responsible for house keeping:
> when to move to the next step, when to rebalance the work load, etc.

Sure. Understood.
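
For reference, the scheme as I read it can be sketched in a few lines of
Python. This is a single-process simulation only: the list append stands
in for the multicast, and run_step, replicas, chunks and work are
hypothetical names, not part of 0MQ.

```python
def run_step(replicas, chunks, work):
    """One step: each machine processes its chunk, then all apply all deltas.

    replicas: one dict per machine, all starting as identical copies.
    chunks:   one list of keys per machine (the carved-up work).
    work:     function applied to each value in a machine's chunk.
    """
    all_deltas = []
    for replica, chunk in zip(replicas, chunks):
        # Each machine computes new values for its own keys only.
        deltas = {key: work(replica[key]) for key in chunk}
        all_deltas.append(deltas)  # stands in for multicasting the deltas

    # Every machine applies every delta, so the copies are in sync again.
    for replica in replicas:
        for deltas in all_deltas:
            replica.update(deltas)


if __name__ == "__main__":
    data = {"a": 1, "b": 2, "c": 3, "d": 4}
    machines = [dict(data), dict(data)]
    run_step(machines, [["a", "b"], ["c", "d"]], lambda x: x * 10)
    print(machines[0] == machines[1])
```
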

Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ha.png
Type: image/png
Size: 24723 bytes
Desc: not available
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20090313/6d62c3ca/attachment.png>

