[zeromq-dev] fault-tolerance and butterfly example

Martin Sustrik sustrik at fastmq.com
Fri Mar 20 10:36:05 CET 2009


> First, sorry for my bad English, it is not my native language :-)

My English is not perfect either :)

> I plan to develop a fault-tolerant distributed computing cluster with many 
> running applications and many servers.
> 0MQ has big performance/resource-usage/scalability advantages; I 
> want to discuss fault-tolerance techniques that may be 
> used with 0MQ.
> Currently, fault tolerance in 0MQ means that applications try to reconnect to 
> other applications using the same address and port.
> Now take a look at the butterfly example.
> A failure of some (not all) instances of component1 and component2 doesn't stop 
> the whole system, but there are some SPOFs (single points 
> of failure). Take a closer look at the 'intermediate' part of the butterfly 
> example. When the server running this application goes down, the whole system stops.
> How can we avoid this?
> My idea:
> 1) We create two copies of intermediate: the first (intermediate1) creates 
> exchange INTERMEDIATE_IN1 and queue INTERMEDIATE_OUT1; the second 
> (intermediate2) creates exchange INTERMEDIATE_IN2 and queue INTERMEDIATE_OUT2.
> 2) Component1 binds its load-balancing local exchange to INTERMEDIATE_IN1 and INTERMEDIATE_IN2.
> 3) Component2 binds its local queue to INTERMEDIATE_OUT1 and INTERMEDIATE_OUT2.

Yes, this will work. However, it can be used only if the intermediate 
component is not required to impose strict ordering. Some applications 
do have that requirement; for example, a sequencer that assigns sequence 
numbers to messages won't work this way.
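To illustrate the sequencing caveat: if the sequencer is duplicated, each replica numbers messages from its own private counter, so the global ordering is lost. A minimal sketch in plain Python (a simulation with made-up names, not 0MQ code):

```python
class Sequencer:
    """One sequencer replica with its own private counter."""
    def __init__(self):
        self.next_seqnum = 0

    def assign(self, msg):
        self.next_seqnum += 1
        return (self.next_seqnum, msg)

# A load-balancing exchange alternates messages between two replicas.
replicas = [Sequencer(), Sequencer()]
numbered = [replicas[i % 2].assign(f"msg-{i}") for i in range(5)]

seqnums = [n for n, _ in numbered]
print(seqnums)  # [1, 1, 2, 2, 3] -- duplicated numbers, no global order
```

With a single sequencer the numbers would be 1..5 in order; with two replicas the streams clash, which is why this component cannot simply be duplicated.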

> When both intermediate1 and intermediate2 are up, they work in a 
> load-balancing manner without any problem.
> Now what happens when intermediate2 goes down? I think the following 
> scenario will take place:
> 1) Component1 has one pipe for INTERMEDIATE_IN1 and one for 
> INTERMEDIATE_IN2. After intermediate2 crashes, its pipe collects 
> messages in an internal buffer. When there is no free space left in that 
> buffer, the load-balancing exchange sends messages only to the pipe of intermediate1. 
> The system doesn't stop; component1 regularly tries to reconnect to intermediate2.
> 2) Component2 has one pipe for INTERMEDIATE_OUT1 and one for 
> INTERMEDIATE_OUT2. After intermediate2 crashes, no messages are 
> received from intermediate2, only from intermediate1. The system doesn't 
> stop; component2 regularly tries to reconnect to intermediate2.
> When the server running intermediate2 is restored, the messages held in 
> component1's pipe will be flushed and the system will be fully restored.

True. Moreover, there's a pipe limit for exchanges. Once the limit is 
reached (because the connection is broken and thus messages cannot be 
sent further), the load-balancer ceases to send messages to that 
particular connection; instead it sends them to the other connections 
available at the moment.
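The behaviour described, a full pipe being skipped by the load-balancer, can be modelled in a few lines. This is a simulation of the semantics, not 0MQ internals; the limit constant and class names are illustrative:

```python
PIPE_LIMIT = 3  # illustrative stand-in for the real per-pipe limit

class Pipe:
    """A connection to one intermediate; crashed peers stop draining."""
    def __init__(self, name, alive=True):
        self.name, self.alive, self.buf = name, alive, []

    def full(self):
        return len(self.buf) >= PIPE_LIMIT

    def drain(self):
        if self.alive:        # a live peer keeps reading;
            self.buf.clear()  # a dead one lets messages pile up

class LoadBalancingExchange:
    def __init__(self, pipes):
        self.pipes, self.rr = pipes, 0

    def send(self, msg):
        # Round-robin over pipes, skipping any whose buffer hit the limit.
        for _ in range(len(self.pipes)):
            pipe = self.pipes[self.rr % len(self.pipes)]
            self.rr += 1
            if not pipe.full():
                pipe.buf.append(msg)
                return pipe.name
        raise RuntimeError("all pipes full")

i1 = Pipe("intermediate1")
i2 = Pipe("intermediate2", alive=False)   # the crashed peer
exchange = LoadBalancingExchange([i1, i2])

routed = []
for n in range(8):
    routed.append(exchange.send(f"msg-{n}"))
    i1.drain(); i2.drain()

# Traffic alternates at first; once intermediate2's pipe is full,
# every further message is routed to intermediate1 instead.
print(routed)
```

The first few messages alternate between the two pipes; after intermediate2's buffer hits the limit, everything flows to intermediate1, exactly the degradation mode described above.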

> But what can we do in the case when we can't restore intermediate2 
> on the old server (we must repair the hardware, which will take a lot of 
> time), but can start intermediate2 on another server (different IP and 
> port)? As I understand it, it will be started successfully, and zmq_server 
> will store a new pointer to the global objects of intermediate2. Newly 
> started applications will see the correct location of intermediate2. Already 
> running applications must be restarted in any case (because an application 
> doesn't ask zmq_server for the new location of a global object).


> Restarting all running applications in a big system may be a problem.
> Is it possible to shut down an application cleanly (disconnect from 
> message sources, process all messages in internal buffers, send the 
> response messages to their destinations, terminate the application)?
> Without the possibility of a clean shutdown, many messages may be lost when 
> restarting all running applications, and not only the messages stored in 
> the internal buffers of component1 instances for intermediate2. In a really 
> big system this may be a very big problem.

What you have to do is send a shutdown signal to the application. The 
application, in turn, should stop posting new messages and wait a while 
before exiting.

Having a "wait until all the messages are sent" option would be a 
problem, as it would cause a deadlock if the connections are broken.

So, think of the sleep before terminating the application as a kind of 
timeout: either all the messages are flushed within this interval, or the 
application forces the shutdown anyway.
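The stop-then-wait-then-exit sequence might look like the following sketch. The class, the grace value, and the flush stand-in are all assumptions for illustration, not actual 0MQ API:

```python
import time

class Component:
    """Sketch of an application that shuts down with a grace period."""
    def __init__(self):
        self.accepting = True
        self.outbox = ["m1", "m2", "m3"]   # messages still buffered

    def flush_one(self):
        # Stand-in for 0MQ pushing one buffered message out in background.
        if self.outbox:
            self.outbox.pop(0)

    def shutdown(self, grace):
        self.accepting = False             # 1) stop posting new messages
        deadline = time.monotonic() + grace
        while self.outbox and time.monotonic() < deadline:
            self.flush_one()               # 2) let buffered messages drain
            time.sleep(0.01)
        # 3) exit anyway: waiting forever could deadlock on a broken link
        return len(self.outbox)            # count of unflushed messages

c = Component()
left = c.shutdown(grace=1.0)               # grace value is an assumption
print("unflushed:", left)
```

The key point is step 3: the loop is bounded by the deadline, so a broken connection delays the exit by at most the grace period rather than hanging forever.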

To compute an appropriate interval, consider the maximum number of 
messages cached by an exchange (defined in config.hpp: bp_hwm = 10000), 
their size, and the bandwidth of the network connection. That gives an 
estimate of how long to wait to be reasonably sure that the messages 
were sent.
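As a back-of-the-envelope version of that estimate, using bp_hwm = 10000 from config.hpp; the message size and link speed are assumptions chosen for illustration:

```python
bp_hwm = 10000            # max messages cached per exchange (config.hpp)
msg_size = 1024           # bytes per message (assumed)
bandwidth = 125_000_000   # bytes/s, i.e. a 1 Gb/s link (assumed)

worst_case_backlog = bp_hwm * msg_size         # bytes possibly still queued
wait_seconds = worst_case_backlog / bandwidth  # time to push it all out

print(f"wait at least {wait_seconds:.3f} s")   # ~0.082 s for these numbers
```

In practice you would pad this with a safety margin, since the link is shared with other traffic and the peers also need time to drain.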

> Now, my questions:
> 1) Will my idea with two intermediate applications work?

Yes. With the sequencing caveat mentioned above.

> 2) How can I cleanly shut down an application (without losing messages in 
> internal buffers)?

Stop sending messages and wait. As described above.

> 3) How can I force an application to read the new location of a global 
> object from zmq_server when the connection to that object is lost?

Either restart the application or reinstantiate the whole 0MQ 
infrastructure to get rid of the internal location caches.
