[zeromq-dev] Reliability question

Martin Sustrik sustrik at 250bpm.com
Mon Aug 2 10:28:03 CEST 2010


Steven McCoy wrote:
> On 2 August 2010 15:55, Martin Sustrik <sustrik at 250bpm.com> wrote:
> 
>     Sure. The reason is that PGM is not fully reliable -- a hung-up/slow
>     consumer will ultimately be disconnected and thus start losing
>     messages.
> 
>     If you think of reliability as "never lose messages" you end up with
>     a global standstill as a result of a slow/hung-up consumer.
> 
> 
> LBM has a nice solution to this: you disconnect the upstream of the
> crybaby client and attach it to a slow, rate-limited historical replay.
> This and the high-speed persistence are great features on paper, but the
> details seem overly arduous.

Exactly.

The LBM solution makes the following assumption: we have enough resources
(disk space) to store the historical feed until the crybaby consumer is
fixed/replaced/killed by the datacenter staff.
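
To put rough numbers on that assumption, here is a back-of-the-envelope
sketch in Python. The feed rate and message size are made up for
illustration, and replay_storage_bytes is a hypothetical helper, not
part of LBM or 0MQ:

```python
def replay_storage_bytes(msg_rate_per_s, msg_size_bytes, outage_s):
    """Bytes of history to keep so a stalled consumer can catch up later."""
    return msg_rate_per_s * msg_size_bytes * outage_s


# Hypothetical feed: 100,000 messages per second, 100 bytes each.
one_hour = replay_storage_bytes(100_000, 100, 3600)
one_week = replay_storage_bytes(100_000, 100, 7 * 24 * 3600)

print(one_hour / 1e9, "GB for a one-hour outage")
print(one_week / 1e12, "TB for a one-week outage")
```

The required storage grows linearly with the length of the outage, so
the scheme only works where somebody is guaranteed to end the outage.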

When applying it to the Internet, there are two problems:

1. Slow/hung-up consumers are out of the publisher's control. They may
never be fixed. They can literally sit there and cause problems for years.

2. The resources (memory/disk space) allocated to your communication at a
middle box (think of an Internet backbone) are going to be severely
limited. The worst-case assumption should be that the consumer can stop
consuming for only a fraction of a second; otherwise the buffers at the
middle nodes start overflowing.
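
The second point can be illustrated with a toy model in Python. This is
only a sketch under stated assumptions: MiddleNode and its methods are
hypothetical names, the buffer is counted in messages rather than bytes,
and dropping the consumer on overflow is one possible policy, not a
description of how PGM or 0MQ is implemented:

```python
from collections import deque


class MiddleNode:
    """Toy relay with a fixed-size buffer per consumer (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity  # max pending messages per consumer
        self.queues = {}          # consumer id -> deque of pending messages

    def attach(self, consumer_id):
        self.queues[consumer_id] = deque()

    def publish(self, msg):
        """Queue msg for every consumer; drop consumers whose buffer is full."""
        dropped = [cid for cid, q in self.queues.items()
                   if len(q) >= self.capacity]
        for cid in dropped:
            del self.queues[cid]  # disconnect instead of blocking everyone
        for q in self.queues.values():
            q.append(msg)
        return dropped

    def consume(self, consumer_id):
        """Return the next pending message, or None if there is none."""
        q = self.queues.get(consumer_id)
        return q.popleft() if q else None


node = MiddleNode(capacity=3)
node.attach("fast")
node.attach("slow")

received, dropped_log = [], []
for i in range(5):
    dropped_log.append(node.publish(i))
    received.append(node.consume("fast"))  # "slow" never consumes
```

With a buffer of B bytes and a feed of R bytes per second, a fully
stalled consumer is tolerated for roughly B/R seconds. After that the
node has to either drop the consumer, as above, or stall the feed for
everybody else.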

Martin


