[zeromq-dev] Reliability question
Martin Sustrik
sustrik at 250bpm.com
Mon Aug 2 10:28:03 CEST 2010
Steven McCoy wrote:
> On 2 August 2010 15:55, Martin Sustrik <sustrik at 250bpm.com
> <mailto:sustrik at 250bpm.com>> wrote:
>
> Sure. The reason is that PGM is not fully reliable -- a hung-up/slow
> consumer will ultimately be disconnected and thus start losing
> messages.
>
> If you think of reliability as "never lose messages" you end up with
> a global standstill as a result of a slow/hung-up consumer.
>
>
> LBM has a nice solution to this: you disconnect the upstream of the
> crybaby client and attach it to a slow, rate-limited historical replay.
> This and the high-speed persistence are great features on paper but the
> details seem overly arduous.
Exactly.
The LBM solution makes the following assumption: we have enough resources
(disk space) to store the historical feed until the crybaby consumer is
fixed/replaced/killed by the datacenter staff.
When applying it to the Internet there are two problems:
1. Slow/hung-up consumers are out of the publisher's control. They may never
be fixed. They can literally sit there and cause problems for years.
2. The resources (memory/disk space) allocated to your communication at a
middle box (think of an Internet backbone) are going to be severely
limited. The worst-case assumption should be that the consumer can stop
consuming for only a fraction of a second; otherwise the buffers at the
middle nodes start overflowing.
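
To put rough numbers on point 2, here is a minimal Python sketch. It is not taken from 0MQ, PGM or LBM; the names stall_tolerance_seconds and BoundedFeed, and the buffer size and feed rate, are made up for illustration. It shows how little stall time a small buffer buys, and the "disconnect the slow consumer instead of blocking the publisher" policy described above.

```python
from collections import deque


def stall_tolerance_seconds(buffer_bytes, feed_bytes_per_second):
    """How long a consumer may stop reading before the buffer is full."""
    return buffer_bytes / feed_bytes_per_second


class BoundedFeed:
    """Hypothetical per-consumer buffer at a middle node.

    When the buffer is full the consumer is dropped, so the publisher
    and the other consumers are never blocked by it.
    """

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of queued messages
        self.queue = deque()
        self.connected = True

    def publish(self, message):
        """Queue a message; return False if the consumer is (or becomes) dropped."""
        if not self.connected:
            return False
        if len(self.queue) >= self.capacity:
            # Buffer overflow: give up on this consumer rather than stall.
            self.connected = False
            self.queue.clear()
            return False
        self.queue.append(message)
        return True

    def consume(self):
        """Return the oldest queued message, or None if nothing is queued."""
        return self.queue.popleft() if self.queue else None


if __name__ == "__main__":
    # Assumed figures: 1 MB of buffer, 10 MB/s feed.
    print(stall_tolerance_seconds(1_000_000, 10_000_000), "seconds")

    feed = BoundedFeed(capacity=3)
    for i in range(5):
        print(i, feed.publish(i))   # the consumer never reads, so it gets dropped
```

With these assumed figures the consumer can stall for about a tenth of a second. Storing the backlog instead, as in the historical-replay approach, only moves the limit from memory to disk: the space needed still grows for as long as the consumer stays broken.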
Martin