[zeromq-dev] EPGM unrecoverable loss detection

Pieter Hintjens ph at imatix.com
Sun Jan 31 15:20:32 CET 2016


In general you should test and solve each failure case that you care
about. You simulate, detect, and then recover.

You've got two possible failures here. One is server problems (e.g.
crashed, blocked, thrashing). You can simulate that easily by adding
long sleeps to your publisher. The way to detect it is to have the
server publish heartbeats, which all clients subscribe to. When the
heartbeats stop, you know the server is having trouble. A monitor
process can use this to switch to another server.
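
A minimal sketch of the client-side heartbeat check, in Python. The
class name, the one-second interval, and the "three missed beats"
threshold are assumptions for illustration, not anything ZeroMQ
prescribes. The socket wiring is left out: the idea is that your
receive loop calls on_heartbeat() whenever a heartbeat message arrives
on the SUB socket, and the monitor polls server_alive().

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat and reports when the server looks stalled.

    Hypothetical helper: interval and max_missed are illustrative values.
    """

    def __init__(self, interval=1.0, max_missed=3, clock=time.monotonic):
        # Declare the server dead after this many seconds of silence.
        self.timeout = interval * max_missed
        self.clock = clock
        self.last_seen = clock()

    def on_heartbeat(self):
        # Call this each time a heartbeat message is received.
        self.last_seen = self.clock()

    def server_alive(self):
        # True while the silence is shorter than the timeout.
        return (self.clock() - self.last_seen) < self.timeout
```

When server_alive() turns false, that is the point at which the monitor
would switch clients to the standby. Testing this is the "long sleeps
in the publisher" simulation described above.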

The second problem is network congestion. This is the reason you would
get irrecoverable message loss. You cannot detect lost messages in
normal pub/sub cases. However, you can do things like add timestamps to
messages and raise a red flag if the latency spikes. You can't solve
network congestion by switching to a different server. It needs
external intervention.
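
A rough sketch of the timestamp idea, in Python. It assumes the
publisher puts its send time into each message and that publisher and
subscriber clocks are reasonably synchronized (e.g. via NTP). The class
name, the spike factor, the 50 ms floor, and the smoothing constant are
all made-up values you would tune for your own network.

```python
import time


class LatencyWatch:
    """Flags messages whose latency jumps well above the recent baseline.

    Hypothetical helper: all thresholds here are illustrative assumptions.
    """

    def __init__(self, spike_factor=5.0, min_spike=0.05, alpha=0.1,
                 clock=time.time):
        self.spike_factor = spike_factor  # multiple of baseline that counts as a spike
        self.min_spike = min_spike        # never flag latencies below this (seconds)
        self.alpha = alpha                # smoothing weight for the running baseline
        self.clock = clock
        self.baseline = None

    def observe(self, sent_at):
        """Record one message; return True if its latency is a spike."""
        latency = self.clock() - sent_at
        if self.baseline is None:
            # First message just establishes the baseline.
            self.baseline = latency
            return False
        spike = latency > max(self.min_spike, self.spike_factor * self.baseline)
        if not spike:
            # Only normal samples update the baseline, so a congested
            # period does not become the new normal.
            self.baseline += self.alpha * (latency - self.baseline)
        return spike
```

A True from observe() is the red flag. It tells you the network is in
trouble, not which messages were lost, and it is a signal for an
operator rather than a trigger for failover.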

-Pieter



On Fri, Jan 29, 2016 at 4:00 AM, Simon Wollwage <mail.wollwage at gmail.com> wrote:
> Hi,
>
> as the title already says: is there a way to detect unrecoverable loss when
> using epgm transports in zmq? We need that detection to switch over to a
> standby server.
>
> Any hints/tips appreciated


