[zeromq-dev] thoughts on pub-sub messaging and "reliability"

Pieter Hintjens ph at imatix.com
Sat Oct 18 00:24:59 CEST 2014


Holger,

Here is a whitepaper for a reliable pub-sub product we're now building:

https://github.com/Malamute/malamute-core/blob/master/MALAMUTE.md

If you'd like to be involved in this project (on a commercial basis),
let me know.

-Pieter

On Fri, Oct 17, 2014 at 11:38 AM, Holger Joukl <Holger.Joukl at lbbw.de> wrote:
>
> Hi,
>
> I realize I'm sometimes confused by terms used by myself and others.
>
> So, in an attempt to clarify my thoughts, first and foremost to myself, by
> writing them down here's how I tend to think about pub-sub messaging:
>
> (0) Publish-subscribe (pub-sub) messaging:
> - one or many sender(s) publish data on one or more "channels"
> - one or many listener(s) subscribe to one or more channels
> - communication is asynchronous
> - listeners might not receive all messages even if up-and-running when
> the messages are being sent (due to network glitches or whatever)
>
> (1) Reliable pub-sub messaging:
> - one or many sender(s) publish data on one or more "channels"
> - one or many listener(s) subscribe to one or more channels
> - communication is asynchronous
> - as long as a listener is up-and-running when the messages are being sent
> it will
> receive all messages from all senders publishing on the channels they've
> subscribed to,
> in order (order per sender)
>
> (2) Guaranteed/Certified pub-sub messaging:
> - one or many sender(s) publish data on one or more "channels"
> - one or many listener(s) subscribe to one or more channels
> - communication is asynchronous
> - as long as a listener is up-and-running when the messages are being sent
> it will
> receive all messages from all senders publishing on the channels they've
> subscribed to,
> in order (order per sender)
> - even if a listener is *not* up-and-running when messages are being sent a
> listener
> will still get all the messages, i.e. the missed messages will get
> re-delivered,
> in order (order per sender); the listener will not receive new messages
> until it
> has received all missed message
> - as a consequence, messages need to get persisted:
>   - either each listeners' subscriptions need to get registered when
> opening a
> subscription, or by predefined configuration, so persisted messages can be
> safely
> deleted from persistent storage when all registered listeners have received
> them
> an explicitly acknowledged that fact
>   - or all sent messages need to get persisted for a period of time so a
> listener
> can request missed messages for retransmission
>
> Sometimes the "reliability" distinction is called different "qualities of
> service"
> (QoS).
>
> Note that I've deliberately ignored any problems that might arise in
> pub-sub
> communications, e.g. slow consumers in (unreliable) high frequency
> scenarios
> or the like.
>
> Pub-sub messaging can be implemented over a variety of transports and
> protocols:
> (UDP) broadcast, multicast, TCP, ...
> The transport + protocol used determines the properties of
> pub-sub-messaging, e.g.:
> - plain UDP broadcast is unreliable
> - PGM or NORM multicast is reliable
> - a protocol on top of TCP/UDP/reliable multicast is ususally necessary for
> guaranteed/certified messaging
>
> Design approaches you encounter in the wild:
>
> - central broker with queues, optionally persistent (e.g. AMQP brokers, JMS
> providers, IBM MQ, ...)
>   - senders do not know anything about listeners
>   - listeners do not know anything about senders
>   - the broker is "the rendezvous point" for communication, often called
> "message bus"
>   - senders connect to the broker (usually TCP)
>   - listeners connect to the broker (usually TCP)
>   - channels are usually called "topics" (basically a special case of queue
> that allows
>     for many listeners to receive the topic messages)
>   - broker knows registered guaranteed/certified listeners: When all known
> listeners
>     have retrieved and acknowledged a certain message on a topic this
> message will get
>     deleted from the queue
>   - queue persistence to make queued messages survive broker failure
>   - broker might be distributed, i.e. multiple broker working in
> cooperation, e.g. for reasons
>     of throughput scaling, partitioning of data channels, replication
>   - broker is a single point of failure so will normally get clustered and
> replicated for
>     some notion of cold/hot standby for mission critical communication
>
> - central broker with a persistent commit log or journal (e.g. Apache
> Kafka, ZPER, ...)
>   - senders do not know anything about listeners
>   - listeners do not know anything about senders
>   - the broker is "the rendezvous point" for communication
>   - senders connect to the broker (usually TCP)
>   - listeners connect to the broker (usually TCP)
>   - messages sent on the channels are persisted by the broker as sequenced
> continous message
>     "commit logs" for a configurable period of time
>   - listeners can get retransmission of any historic message within the
> configured period of
>     time
>   - listeners are responsible for their state, i.e. which messages they
> have already
>     processed
>   - listener state is basically just a pointer to its current position in
> the commit log
>    - broker might be distributed, i.e. multiple broker working in
> cooperation, e.g. for reasons
>     of throughput scaling, partitioning of data channels, replication
>   - broker is a single point of failure so will normally get clustered and
> replicated for
>     some notion of cold/hot standby for mission critical communication
>
> - distributed message bus (e.g. TIB/Rendezvous, http://iris.karalabe.com/
> , ...)
>   - consists of a network of "micro brokers", sometimes called daemons or
> agent or relay process,
>     ususally one per host
>   - senders connect to a (local) broker (usually IPC or TCP)
>   - listeners connect to a (local) broker (usually IPC or TCP)
>   - the "network of brokers" is "the rendezvous point" for communication,
> i.e. the brokers
>     collectively form a "message bus"
>   - senders do not know anything about listeners in case of reliable
> pub-sub
>   - listeners do not know anything about senders in case of reliable
> pub-sub
>   - senders know about registered listeners in case of guaranteed/certified
> pub-sub
>   - message persistence is local to the sending clients (called "ledger"
> file e.g. in TIB/Rv
>     terms); might also be possible local to the local broker but I haven't
> seen such an approach
>     yet
>   - registered listeners need to acknowledge messages with their senders so
> they can remove them
>     from persistent storage
>   - brokers are not single points of failure
>   - brokers do not need separate clustering/replication apart from what
> needs to be done
>     for the mission critical applications running on their host systems
> anyway; they are
>     basically guarded by the same safety measures taken for the host
> system, ranging from none
>     to whatever
>
> - somewhat hybrid forms, e.g. brokers that scale out elastically like
> http://roq-messaging.org/
>
> Hope no one's annoyed by the
> lengthy-not-really-a-question-nor-problem-description
> kind of post. Glad to get anyone's thoughts or totally different views or
> hints on the
> glaringly obvious that I missed.
>
> Best regards
> Holger
>
> Landesbank Baden-Wuerttemberg
> Anstalt des oeffentlichen Rechts
> Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz
> HRA 12704
> Amtsgericht Stuttgart
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev



More information about the zeromq-dev mailing list