[zeromq-dev] thoughts on pub-sub messaging and "reliability"
Holger Joukl
Holger.Joukl at LBBW.de
Fri Oct 17 11:38:26 CEST 2014
Hi,
I realize I'm sometimes confused by terms used by myself and others.
So, in an attempt to clarify my thoughts, first and foremost to myself, by
writing them down here's how I tend to think about pub-sub messaging:
(0) Publish-subscribe (pub-sub) messaging:
- one or many sender(s) publish data on one or more "channels"
- one or many listener(s) subscribe to one or more channels
- communication is asynchronous
- listeners might not receive all messages even if up-and-running when
the messages are being sent (due to network glitches or whatever)
(1) Reliable pub-sub messaging:
- one or many sender(s) publish data on one or more "channels"
- one or many listener(s) subscribe to one or more channels
- communication is asynchronous
- as long as a listener is up-and-running when the messages are being sent
it will
receive all messages from all senders publishing on the channels they've
subscribed to,
in order (order per sender)
(2) Guaranteed/Certified pub-sub messaging:
- one or many sender(s) publish data on one or more "channels"
- one or many listener(s) subscribe to one or more channels
- communication is asynchronous
- as long as a listener is up-and-running when the messages are being sent
it will
receive all messages from all senders publishing on the channels they've
subscribed to,
in order (order per sender)
- even if a listener is *not* up-and-running when messages are being sent a
listener
will still get all the messages, i.e. the missed messages will get
re-delivered,
in order (order per sender); the listener will not receive new messages
until it
has received all missed message
- as a consequence, messages need to get persisted:
- either each listeners' subscriptions need to get registered when
opening a
subscription, or by predefined configuration, so persisted messages can be
safely
deleted from persistent storage when all registered listeners have received
them
an explicitly acknowledged that fact
- or all sent messages need to get persisted for a period of time so a
listener
can request missed messages for retransmission
Sometimes the "reliability" distinction is called different "qualities of
service"
(QoS).
Note that I've deliberately ignored any problems that might arise in
pub-sub
communications, e.g. slow consumers in (unreliable) high frequency
scenarios
or the like.
Pub-sub messaging can be implemented over a variety of transports and
protocols:
(UDP) broadcast, multicast, TCP, ...
The transport + protocol used determines the properties of
pub-sub-messaging, e.g.:
- plain UDP broadcast is unreliable
- PGM or NORM multicast is reliable
- a protocol on top of TCP/UDP/reliable multicast is ususally necessary for
guaranteed/certified messaging
Design approaches you encounter in the wild:
- central broker with queues, optionally persistent (e.g. AMQP brokers, JMS
providers, IBM MQ, ...)
- senders do not know anything about listeners
- listeners do not know anything about senders
- the broker is "the rendezvous point" for communication, often called
"message bus"
- senders connect to the broker (usually TCP)
- listeners connect to the broker (usually TCP)
- channels are usually called "topics" (basically a special case of queue
that allows
for many listeners to receive the topic messages)
- broker knows registered guaranteed/certified listeners: When all known
listeners
have retrieved and acknowledged a certain message on a topic this
message will get
deleted from the queue
- queue persistence to make queued messages survive broker failure
- broker might be distributed, i.e. multiple broker working in
cooperation, e.g. for reasons
of throughput scaling, partitioning of data channels, replication
- broker is a single point of failure so will normally get clustered and
replicated for
some notion of cold/hot standby for mission critical communication
- central broker with a persistent commit log or journal (e.g. Apache
Kafka, ZPER, ...)
- senders do not know anything about listeners
- listeners do not know anything about senders
- the broker is "the rendezvous point" for communication
- senders connect to the broker (usually TCP)
- listeners connect to the broker (usually TCP)
- messages sent on the channels are persisted by the broker as sequenced
continous message
"commit logs" for a configurable period of time
- listeners can get retransmission of any historic message within the
configured period of
time
- listeners are responsible for their state, i.e. which messages they
have already
processed
- listener state is basically just a pointer to its current position in
the commit log
- broker might be distributed, i.e. multiple broker working in
cooperation, e.g. for reasons
of throughput scaling, partitioning of data channels, replication
- broker is a single point of failure so will normally get clustered and
replicated for
some notion of cold/hot standby for mission critical communication
- distributed message bus (e.g. TIB/Rendezvous, http://iris.karalabe.com/
, ...)
- consists of a network of "micro brokers", sometimes called daemons or
agent or relay process,
ususally one per host
- senders connect to a (local) broker (usually IPC or TCP)
- listeners connect to a (local) broker (usually IPC or TCP)
- the "network of brokers" is "the rendezvous point" for communication,
i.e. the brokers
collectively form a "message bus"
- senders do not know anything about listeners in case of reliable
pub-sub
- listeners do not know anything about senders in case of reliable
pub-sub
- senders know about registered listeners in case of guaranteed/certified
pub-sub
- message persistence is local to the sending clients (called "ledger"
file e.g. in TIB/Rv
terms); might also be possible local to the local broker but I haven't
seen such an approach
yet
- registered listeners need to acknowledge messages with their senders so
they can remove them
from persistent storage
- brokers are not single points of failure
- brokers do not need separate clustering/replication apart from what
needs to be done
for the mission critical applications running on their host systems
anyway; they are
basically guarded by the same safety measures taken for the host
system, ranging from none
to whatever
- somewhat hybrid forms, e.g. brokers that scale out elastically like
http://roq-messaging.org/
Hope no one's annoyed by the
lengthy-not-really-a-question-nor-problem-description
kind of post. Glad to get anyone's thoughts or totally different views or
hints on the
glaringly obvious that I missed.
Best regards
Holger
Landesbank Baden-Wuerttemberg
Anstalt des oeffentlichen Rechts
Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz
HRA 12704
Amtsgericht Stuttgart
More information about the zeromq-dev
mailing list