[zeromq-dev] thoughts on pub-sub messaging and "reliability"

Ben Kloosterman bklooste at gmail.com
Sat Oct 18 10:45:33 CEST 2014


Agree on pub sub as best effort.

100% guaranteed delivery is a myth: pull the cable out and don't put it
back. QED

In most cases attempts at guaranteed delivery create more bugs and problems
than "non-guaranteed" delivery. I have seen several cases where people
introduced persistence, which crippled performance, which then required
clustering and a lot of extra work to overcome the performance limitations,
which created even more bugs. In one case the upstream link failed for 4
hours (a digger went through the cable before DR could come on) and the
disk system lost the "guaranteed" messages because it ran out of disk
quota. Note that bad blocks, corrupt files or corrupt SQL indexes in the
message store can do the same. The big issue here was the psychological
"guarantee" that was part of the design, when the guarantee is in fact
false, or at best conditional. It should be termed "more reliable" at best,
as that makes you think about what you want.

This does not mean you don't send acks and retransmit, but that you think
about what you need, since relying on the lower layer for the guarantee
does not always work. E.g. for systems with unreliable networks I prefer
retransmission at the app layer, because sometimes the transport layer
(like TCP) acks and the message gets to the messaging layer but not to the
application, e.g. because of a shutdown, and then the application never
gets the message.
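
A minimal sketch of app-layer acks and retransmission, in Python, could
look like this. All names (Sender, Receiver, transmit, on_app_ack) are
hypothetical, the "wire" is just a callback, and it is an illustration
under those assumptions rather than a ZeroMQ API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Sender:
    """Keeps each message until the receiving *application* acks it."""
    transmit: Callable[[int, str], None]  # hypothetical hook onto the wire
    max_attempts: int = 5
    next_seq: int = 0
    pending: Dict[int, str] = field(default_factory=dict)
    attempts: Dict[int, int] = field(default_factory=dict)

    def send(self, payload: str) -> int:
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = payload
        self.attempts[seq] = 1
        self.transmit(seq, payload)
        return seq

    def on_app_ack(self, seq: int) -> None:
        # Only called once the remote application has processed the message,
        # not when the transport has merely accepted the bytes.
        self.pending.pop(seq, None)
        self.attempts.pop(seq, None)

    def retransmit_unacked(self) -> List[int]:
        """Resend pending messages; return the seqs we have given up on."""
        given_up = []
        for seq in sorted(self.pending):
            if self.attempts[seq] >= self.max_attempts:
                given_up.append(seq)  # report the failure, don't hide it
                continue
            self.attempts[seq] += 1
            self.transmit(seq, self.pending[seq])
        return given_up


class Receiver:
    """Application side: processes each seq once, acks every delivery."""

    def __init__(self) -> None:
        self.processed: Dict[int, str] = {}

    def deliver(self, seq: int, payload: str) -> int:
        if seq not in self.processed:  # duplicates are acked, not reprocessed
            self.processed[seq] = payload
        return seq  # the seq to acknowledge back to the sender
```

A real sender would call retransmit_unacked() from a timer. Note that it
still gives up after max_attempts, so this is "more reliable", not
guaranteed.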

Regards

Ben

On Sat, Oct 18, 2014 at 6:39 AM, Justin Karneges <justin at affinix.com> wrote:

> Hi Holger,
>
> You've got a good list here.
>
> If you're looking for opinions, I'll say that I'm fond of the first
> approach (0), which is to treat pubsub as a best-effort transport. This
> is also in spirit with what is discussed in the ZeroMQ guide:
> http://zguide.zeromq.org/page:all#Pros-and-Cons-of-Pub-Sub
>
> I mostly work on client/server web applications, and I want reliability
> just like anyone else. However, 100% guaranteed delivery in the pubsub
> layer is not necessary to achieve this. Not only that, guaranteed
> delivery is quite difficult to get right and comes with all sorts of
> implications. Instead, I create a reliable request/response mechanism
> for synchronization (of application data, not "messages") and combine it
> with best-effort pubsub messaging.
>
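
A rough Python sketch of the combination described above: best-effort
pubsub updates plus a reliable request/response resync of application
data. The version number on each update, the SyncedSubscriber class and
the fetch_snapshot callback are assumptions of this sketch, not something
stated in the mail:

```python
from typing import Callable, Dict, Tuple

# (version, full application state) as returned by the request/response side
Snapshot = Tuple[int, Dict[str, str]]


class SyncedSubscriber:
    """Applies best-effort updates; resyncs application data on a gap."""

    def __init__(self, fetch_snapshot: Callable[[], Snapshot]) -> None:
        self.fetch_snapshot = fetch_snapshot  # hypothetical reliable request
        self.resyncs = 0
        version, state = fetch_snapshot()
        self.version = version
        self.state = dict(state)

    def on_update(self, version: int, key: str, value: str) -> None:
        if version <= self.version:
            return  # stale or duplicate update, already reflected in state
        if version == self.version + 1:
            self.state[key] = value  # the normal, cheap path
            self.version = version
            return
        # Gap: at least one update was lost. Do not ask for the missed
        # messages; fetch the current application state instead.
        version, state = self.fetch_snapshot()
        self.version = version
        self.state = dict(state)
        self.resyncs += 1
```

The pubsub layer needs no persistence here; only the snapshot request has
to be reliable, and it can simply be retried.
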
> On Fri, Oct 17, 2014, at 02:38 AM, Holger Joukl wrote:
> >
> > Hi,
> >
> > I realize I'm sometimes confused by terms used by myself and others.
> >
> > So, in an attempt to clarify my thoughts, first and foremost to myself,
> > by writing them down, here's how I tend to think about pub-sub messaging:
> >
> > (0) Publish-subscribe (pub-sub) messaging:
> > - one or many sender(s) publish data on one or more "channels"
> > - one or many listener(s) subscribe to one or more channels
> > - communication is asynchronous
> > - listeners might not receive all messages even if up-and-running when
> > the messages are being sent (due to network glitches or whatever)
> >
> > (1) Reliable pub-sub messaging:
> > - one or many sender(s) publish data on one or more "channels"
> > - one or many listener(s) subscribe to one or more channels
> > - communication is asynchronous
> > - as long as a listener is up-and-running when the messages are being
> > sent, it will receive all messages from all senders publishing on the
> > channels it has subscribed to, in order (order per sender)
> >
> > (2) Guaranteed/Certified pub-sub messaging:
> > - one or many sender(s) publish data on one or more "channels"
> > - one or many listener(s) subscribe to one or more channels
> > - communication is asynchronous
> > - as long as a listener is up-and-running when the messages are being
> > sent, it will receive all messages from all senders publishing on the
> > channels it has subscribed to, in order (order per sender)
> > - even if a listener is *not* up-and-running when messages are being
> > sent, it will still get all the messages, i.e. the missed messages will
> > get re-delivered, in order (order per sender); the listener will not
> > receive new messages until it has received all missed messages
> > - as a consequence, messages need to get persisted:
> >   - either each listener's subscriptions need to get registered, when
> >     opening a subscription or by predefined configuration, so persisted
> >     messages can be safely deleted from persistent storage once all
> >     registered listeners have received them and explicitly acknowledged
> >     that fact
> >   - or all sent messages need to get persisted for a period of time so a
> >     listener can request missed messages for retransmission
> >
> > Sometimes the "reliability" distinction is called different "qualities of
> > service" (QoS).
> >
> > Note that I've deliberately ignored any problems that might arise in
> > pub-sub communications, e.g. slow consumers in (unreliable) high
> > frequency scenarios or the like.
> >
> > Pub-sub messaging can be implemented over a variety of transports and
> > protocols: (UDP) broadcast, multicast, TCP, ...
> > The transport + protocol used determines the properties of
> > pub-sub messaging, e.g.:
> > - plain UDP broadcast is unreliable
> > - PGM or NORM multicast is reliable
> > - a protocol on top of TCP/UDP/reliable multicast is usually necessary
> > for guaranteed/certified messaging
> >
> > Design approaches you encounter in the wild:
> >
> > - central broker with queues, optionally persistent (e.g. AMQP brokers,
> > JMS providers, IBM MQ, ...)
> >   - senders do not know anything about listeners
> >   - listeners do not know anything about senders
> >   - the broker is "the rendezvous point" for communication, often called
> >     "message bus"
> >   - senders connect to the broker (usually TCP)
> >   - listeners connect to the broker (usually TCP)
> >   - channels are usually called "topics" (basically a special case of
> >     queue that allows for many listeners to receive the topic messages)
> >   - broker knows registered guaranteed/certified listeners: when all
> >     known listeners have retrieved and acknowledged a certain message on
> >     a topic, this message will get deleted from the queue
> >   - queue persistence to make queued messages survive broker failure
> >   - broker might be distributed, i.e. multiple brokers working in
> >     cooperation, e.g. for reasons of throughput scaling, partitioning of
> >     data channels, replication
> >   - broker is a single point of failure so will normally get clustered
> >     and replicated for some notion of cold/hot standby for mission
> >     critical communication
> >
> > - central broker with a persistent commit log or journal (e.g. Apache
> > Kafka, ZPER, ...)
> >   - senders do not know anything about listeners
> >   - listeners do not know anything about senders
> >   - the broker is "the rendezvous point" for communication
> >   - senders connect to the broker (usually TCP)
> >   - listeners connect to the broker (usually TCP)
> >   - messages sent on the channels are persisted by the broker as
> >     sequenced continuous message "commit logs" for a configurable period
> >     of time
> >   - listeners can get retransmission of any historic message within the
> >     configured period of time
> >   - listeners are responsible for their state, i.e. which messages they
> >     have already processed
> >   - listener state is basically just a pointer to its current position in
> >     the commit log
> >   - broker might be distributed, i.e. multiple brokers working in
> >     cooperation, e.g. for reasons of throughput scaling, partitioning of
> >     data channels, replication
> >   - broker is a single point of failure so will normally get clustered
> >     and replicated for some notion of cold/hot standby for mission
> >     critical communication
> >
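
The commit-log design listed above can be sketched in a few lines of
Python. This is not Kafka's or ZPER's API: CommitLog, Listener and the
count-based retention (standing in for the "configurable period of time")
are hypothetical and only illustrate that listener state is a single
offset:

```python
from typing import List, Tuple


class CommitLog:
    """Broker side: an append-only log that keeps the newest N messages."""

    def __init__(self, retention: int) -> None:
        self.retention = retention  # stand-in for a retention time period
        self.base_offset = 0        # offset of entries[0]
        self.entries: List[str] = []

    def append(self, message: str) -> int:
        self.entries.append(message)
        offset = self.base_offset + len(self.entries) - 1
        overflow = len(self.entries) - self.retention
        if overflow > 0:  # expire the oldest messages
            del self.entries[:overflow]
            self.base_offset += overflow
        return offset

    def read_from(self, offset: int) -> Tuple[int, List[str]]:
        """Return (first offset actually served, messages from there on)."""
        start = max(offset, self.base_offset)
        return start, self.entries[start - self.base_offset:]


class Listener:
    """Listener side: its whole state is a position in the log."""

    def __init__(self, log: CommitLog) -> None:
        self.log = log
        self.position = 0  # next offset this listener wants to read
        self.lost = 0      # messages that expired before they were read

    def poll(self) -> List[str]:
        start, messages = self.log.read_from(self.position)
        self.lost += start - self.position
        self.position = start + len(messages)
        return messages
```

A listener that stays away longer than the retention window sees a nonzero
lost count, which is where this design stops being "guaranteed".
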
> > - distributed message bus (e.g. TIB/Rendezvous,
> > http://iris.karalabe.com/, ...)
> >   - consists of a network of "micro brokers", sometimes called daemons,
> >     agents or relay processes, usually one per host
> >   - senders connect to a (local) broker (usually IPC or TCP)
> >   - listeners connect to a (local) broker (usually IPC or TCP)
> >   - the "network of brokers" is "the rendezvous point" for communication,
> >     i.e. the brokers collectively form a "message bus"
> >   - senders do not know anything about listeners in case of reliable
> >     pub-sub
> >   - listeners do not know anything about senders in case of reliable
> >     pub-sub
> >   - senders know about registered listeners in case of
> >     guaranteed/certified pub-sub
> >   - message persistence is local to the sending clients (called "ledger"
> >     file e.g. in TIB/Rv terms); it might also be possible local to the
> >     local broker but I haven't seen such an approach yet
> >   - registered listeners need to acknowledge messages with their senders
> >     so they can remove them from persistent storage
> >   - brokers are not single points of failure
> >   - brokers do not need separate clustering/replication apart from what
> >     needs to be done for the mission critical applications running on
> >     their host systems anyway; they are basically guarded by the same
> >     safety measures taken for the host system, ranging from none to
> >     whatever
> >
> > - somewhat hybrid forms, e.g. brokers that scale out elastically like
> > http://roq-messaging.org/
> >
> > Hope no one's annoyed by the
> > lengthy-not-really-a-question-nor-problem-description kind of post.
> > Glad to get anyone's thoughts or totally different views or hints on the
> > glaringly obvious that I missed.
> >
> > Best regards
> > Holger
> >
> > Landesbank Baden-Wuerttemberg
> > Anstalt des oeffentlichen Rechts
> > Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz
> > HRA 12704
> > Amtsgericht Stuttgart
> >
> > _______________________________________________
> > zeromq-dev mailing list
> > zeromq-dev at lists.zeromq.org
> > http://lists.zeromq.org/mailman/listinfo/zeromq-dev