[zeromq-dev] PUB/SUB unreliability

Gerry Steele gerry.steele at gmail.com
Fri Jun 13 20:33:09 CEST 2014


I've read everything I can find, including the printed book, but I am at a
loss as to how PUB/SUB is actually supposed to behave in zmq.

A production system I'm using is experiencing message loss between several
nodes using PUB/SUB.

From what I've read, PUB/SUB should be reliable when the HWM options
(ZMQ_SNDHWM and ZMQ_RCVHWM) are set to zero (no limit, so don't drop). By
reliable I mean that no message should fail to be delivered to an already
connected consumer.

I implemented some utilities to reproduce the message loss in my system:

zmq_sub: https://gist.github.com/easytiger/992b3a29eb5c8545d289
zmq_pub: https://gist.github.com/easytiger/e382502badab49856357


zmq_pub takes the number of events to send and the logging frequency;
zmq_sub takes only the logging frequency. zmq_sub prints the number of
messages received alongside the packet contents, which contain the integer
packet count from the publisher.
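
The comparison zmq_sub makes can be sketched roughly as follows. This is a
minimal illustration and not the code from the gists: it assumes the payload
ends in "#<n>" and that the publisher numbers messages 1, 2, 3, ...; the names
parse_seq and GapCounter are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: extract the publisher's sequence number from a
// payload that ends in "#<n>". Returns false if no trailing number is found.
bool parse_seq(const std::string& payload, std::uint64_t& seq_out) {
    const std::size_t pos = payload.rfind('#');
    if (pos == std::string::npos || pos + 1 >= payload.size()) return false;
    std::uint64_t value = 0;
    for (std::size_t i = pos + 1; i < payload.size(); ++i) {
        const char c = payload[i];
        if (c < '0' || c > '9') return false;  // reject anything but digits
        value = value * 10 + static_cast<std::uint64_t>(c - '0');
    }
    seq_out = value;
    return true;
}

// Hypothetical tracker: counts received messages and how many sequence
// numbers were skipped. Assumes sequence numbers start at 1.
struct GapCounter {
    std::uint64_t received = 0;  // messages seen by the subscriber
    std::uint64_t missing = 0;   // sequence numbers that never arrived
    std::uint64_t last_seq = 0;  // highest sequence number seen so far

    void on_message(std::uint64_t seq) {
        ++received;
        if (seq > last_seq + 1) missing += seq - last_seq - 1;
        if (seq > last_seq) last_seq = seq;
    }
};
```

Feeding it every received payload and printing received and missing at the
logging interval would show a gap as soon as it appears, rather than only at
the sampled lines.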

When sending events in a tight loop, messages simply go missing midway
through. The loss is not at the beginning or the end, which rules out slow
connectors etc.

In a small loop it usually works ok:

$ ./zmq_pub 2000 1000
sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1000 with rc=58
sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 with rc=58

$ ./zmq_sub 1

RECV:1|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1
RECV:2|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2
RECV:3|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #3
RECV:4|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #4
[...]
RECV:2000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000

You can see every message was received, as the counts align.

However, increase the message count and messages start going missing:

$ ./zmq_pub 200000 100000

sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #100000 with rc=60
sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #200000 with rc=60

$ ./zmq_sub 10000
RECV:10000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #11000
RECV:20000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #21000
RECV:30000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #31610
RECV:40000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #42000
RECV:50000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #52524
RECV:60000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #64654
RECV:70000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #77298
RECV:80000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #90117
RECV:90000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #102864
RECV:100000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #115846
RECV:110000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #129135
RECV:120000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #141606
RECV:130000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #154179
RECV:140000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #166627
RECV:150000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #179166
RECV:160000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #192247
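
To read the output above: each line pairs the subscriber's receive count
with the sequence number the publisher stamped on that message, so the
difference is the number of messages skipped so far. By RECV:160000 the
publisher was at #192247, i.e. 192247 - 160000 = 32247 messages had not
arrived. A minimal sketch of that arithmetic follows; the Sample struct and
function names are hypothetical, and it assumes in-order delivery so the
cumulative loss never decreases.

```cpp
#include <cstdint>
#include <vector>

// One logged line: subscriber's receive count and the publisher's
// sequence number embedded in the payload (hypothetical type).
struct Sample {
    std::uint64_t received;
    std::uint64_t seq;
};

// Messages skipped up to and including this sample.
std::uint64_t lost_so_far(const Sample& s) {
    return s.seq - s.received;
}

// Messages skipped within each logging interval, given consecutive samples.
std::vector<std::uint64_t> lost_per_interval(const std::vector<Sample>& log) {
    std::vector<std::uint64_t> out;
    std::uint64_t prev_total = 0;
    for (const Sample& s : log) {
        const std::uint64_t total = lost_so_far(s);
        out.push_back(total - prev_total);
        prev_total = total;
    }
    return out;
}
```

Applied to the first three lines of the output, (10000, 11000), (20000, 21000)
and (30000, 31610), this gives per-interval losses of 1000, 0 and 610.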


Is this expected behaviour? With PUSH/PULL I get no loss at all with
similar utilities.

If I put more work between sends (e.g. writing the full message to cout
each time), the results are better.

zmq_push: https://gist.github.com/easytiger/2c4f806594ccfbc74f54
zmq_pull: https://gist.github.com/easytiger/268a630fd22f959fde93

Is there an issue/bug in my implementation that would cause this?

Using zeromq 4.0.3

Many Thanks
Gerry





-- 
Gerry Steele