[zeromq-dev] HWM behavior on PUB-SUB sockets and improvement proposal

Juan López lopez at ac.upc.edu
Sat Jul 6 06:11:15 CEST 2013


Hi,

I have implemented a publish-subscribe middleware using ZMQ as the
underlying layer, and in performance tests I get very bad results, which
were unexpected given our initial trials with ZMQ.

More specifically, I get very good results with small message sizes, but as
I increase the size the latency grows extremely fast. I profiled the library
and didn't notice anything special, so to isolate the problem I decided
to implement a simple XPUB-SUB performance example similar to the ones in
the ZMQ distribution for REQ-REP (perhaps it would be interesting to add it
to the perf directory…) to have a baseline.

I noticed that when HWM is set to 1000 (the default), ZMQ starts dropping
messages as I increase the message size. As this system requires
reliable publication of messages, I set the HWM to 0 to allow ZMQ to
buffer the outgoing messages, and… I get the same behavior as with my
full system: very low latency at small message sizes but extremely large
latencies as I increase the size.

Basically the publisher is sending messages as fast as it can, and ZMQ's
queues grow constantly until they use all the RAM in the computer and the
machine starts to swap in and out heavily.
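
As a rough illustration of the two regimes described above (dropping at a
finite HWM versus unbounded growth at HWM 0), here is a toy model of a single
outgoing queue. It does not use libzmq at all; the function name, its
parameters, and the tick-based rates are assumptions made for the sketch, and
a larger message size is modeled simply as a lower consumption rate.

```python
from collections import deque


def simulate(hwm, produced_per_tick, consumed_per_tick, ticks):
    """Toy model of one outgoing queue (hypothetical, not libzmq code).

    hwm == 0 means "no limit", mirroring the meaning of HWM 0 in the post.
    Returns (messages still queued, messages dropped).
    """
    queue = deque()
    dropped = 0
    seq = 0
    for _ in range(ticks):
        # Publisher side: sends as fast as it can.
        for _ in range(produced_per_tick):
            if hwm and len(queue) >= hwm:
                dropped += 1          # finite HWM reached: message is lost
            else:
                queue.append(seq)     # otherwise the queue keeps growing
            seq += 1
        # Subscriber side: drains at a slower, fixed rate.
        for _ in range(min(consumed_per_tick, len(queue))):
            queue.popleft()
    return len(queue), dropped


if __name__ == "__main__":
    print("HWM 1000:", simulate(1000, 10, 5, 1000))
    print("HWM 0   :", simulate(0, 10, 5, 1000))
```

With 10 messages produced and 5 consumed per tick over 1000 ticks, the model
ends with 995 queued and 4005 dropped for HWM 1000, and with 5000 queued and
none dropped for HWM 0. In the second case the backlog grows without bound for
as long as the publisher outpaces the subscriber.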

So, as far as I understand, the only flow control mechanism that ZMQ
provides for PUB-SUB is to set HWM to a reasonable value and accept that
ZMQ will drop messages if the publisher goes too fast, or to set HWM to 0
and implement the flow control in the user app.
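
For the second option, one possible shape of flow control in the user app is
a window of unacknowledged messages. The sketch below is only an assumption
about how that could look: the class and method names are hypothetical, and
it presumes subscribers report the highest sequence number they have received
over some separate back channel, which the post does not describe.

```python
class AckWindow:
    """Hypothetical application-level pacing for a publisher.

    The publisher may have at most `window` messages in flight towards the
    slowest subscriber before it has to stop sending.
    """

    def __init__(self, window):
        self.window = window
        self.sent = 0        # sequence number of the last message sent
        self.acked = {}      # subscriber id -> highest sequence acknowledged

    def add_subscriber(self, sub_id):
        # A new subscriber starts level with the publisher.
        self.acked[sub_id] = self.sent

    def can_send(self):
        # Measure the backlog against the slowest subscriber.
        slowest = min(self.acked.values(), default=self.sent)
        return self.sent - slowest < self.window

    def on_send(self):
        self.sent += 1
        return self.sent

    def on_ack(self, sub_id, seq):
        # Acknowledgements may arrive out of order; keep the highest one.
        self.acked[sub_id] = max(self.acked[sub_id], seq)
```

The publisher would call can_send() before each send and wait (or do other
work) while it returns False, so throughput follows the slowest subscriber
instead of the queues growing.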

In some pub-sub systems you can accept that some messages are dropped,
because the topic will be resent in the near future, but this is not always
the case. On the other hand, requiring the user app to implement flow
control on its own, without any help, is a big restriction for ZMQ. In
fact, ZMQ provides this flow control for all the other socket types by
simply blocking the calls if HWM is reached.

So…

Am I terribly wrong, or am I missing something? Is another design
possible? I have seen some past posts on the group about this topic
suggesting a change from PUB-SUB sockets to PUSH-PULL. But in my case I need
the fan-out message distribution and also the topic filtering… so I think
it does not apply.

I have been looking through the libzmq code, and I think that this case can
be solved with some modifications to the PUB-SUB sockets:

1 - Add a ZMQ_PUB_NO_DROP or ZMQ_PUB_FLOW_CONTROL socket option that lets
the user keep the current dropping behavior or select a blocking one.

2 - If this option is enabled, when a PUB socket is sending a message (in
dist.cpp) and a queue reaches HWM, a flag is set in the socket so that the
next send call will block.

3 - Meanwhile the I/O threads keep emptying the queues as the subscribers
receive the data, just as they do right now. Obviously, we will only get
the throughput that the slowest subscriber allows, but if we don't want to
drop messages this is the way.

4 - When the queue recovers from the HWM, we clear the flag and the sending
calls on this socket can continue.

5 - If a subscriber socket is disconnected, it is not taken into account
for blocking. This proposal is only a flow control mechanism (similar to
the ZMQ_RATE option for multicast sockets); node (dis)connection and network
status (if needed) are left to the user side.
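
The five steps above can be sketched as a small in-process model. This is
not libzmq code and not a patch to dist.cpp: the class NoDropFanout, its
methods, and the use of a condition variable are assumptions chosen for
illustration, and the "flag" of steps 2 and 4 is computed on demand from the
queue lengths instead of being stored.

```python
import threading
from collections import deque


class NoDropFanout:
    """Hypothetical model of a PUB socket that blocks instead of dropping."""

    def __init__(self, hwm):
        self.hwm = hwm                  # must be > 0 in this sketch
        self.queues = {}                # subscriber id -> outgoing queue
        self.cond = threading.Condition()

    def connect(self, sub_id):
        with self.cond:
            self.queues[sub_id] = deque()

    def disconnect(self, sub_id):
        # Step 5: a disconnected subscriber no longer counts for blocking.
        with self.cond:
            self.queues.pop(sub_id, None)
            self.cond.notify_all()

    def _at_hwm(self):
        # Steps 2 and 4: true while any connected queue sits at the HWM.
        return any(len(q) >= self.hwm for q in self.queues.values())

    def send(self, msg, timeout=None):
        """Block while any queue is at HWM; return False on timeout."""
        with self.cond:
            if not self.cond.wait_for(lambda: not self._at_hwm(), timeout):
                return False
            for q in self.queues.values():
                q.append(msg)           # fan out to every connected subscriber
            return True

    def recv(self, sub_id):
        # Step 3: draining a queue may let a blocked sender continue.
        with self.cond:
            q = self.queues[sub_id]
            msg = q.popleft() if q else None
            self.cond.notify_all()
            return msg
```

In this model a sender is held back by the fullest queue, so the publisher
runs at the pace of the slowest connected subscriber, and disconnecting that
subscriber releases the sender immediately.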


What do you think? Can this work? Will it be generic enough to be added to
the base code?

     Juan
-- 
---------------------------------------------------------------------------
Juan Lopez Rubio                                                      U P C
Technical University of Catalonia            E-mail: lopez at ac.upc.es  o o o
Escola Politécnica Superior de Castelldefels Phone : +34-93-4137105   o o o
Computer Architecture Department             Fax   : +34-93-4137007   o o o
C\Esteve Terrades, 15 despatx 010            WWW   : http://www.ac.upc.es
08860 - Castelldefels, SPAIN                         http://icarus.upc.es
---------------------------------------------------------------------------