[zeromq-dev] debugging performance problems

Martin Sustrik sustrik at 250bpm.com
Tue May 17 09:47:51 CEST 2011


Hi all,

Here are some comments on the topic:

Every 0MQ/TCP connection has 4 buffers: the 0MQ buffer on the sender, 
limited by the sender's HWM; the TCP tx buffer on the sender, limited by 
the sender's SNDBUF; the TCP rx buffer on the receiver, limited by the 
receiver's RCVBUF; and finally the 0MQ message buffer on the receiver, 
limited by the receiver's HWM setting.

You can think of the whole thing as a series of tubes, each having a 
particular capacity. If a particular buffer is full, either because it 
is filled too quickly or because it is emptied too slowly, it applies 
backpressure, i.e. it stops accepting new data from upstream. That 
causes the upstream buffer to fill up and, once it hits its limit, to 
apply backpressure further upstream, and so on.
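
To make the tube model concrete, here is a small single-threaded 
simulation. It is only a sketch and does not use 0MQ at all: the 
function name simulate, the tick-based scheduling and the capacities are 
assumptions made up for illustration. One message is offered per tick, 
the consumer drains the last buffer only occasionally, and a send is 
counted as refused whenever the first buffer is full.

```python
from collections import deque


def simulate(capacities, ticks, consume_every):
    """Offer one message per tick to a chain of bounded buffers.

    Returns (fill level of each buffer, number of sends refused).
    """
    buffers = [deque() for _ in capacities]
    refused = 0
    for tick in range(ticks):
        # The slow consumer drains the last buffer every `consume_every` ticks.
        if tick % consume_every == 0 and buffers[-1]:
            buffers[-1].popleft()
        # Move at most one message per hop, downstream hops first,
        # and only if the downstream buffer has room.
        for i in range(len(buffers) - 1, 0, -1):
            if buffers[i - 1] and len(buffers[i]) < capacities[i]:
                buffers[i].append(buffers[i - 1].popleft())
        # Backpressure reaches the producer once the first buffer is full.
        if len(buffers[0]) < capacities[0]:
            buffers[0].append(tick)
        else:
            refused += 1
    return [len(b) for b in buffers], refused


# Four buffers of capacity 2, standing in for sender HWM, SNDBUF,
# RCVBUF and receiver HWM; the consumer takes one message every 10 ticks.
levels, refused = simulate([2, 2, 2, 2], ticks=100, consume_every=10)
print(levels, refused)
```

With a consumer this slow, every buffer in the chain ends up full and 
most sends are refused, which is the backpressure reaching the producer.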

One of the consequences of the model is that if there's an unlimited 
buffer somewhere in the chain, the backpressure from other buffers will 
cause all the messages to accumulate there when congestion hits.

That's why I recently proposed setting the default HWM to a finite 
value. Even an arbitrary number like 1000 is better than an infinite 
buffer.
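
The difference between an unlimited and a limited buffer can be shown 
with Python's standard queue module. This is an analogy rather than 0MQ 
code: queue.Queue stands in for a 0MQ buffer, maxsize for the HWM, and 
the helper fill is a hypothetical name. No consumer is running, as 
would be the case when congestion hits downstream.

```python
import queue


def fill(q, attempts):
    """Try to enqueue `attempts` messages; return how many were refused."""
    refused = 0
    for i in range(attempts):
        try:
            q.put_nowait(i)
        except queue.Full:
            refused += 1
    return refused


unbounded = queue.Queue()            # maxsize=0: no limit at all
bounded = queue.Queue(maxsize=1000)  # stand-in for HWM = 1000

refused_unbounded = fill(unbounded, 5000)
refused_bounded = fill(bounded, 5000)

# The unbounded queue silently absorbs everything; the bounded one
# pushes back once it holds 1000 messages.
print(unbounded.qsize(), refused_unbounded)
print(bounded.qsize(), refused_bounded)
```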

The case of PUB/SUB is special in that the buffers, when full, start 
dropping messages instead of applying backpressure. The reason is to 
avoid blocking the whole distribution tree because of a single slow 
consumer. Anyway, pub/sub is not the problem we are solving here, so 
this comment is a bit off-topic.
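
For completeness, the drop-instead-of-block behaviour can be sketched 
like this. It is a toy model, not the 0MQ implementation: the Publisher 
class, its per-subscriber queues and the dropped counters are all 
invented for illustration.

```python
from collections import deque


class Publisher:
    """Toy publisher with one bounded queue per subscriber."""

    def __init__(self, hwm):
        self.hwm = hwm
        self.queues = {}   # subscriber name -> pending messages
        self.dropped = {}  # subscriber name -> number of dropped messages

    def subscribe(self, name):
        self.queues[name] = deque()
        self.dropped[name] = 0

    def publish(self, msg):
        for name, q in self.queues.items():
            if len(q) < self.hwm:
                q.append(msg)
            else:
                # Full queue: drop for this subscriber only, never block.
                self.dropped[name] += 1

    def receive(self, name):
        q = self.queues[name]
        return q.popleft() if q else None


pub = Publisher(hwm=3)
pub.subscribe("fast")
pub.subscribe("slow")
for i in range(10):
    pub.publish(i)
    pub.receive("fast")  # the fast subscriber keeps up; the slow one never reads

print(pub.dropped)
```

The slow subscriber loses messages once its queue is full, while the 
fast one is unaffected.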

As for the monitoring stuff, I assume you have something like a 
parallelised pipeline: the messages are passed through several 
processing steps, always being forwarded to the next step (worker app) 
by some central device (broker).

What would really help in such a case would be to have HWM set to 
reasonable values everywhere in the topology and let the excess messages 
queue in the devices. A smart device can then do monitoring, i.e. 
periodically publish the number of messages it holds or whatever.
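
A rough idea of what such a smart device might look like, reduced to 
plain Python with no sockets involved. Everything here is hypothetical: 
the class MonitoredDevice, its method names, and the choice to report 
every N received messages rather than on a timer. The reports list 
stands in for whatever channel the statistics would be published on.

```python
from collections import deque


class MonitoredDevice:
    """Toy broker that queues messages for workers and reports its backlog."""

    def __init__(self, report_every):
        self.pending = deque()
        self.report_every = report_every
        self.received = 0
        self.reports = []  # stand-in for publishing statistics somewhere

    def on_message(self, msg):
        """Called when a message arrives from upstream."""
        self.pending.append(msg)
        self.received += 1
        if self.received % self.report_every == 0:
            self.reports.append(
                {"received": self.received, "queued": len(self.pending)}
            )

    def on_worker_ready(self):
        """Called when a worker asks for work; returns None if idle."""
        return self.pending.popleft() if self.pending else None


device = MonitoredDevice(report_every=5)
for i in range(10):
    device.on_message(i)
    if i % 2 == 0:
        device.on_worker_ready()  # workers keep up with only half the load

for report in device.reports:
    print(report)
```

A growing "queued" figure in successive reports would point at the 
stage downstream of this device as the bottleneck.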

I am not sure whether there are such smart devices with monitoring 
around. I dimly recall that the pyzmq project may contain something like 
that, but I am not sure. In any case, I believe smart devices are the 
area where the most value can be added, and they will ultimately become 
a significant part of the 0MQ ecosystem.

If you have no devices in the topology, monitoring becomes more 
complex. The easiest way is probably to monitor the applications' memory 
usage. If it grows, the messages are likely queueing there.

The issue was discussed at the 0MQ conference recently. The need for 
explicit monitoring of the library was expressed. It's not 100% clear 
how to do it, though. The options mentioned were:

1. write the statistics to the syslog
2. publish them in-process using 0MQ sys://log transport
3. publish them to the outside world using 0MQ tcp transport
4. expose the statistics using socket options

Martin


