[zeromq-dev] debugging performance problems
sustrik at 250bpm.com
Tue May 17 09:47:51 CEST 2011
Here are some comments on the topic:
Every 0MQ/TCP connection has four buffers: the 0MQ buffer on the sender,
limited by the sender's HWM; the TCP tx buffer on the sender, limited by
the sender's SNDBUF; the TCP rx buffer on the receiver, limited by the
receiver's RCVBUF; and finally the 0MQ message buffer on the receiver,
limited by the receiver's HWM setting.
You can think of the whole thing as a series of tubes, each having a
particular capacity. If a particular buffer is full, either because it is
filled too quickly or because it is emptied too slowly, it applies
backpressure, i.e. it stops accepting new data from upstream. That causes
the upstream buffer to fill up and, when it hits its limit, to apply
backpressure further upstream, and so on.
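
To make the tube model concrete, here is a minimal sketch in plain Python. It does not use 0MQ at all: the Stage class, the step function and the capacities (4, 2, 2, 4) are made up for illustration, and real SNDBUF/RCVBUF limits are in bytes rather than messages. The producer offers 3 messages per tick while the consumer takes only 1.

```python
from collections import deque

class Stage:
    """One buffer in the chain; capacity is counted in messages (an assumption)."""
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.items = deque()

    def full(self):
        return len(self.items) >= self.capacity

def step(stages, produce, consume):
    """Advance the chain by one tick; return how many messages the producer got in."""
    # The consumer drains up to `consume` messages from the last buffer.
    for _ in range(min(consume, len(stages[-1].items))):
        stages[-1].items.popleft()
    # Move messages downstream, receiver end first, so freed space is reused.
    for up, down in reversed(list(zip(stages, stages[1:]))):
        while up.items and not down.full():
            down.items.append(up.items.popleft())
    # The producer is refused as soon as the first buffer is full (backpressure).
    accepted = 0
    while accepted < produce and not stages[0].full():
        stages[0].items.append(object())
        accepted += 1
    return accepted

stages = [Stage("sender HWM", 4), Stage("sender SNDBUF", 2),
          Stage("receiver RCVBUF", 2), Stage("receiver HWM", 4)]
for tick in range(20):
    accepted = step(stages, produce=3, consume=1)
print([len(s.items) for s in stages], accepted)
```

Once all four buffers are full, the producer is throttled down to the consumer's rate, which is exactly the backpressure described above.
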
One of the consequences of the model is that if there's an unlimited
buffer somewhere in the chain, the backpressure from other buffers will
cause all the messages to accumulate there when congestion hits.
That's why I recently proposed setting the default HWM to a finite value.
Even an arbitrary number like 1000 is better than an infinite buffer.
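
A rough way to see the effect is to count buffer occupancies with and without a limit on the first buffer. This is a toy model, not 0MQ code: the run function, the downstream capacities and the produce/consume rates are all assumptions chosen for illustration, and None stands for an unlimited buffer.

```python
def run(capacities, ticks, produce, consume):
    """Return buffer occupancies after `ticks` steps (None = unlimited buffer)."""
    levels = [0] * len(capacities)

    def room(i):
        if capacities[i] is None:
            return float("inf")
        return capacities[i] - levels[i]

    for _ in range(ticks):
        # The consumer takes what it can from the last buffer.
        levels[-1] -= min(consume, levels[-1])
        # Shift messages downstream, starting at the receiver end.
        for i in range(len(levels) - 2, -1, -1):
            moved = min(levels[i], room(i + 1))
            levels[i] -= moved
            levels[i + 1] += moved
        # The producer fills whatever room the first buffer has left.
        levels[0] += min(produce, room(0))
    return levels

unbounded = run([None, 2, 2, 4], ticks=1000, produce=3, consume=1)
bounded = run([1000, 2, 2, 4], ticks=1000, produce=3, consume=1)
print("unlimited first buffer:", unbounded)
print("HWM of 1000:           ", bounded)
```

In the unlimited case the small downstream buffers fill up and everything else piles up in the first buffer, which keeps growing for as long as the producer outruns the consumer. With the limit in place the backlog stops at 1000 and the producer is slowed down instead.
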
The case of PUB/SUB is special in that the buffers, when full, start
dropping messages instead of applying backpressure. The reason is to avoid
blocking the whole distribution tree because of a single slow consumer.
Anyway, pub/sub is not the problem we are solving here, so this comment
is a bit off-topic.
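
For comparison, the drop-on-full behaviour can be sketched like this. Again this is plain Python rather than 0MQ; DroppingBuffer, the HWM of 5 and the consumer speeds are invented for the example.

```python
from collections import deque

class DroppingBuffer:
    """Per-subscriber buffer that drops new messages once it reaches its limit."""
    def __init__(self, hwm):
        self.hwm = hwm
        self.items = deque()
        self.dropped = 0

    def offer(self, msg):
        """Queue msg if there is room, otherwise drop it. Never blocks."""
        if len(self.items) >= self.hwm:
            self.dropped += 1
            return False
        self.items.append(msg)
        return True

subscribers = {"fast": DroppingBuffer(hwm=5), "slow": DroppingBuffer(hwm=5)}
delivered = {"fast": 0, "slow": 0}

for seq in range(100):
    # The publisher offers every message to every subscriber's buffer.
    for buf in subscribers.values():
        buf.offer(seq)
    # The fast consumer drains its buffer completely each round.
    fast = subscribers["fast"]
    while fast.items:
        fast.items.popleft()
        delivered["fast"] += 1
    # The slow consumer reads only one message every fourth round.
    slow = subscribers["slow"]
    if seq % 4 == 0 and slow.items:
        slow.items.popleft()
        delivered["slow"] += 1

print(delivered, {name: buf.dropped for name, buf in subscribers.items()})
```

The slow subscriber loses messages, but the fast one still receives every message because the publisher is never blocked.
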
As for the monitoring stuff, I assume you have something like a
parallelised pipeline: the messages are passed through several
processing steps, always being forwarded to the next step (worker app)
by some central device (broker).
What would really help in such a case would be to have HWM set to
reasonable values everywhere in the topology and let the excess messages
queue in the devices. A smart device can then do monitoring, i.e.
periodically publish the number of messages it holds or whatever.
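
As a sketch of what such a smart device might look like, here is a toy broker that queues messages and reports its backlog every few ticks. Everything here is hypothetical: MonitoringDevice is not an existing 0MQ device, and the publish and send callables merely stand in for whatever sockets a real device would use.

```python
from collections import deque

class MonitoringDevice:
    """Hypothetical broker that queues messages and reports its backlog."""
    def __init__(self, publish, report_every):
        self.queue = deque()
        self.publish = publish            # stand-in for sending on a monitoring socket
        self.report_every = report_every  # report interval, in ticks
        self.ticks = 0

    def receive(self, msg):
        """Accept a message from upstream and hold it until it can be forwarded."""
        self.queue.append(msg)

    def forward(self, send, limit):
        """Forward up to `limit` queued messages downstream via `send`."""
        for _ in range(min(limit, len(self.queue))):
            send(self.queue.popleft())

    def tick(self):
        """Publish the current queue depth every `report_every` ticks."""
        self.ticks += 1
        if self.ticks % self.report_every == 0:
            self.publish({"tick": self.ticks, "queued": len(self.queue)})

reports = []
forwarded = []
device = MonitoringDevice(publish=reports.append, report_every=5)
for i in range(20):
    # Two messages arrive per tick, but the worker accepts only one.
    device.receive(("msg", 2 * i))
    device.receive(("msg", 2 * i + 1))
    device.forward(forwarded.append, limit=1)
    device.tick()

for report in reports:
    print(report)
```

A steadily rising "queued" figure in the reports points at the processing step behind this device as the bottleneck.
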
I am not sure whether there are such smart devices with monitoring
around. I dimly recall that the pyzmq project may contain something like
that, but I am not sure. In any case, I believe smart devices are the
area where the most value can be added, and that they will ultimately
become a significant part of the 0MQ ecosystem.
If you have no devices in the topology, the monitoring becomes more
complex. The easiest way is probably to monitor the applications' memory
usage. If it grows, the messages are likely queueing there.
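
The check itself is simple: take periodic memory samples and see whether they keep rising. The sketch below uses Python's tracemalloc as a stand-in sample source, which only sees allocations made by Python code. Messages queued inside the 0MQ library live in native memory, so for a real application you would sample the process's memory from the operating system instead. The growth_trend helper and the 500000-byte threshold are assumptions.

```python
import tracemalloc

def growth_trend(samples, min_increase):
    """True if each sample exceeds the previous one by at least min_increase bytes."""
    return all(b - a >= min_increase for a, b in zip(samples, samples[1:]))

tracemalloc.start()
backlog = []   # stands in for messages queueing inside the application
samples = []
for _round in range(5):
    # Simulate roughly 1 MB of messages arriving without being processed.
    backlog.extend(bytes(1024) for _ in range(1000))
    current, _peak = tracemalloc.get_traced_memory()
    samples.append(current)
tracemalloc.stop()

if growth_trend(samples, min_increase=500_000):
    print("memory keeps growing; messages are probably queueing here")
```
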
The issue was discussed at the 0MQ conference recently. The need for
explicit monitoring of the library was expressed. It's not 100% clear
how to do it though. The options mentioned were:
1. write the statistics to the syslog
2. publish them in-process using 0MQ sys://log transport
3. publish them to the outside world using 0MQ tcp transport
4. expose the statistics using socket options