[zeromq-dev] ZeroMQ (2.1.7) subscriber "memory leak"
jess.morecroft at gmail.com
Tue Feb 21 17:11:50 CET 2012
Re the bounds, point taken. We've been taking a process-level approach
to bounds monitoring and action triggering (e.g. Monit, Nagios), which,
at least in this case, has allowed us to identify the issue sooner.
If messages were silently falling off the back of a queue in certain
scenarios, we might well not have spotted this until much later. That
said, when we get to production (almost there!), I'll probably take
the advice and set some large HWMs.
Re the responsibility for discarding messages, I tend to disagree.
Unless I'm mistaken, we're basically talking about a situation where I
can create a SUB socket, connect to a bunch of publishers, not
subscribe to a single topic, and yet be assured of a memory explosion
if I do not at some point call recv or poll (which I have no incentive
to do, as I'm not subscribed to anything). This behaviour was not
obvious, to me at least: I assumed that SUB message filtering would,
like HWM enforcement, just happen behind the scenes and not need to be
prompted by a call to recv or poll.
Anyway, appreciate the constructive feedback.
On 21/02/2012, at 5:59 PM, Jess Morecroft <jess.morecroft at gmail.com> wrote:
> We've been happily using ZeroMQ for a good few months now and are generally impressed with its robustness and performance. We are, however, having a problem with one of our servers "leaking" significant amounts of memory over time, and after a good few hours trawling the 0mq code I think I might know why.
> The "leaking" server acts as a TCP-to-ZeroMQ proxy for (Internet) clients hooking in to our server applications. It is the only server in our environment that creates and binds ZeroMQ SUB sockets but does not necessarily zmq_poll on them. If there are no client subscriptions mapping to a particular SUB socket, we do not add that socket to the zmq_pollitem_t array when next calling zmq_poll. As soon as at least one client subscription maps to the SUB socket, we rebuild the zmq_pollitem_t array to include it. The first case is where the problem occurs: even though we're not subscribed to a single topic on the socket, 0mq appears to buffer incoming messages indefinitely. Only when the socket is next included in a call to zmq_poll does 0mq actually purge all buffered non-matching (i.e. all) messages from the buffer.
> In other words, 0mq appears to implement client-side filtering and discarding of unwanted messages within the call to zmq_poll or zmq_recv in the application thread, not in the 0mq I/O thread(s) as I expected. The ironic side-effect of this, at least with our use of "optimised" calls to zmq_poll, is that our proxy server leaks most when no clients are connected!
> My problem now is how to correct this. I see a few options:
> 1. Always pass all sockets to zmq_poll
> 2. Set high water-marks
> 3. Use ZeroMQ 3.1, which filters publisher-side and therefore eliminates the scenario completely
> None of these solutions is ideal: 1. involves a delicate rewrite of some core messaging code, 2. opens up the potential for lost/discarded messages, and 3. is not currently an option given 3.1's beta status. I'd be interested to hear whether anyone out there has experienced this kind of problem before and has any alternative solutions for tackling it. My suspicion is that high water-marks as a temporary fix, replaced by a proper solution of moving to 0mq 3.1 once it is stable, is probably the way forward.