[zeromq-dev] Issues with large numbers of clients

Will Moss wmoss at bu.mp
Wed Jan 23 20:01:36 CET 2013


Hi Pieter,

I can certainly make a test case for the socket leakage, but as I said
before, I thought this was a known issue, so I'm a little confused.
Specifically, I was referring to the line in the
guide <http://zguide.zeromq.org/page:all#Shrugging-It-Off> where it says
"When we use a ROUTER socket in an application that tracks
peers, as peers disconnect and reconnect, the application will leak memory
(resources that the application holds for each peer) and get slower and
slower." How can we avoid this kind of behaviour? I know that if the
outside client calls zmq_close, its socket really is closed, but on the
ROUTER socket that these clients are connected to, I can't find any
exposed way to tell it that a client which we have determined (via
heartbeat or any other mechanism) is not coming back should be removed
from the (OS-level) socket table associated with the router.
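
For reference, the application-level half of that leak (the per-peer
resources the guide mentions) can be bounded by expiring peers that miss
their heartbeats. The sketch below is only an illustration: PeerTable,
touch and expire are hypothetical names, not part of any ZeroMQ API, and
it does nothing about the OS-level socket held by the ROUTER, which is the
part I can't find a way to release.

```python
import time


class PeerTable:
    """Hypothetical per-peer bookkeeping keyed by ROUTER identity frame."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.peers = {}  # identity -> {"last_seen": float, "state": dict}

    def touch(self, identity):
        """Record traffic (or a heartbeat) from a peer, creating it if new."""
        entry = self.peers.setdefault(identity, {"state": {}})
        entry["last_seen"] = self.clock()
        return entry

    def expire(self):
        """Drop every peer silent for longer than timeout_s; return their ids."""
        now = self.clock()
        dead = [ident for ident, entry in self.peers.items()
                if now - entry["last_seen"] > self.timeout_s]
        for ident in dead:
            del self.peers[ident]
        return dead
```

Calling touch() for each message received and expire() once per heartbeat
interval keeps the table proportional to the number of live peers.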

I doubt I'm going to be able to come up with a test case that can get a
socket into EPOLLERR or EPOLLHUP, but I think it's reasonably easy to see
how this can happen from the code. If the socket is in one of those states,
then epoll_wait returns EPOLLERR or EPOLLHUP and we end up on this
line <https://github.com/zeromq/zeromq2-x/blob/master/src/epoll.cpp#L153>.
This calls in_event in zmq_engine.cpp. We end up on this
line <https://github.com/zeromq/zeromq2-x/blob/master/src/zmq_engine.cpp#L164>,
because disconnected will be set to true, which calls back into epoll.cpp
here <https://github.com/zeromq/zeromq2-x/blob/master/src/epoll.cpp#L97>.
This function *only* unregisters the socket for EPOLLIN, so the next time
we call epoll_wait, the socket will still be in the EPOLLERR or EPOLLHUP
state, and we do exactly the same thing again.
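
The underlying epoll behaviour can be seen without ZeroMQ at all. The
sketch below is Linux-only and uses Python's select.epoll on a plain pipe
rather than a TCP socket (an assumption made to keep it short; demo is a
made-up name). It relies on the documented rule that EPOLLERR and EPOLLHUP
are always reported, whether or not they are in the requested event mask:
clearing EPOLLIN with EPOLL_CTL_MOD leaves the hung-up fd ready, and only
EPOLL_CTL_DEL stops the reports.

```python
import os
import select


def demo():
    """Return what epoll reports for a hung-up fd at three stages."""
    r, w = os.pipe()
    ep = select.epoll()
    try:
        ep.register(r, select.EPOLLIN)   # EPOLL_CTL_ADD, interested in reads
        os.close(w)                      # writer gone: read end is hung up

        registered = ep.poll(0.1)        # reported while EPOLLIN is set

        ep.modify(r, 0)                  # EPOLL_CTL_MOD: drop EPOLLIN only
        after_mod = ep.poll(0.1)         # still reported (EPOLLHUP)

        ep.unregister(r)                 # EPOLL_CTL_DEL
        after_del = ep.poll(0.1)         # nothing left to report
    finally:
        ep.close()
        os.close(r)
    return registered, after_mod, after_del


if __name__ == "__main__":
    for label, events in zip(("registered", "after MOD", "after DEL"), demo()):
        print(label, events)
```

The middle stage is the situation described above: the fd has no read
interest any more, yet every epoll_wait call returns it again.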

Thanks,
Will




On Wed, Jan 23, 2013 at 6:27 AM, Pieter Hintjens <ph at imatix.com> wrote:

> Hi Will,
>
> Can you make a minimal test case that shows the socket leakage? That's
> a first step to solving the problem.
>
> > 2. There appears to be a bug with ZeroMQ's epoll implementation when a
> > socket gets into the EPOLLERR or EPOLLHUP state. ZeroMQ unregisters the
> > socket for read, but doesn't actually call EPOLL_CTL_DEL on the fd, so
> > epoll just keeps calling back zmq with the same fd. Is this a known bug?
> > I also tried fixing this, but now it crashes in set_pollout periodically
> > and appears to be passing in a garbage fd, so I must have missed something.
>
> Again, if there's any way to reproduce this, that's a good start.
> Otherwise, file an issue and note as much about the problem as you
> can.
>
> -Pieter
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>