[zeromq-dev] FD resource leak when opening and closing an inproc socket

Itay Chamiel itay.chamiel at orcam.com
Tue Jan 5 16:06:55 CET 2021

Hi Bill,

Thanks for responding. We have by now found a workaround that doesn't
require creating a socket each time, so this issue doesn't affect us
anymore. I did however continue investigating out of curiosity.

After reading the linked thread I see that your bottom-line question there
is how to get process_commands to run on these sockets - and that question
is unanswered. I've tried an approach suggested by one of the other
participants, to call zmq_getsockopt for ZMQ_EVENTS, but this seemed to
have no effect on the outcome.

I have noticed that closing the context does free up the resources, and this
was in fact a workaround we used for a while. But then we started getting
occasional segfaults within the context destructor, at which point I
decided that was a bad direction.

I've tried to reproduce that crash in the context of this simple test, and
ran into some more weird behavior. Here is the code:

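#include <cstdio>
#include "zmq.hpp"

int main() {
  while (true) {
    printf("context create\n");
    zmq::context_t context;
    const int max_cnt = 900;
    for (int i=0; i<max_cnt; i++) {
      // create and immediately destroy a socket; no connect is needed
      zmq::socket_t* socket = new zmq::socket_t(context, ZMQ_SUB);
      delete socket;
    }
    // the context is destroyed here, at the end of each while iteration
  }
}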

Note the max_cnt constant. If it is set to 508 or lower (on my machine;
yours may vary), this test runs indefinitely. At 1017 or above I get the
"Too many open files" error, as expected and as in my original code sample.
But at any value in between I get this output:
context create
Assertion failed: s (src/ctx.cpp:148)
Aborted (core dumped)

Now, src/ctx.cpp:148 is within the ctx destructor but there is nothing
there that looks like an assertion (I see: "_tag = ZMQ_CTX_TAG_VALUE_BAD;")
so this is where my investigation stops. I'm going to guess that this is
another resource error (it's probably not a coincidence that 508 is half of
the upper limit) and not related to the segfault I saw.

To reiterate, I'm just sharing things I found while poking around; these
issues don't affect us at present.

Best regards,

Itay Chamiel, OrCam

On Mon, Jan 4, 2021 at 6:07 PM Bill Torpey <wallstprog at gmail.com> wrote:

> Hi Itay:
>
> Take a look at https://github.com/zeromq/libzmq/issues/3186, which may be
> relevant to the behavior you’re seeing.
>
> The short version is that process_commands needs to get a chance to run on
> the socket to clean up resources.  If that isn’t done, resources (in this
> case memory, but in your case potentially fd’s) can appear to leak until
> the context is shut down.
>
> Hope this helps…
>
> Bill
>
> On Dec 29, 2020, at 7:12 AM, Itay Chamiel <itay.chamiel at orcam.com> wrote:
>
> Hi, we have a client thread that is supposed to receive data from a parent
> thread, then disconnect when done. We've noticed that when the socket is
> closed there's a leak of an eventfd (file descriptor), therefore we have a
> leak every time such a client is created and destroyed - even if no data is
> transferred.
> Here is a quick C++ program to reproduce it. I'm running on a Ubuntu 18
> desktop with ZMQ 4.1.6 or 4.3.3. This loop is expected to run forever but
> crashes a little after 1000 iterations due to too many open files.
> #include "zmq.hpp"
> int main() {
>   zmq::context_t context;
>   while(1) {
>     zmq::socket_t* socket = new zmq::socket_t(context, ZMQ_SUB);
>     socket->connect("inproc://some_name");
>     delete socket;
>   }
> }
> The problem does not occur for other connection types (i.e. replace inproc
> with ipc and the problem will not occur).
> In case you want it without the C++ bindings, here is a slightly more
> elaborate C example which also sets LINGER to zero (with no effect) and
> displays the number of FDs in use by the process each iteration.
> #include <zmq.h>
> #include <errno.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <unistd.h>
> int main() {
>   void* ctx = zmq_ctx_new();
>   for (int i=0; ; i++) {
>     void* zmq_sock = zmq_socket(ctx, ZMQ_SUB);
>     if (!zmq_sock) {
>       printf("fail after %d iterations: %s\n", i, zmq_strerror(errno));
>       exit(-1);
>     }
>     int linger = 0;
>     // setting LINGER to zero doesn't actually help
>     int rc = zmq_setsockopt(zmq_sock, ZMQ_LINGER, &linger, sizeof(linger));
>     if (rc != 0) exit(-1);
>     rc = zmq_connect(zmq_sock, "inproc://some_name");
>     if (rc != 0) exit(-1);
>     rc = zmq_close(zmq_sock);
>     if (rc != 0) exit(-1);
>     // show the number of used FDs
>     char cmd[100];
>     snprintf(cmd, sizeof(cmd), "ls -l -v /proc/%d/fd | wc -l", (int)getpid());
>     system(cmd);
>     // test is hard to abort without a sleep
>     usleep(100*1000);
>   }
> }
> Thank you,
> Itay Chamiel, OrCam
