[zeromq-dev] FD resource leak when opening and closing an inproc socket

Bill Torpey wallstprog at gmail.com
Tue Jan 5 16:37:38 CET 2021


Hi Itay:

Pls. see embedded comments below …

Best Regards,

Bill

P.S.  Good luck with your product!  My mom had macular degeneration and it made her later years really difficult.  Something like an OrCam would have been a blessing!

> On Jan 5, 2021, at 10:06 AM, Itay Chamiel <itay.chamiel at orcam.com> wrote:
> 
> Hi Bill,
> 
> Thanks for responding. We have by now found a workaround that doesn't require creating a socket each time, so this issue doesn't affect us anymore. I did however continue investigating out of curiosity.

Fantastic!  That’s the only way things get better.


> 
> After reading the linked thread I see that your bottom-line question there is how to get process_commands to run on these sockets - and that question is unanswered. I've tried an approach suggested by one of the other participants, to call zmq_getsockopt for ZMQ_EVENTS, but this seemed to have no effect on the outcome.

I’ve noticed the same thing — ZMQ_EVENTS doesn’t seem to do the trick, but zmq_poll(..., ZMQ_POLLIN | ZMQ_POLLOUT) does seem to trigger the required cleanup.  I should really dig down and see if I can figure out why.
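
For what it’s worth, here’s a minimal sketch of that workaround grafted onto your C repro (the endpoint name and iteration count are just placeholders, and I haven’t yet verified that this keeps the fd count flat over the long haul):

#include <zmq.h>
#include <stdio.h>

int main(void)
{
    void *ctx = zmq_ctx_new();
    for (int i = 0; i < 2000; i++) {
        void *sock = zmq_socket(ctx, ZMQ_SUB);
        if (!sock) {
            printf("socket creation failed after %d iterations\n", i);
            break;
        }
        if (zmq_connect(sock, "inproc://some_name") != 0)
            break;

        // Give process_commands a chance to run on this socket before it is
        // closed: a zero-timeout poll for both read and write events seems
        // to be enough to trigger the cleanup, though I haven't dug into why.
        zmq_pollitem_t item = {sock, 0, ZMQ_POLLIN | ZMQ_POLLOUT, 0};
        zmq_poll(&item, 1, 0);

        zmq_close(sock);
    }
    zmq_ctx_term(ctx);
    return 0;
}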


> 
> I have noticed that closing the context does free up the resources and this was in fact a workaround we used for a while, but then we started getting occasional segfaults within the context destructor, at which point I decided that's a bad direction.

Just so I understand, you’re seeing SEGV in the dtor, not the ctor — correct?


> 
> I've tried to reproduce that crash in the context of this simple test, and ran into some more weird behavior. Here is the code:
> 
> #include "zmq.hpp"
> #include <cstdio>
> 
> int main() {
>   while (true) {
>     printf("context create\n");
>     zmq::context_t context;
>     const int max_cnt = 900;
>     for (int i=0; i<max_cnt; i++) {
>       zmq::socket_t* socket = new zmq::socket_t(context, ZMQ_SUB);
>       socket->connect("inproc://some_name");
>       delete socket;
>     }
>   }
> }
> 
> Note the max_cnt constant. If it is set to 508 or lower (on my machine, yours may vary), this test runs indefinitely. At 1017 or above I get the "Too many open files" error, as in my original code sample, as expected. But at any value in between I get this output:
> context create
> Assertion failed: s (src/ctx.cpp:148)
> Aborted (core dumped)
> 
> Now, src/ctx.cpp:148 is within the ctx destructor but there is nothing there that looks like an assertion (I see: "_tag = ZMQ_CTX_TAG_VALUE_BAD;") so this is where my investigation stops. I'm going to guess that this is another resource error (it's probably not a coincidence that 508 is half of the upper limit) and not related to the segfault I saw.

I’m guessing that the assert is actually the initial "zmq_assert (_sockets.empty ());", and that line 148 is somehow being picked up because it’s the last line before the return?

> 
> To reiterate, I'm just sharing things I found while poking around; these issues don't affect us at present.

That’s great, and thanks for that!  

I’ll give the above a try and report back what I find out.

A suggestion for the future: consider using GitHub issues instead of email.  At least in my experience, they tend to get more visibility, and they’re also easier to refer back to than email threads.


> 
> Best regards,
> 
> Itay Chamiel, OrCam
> 
> 
> On Mon, Jan 4, 2021 at 6:07 PM Bill Torpey <wallstprog at gmail.com> wrote:
> Hi Itay:
> 
> Take a look at https://github.com/zeromq/libzmq/issues/3186 — it may be relevant to the behavior you’re seeing.
> 
> The short version is that process_commands needs to get a chance to run on the socket to clean up resources.  If that isn’t done, resources (in this case memory, but in your case potentially fd’s) can appear to leak until the context is shut down.
> 
> Hope this helps…
> 
> Bill
> 
>> On Dec 29, 2020, at 7:12 AM, Itay Chamiel <itay.chamiel at orcam.com> wrote:
>> 
>> Hi, we have a client thread that is supposed to receive data from a parent thread, then disconnect when done. We've noticed that when the socket is closed there's a leak of an eventfd (file descriptor), therefore we have a leak every time such a client is created and destroyed - even if no data is transferred.
>> 
>> Here is a quick C++ program to reproduce it. I'm running on a Ubuntu 18 desktop with ZMQ 4.1.6 or 4.3.3. This loop is expected to run forever but crashes a little after 1000 iterations due to too many open files.
>> 
>> #include "zmq.hpp"
>> 
>> int main() {
>>   zmq::context_t context;
>>   while(1) {
>>     zmq::socket_t* socket = new zmq::socket_t(context, ZMQ_SUB);
>>     socket->connect("inproc://some_name");
>>     delete socket;
>>   }
>> }
>> 
>> The problem does not occur for other connection types (e.g. replace inproc with ipc and the problem will not occur).
>> In case you want it without the C++ bindings, here is a slightly more elaborate C example which also sets LINGER to zero (with no effect) and displays the number of FDs in use by the process each iteration.
>> 
>> #include <zmq.h>
>> #include <stdio.h>
>> #include <errno.h>
>> #include <stdlib.h>
>> #include <sys/types.h>
>> #include <unistd.h>
>> 
>> int main() {
>>   void* ctx = zmq_ctx_new();
>>   for (int i=0; ; i++) {
>>     void* zmq_sock = zmq_socket(ctx, ZMQ_SUB);
>>     if (!zmq_sock) { printf("fail after %d iterations: %s\n", i, zmq_strerror(errno)); exit(-1); }
>>     int linger = 0;
>>     int rc = zmq_setsockopt(zmq_sock, ZMQ_LINGER, &linger, sizeof(linger)); // this doesn't actually help
>>     if (rc != 0) exit(-1);
>>     rc = zmq_connect(zmq_sock, "inproc://some_name");
>>     if (rc != 0) exit(-1);
>>     rc = zmq_close(zmq_sock);
>>     if (rc != 0) exit(-1);
>>     // show the number of used FDs
>>     char cmd[100];
>>     sprintf(cmd, "ls -l -v /proc/%d/fd | wc -l", (int)getpid());
>>     system(cmd);
>>     // test is hard to abort without a sleep
>>     usleep(100*1000);
>>   }
>> }
>> 
>> Thank you,
>> 
>> Itay Chamiel, OrCam
>> 
