[zeromq-dev] Bad file descriptor in rm_fd()

MinRK benjaminrk at gmail.com
Tue Nov 5 23:44:10 CET 2013


Once in a while, when running either the IPython or PyZMQ test suite, I
still get this error:

    Bad file descriptor (kqueue.cpp:77)

or

    Bad file descriptor (epoll.cpp:81)

The stack trace suggests that this happens when destroying a context:

Thread 0:
1   libzmq.3.dylib                 0x000000010f26b170 zmq::signaler_t::send() + 52
2   libzmq.3.dylib                 0x000000010f261b2f zmq::object_t::send_stop() + 35
3   libzmq.3.dylib                 0x000000010f2534a7 zmq::ctx_t::~ctx_t() + 59
4   libzmq.3.dylib                 0x000000010f253a29 zmq::ctx_t::terminate() + 439
5   libzmq.3.dylib                 0x000000010f27c071 zmq_ctx_term + 35


Thread 6 Crashed:
0   libsystem_kernel.dylib         0x00007fff94a4d866 __pthread_kill + 10
1   libsystem_pthread.dylib        0x00007fff8cac835c pthread_kill + 92
2   libsystem_c.dylib              0x00007fff97570bba abort + 125
3   libzmq.3.dylib                 0x000000010f25a9e1 zmq::zmq_abort(char const*) + 9
4   libzmq.3.dylib                 0x000000010f25d0fe zmq::kqueue_t::kevent_delete(int, short) + 142
5   libzmq.3.dylib                 0x000000010f25d1b0 zmq::kqueue_t::rm_fd(void*) + 42
6   libzmq.3.dylib                 0x000000010f2687a3 zmq::reaper_t::process_stop() + 59
7   libzmq.3.dylib                 0x000000010f26862b zmq::reaper_t::in_event() + 161
8   libzmq.3.dylib                 0x000000010f25d40c zmq::kqueue_t::loop() + 362


I am still seeing this error once in a while with libzmq-master as of
today, and I don't think it's a recent regression. A minimal test case is
hard to produce, since the error only seems to appear after at least a
hundred tests, and even then only a small fraction of the time. Because the
assert is always hit late in the run, I have assumed that FD exhaustion is
the cause, but I am not actually sure, and I am fairly careful about
cleaning up sockets.
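One way to test the FD-exhaustion theory is to log how many descriptors the process holds before and after each test and compare that against the soft RLIMIT_NOFILE. The sketch below uses only the Python standard library; the helper names open_fd_count and report_fd_usage are hypothetical, and it assumes /dev/fd lists the process's descriptors (true on Linux and macOS).

```python
import os
import resource


def open_fd_count() -> int:
    """Approximate number of file descriptors open in this process.

    /dev/fd lists the calling process's descriptors on Linux and macOS.
    The listing may include the descriptor os.listdir() opens to read the
    directory itself, so compare counts between calls rather than
    trusting the absolute value.
    """
    return len(os.listdir("/dev/fd"))


def report_fd_usage(label: str) -> int:
    """Print and return the current FD count next to the soft limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    count = open_fd_count()
    print(f"{label}: {count} fds open (soft limit {soft})")
    return count
```

Calling report_fd_usage from a per-test setup or teardown hook shows whether the count climbs steadily toward the limit. A flat count would argue against exhaustion; exhaustion would also normally surface as EMFILE ("Too many open files") when a descriptor is created, rather than as EBADF when one is removed.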

Properties of the test suite that sees the issue:

- create and destroy many contexts and sockets
- the previous test's context should always be destroyed before the next
test starts
- the assert is not reliably hit in the same test

I'm afraid I don't know enough about the internals to tell what's going on
here, or why the FD being deleted is invalid (maybe it was already closed,
and the error should be ignored?).
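The "already closed" possibility can be made concrete: removing a descriptor from an epoll or kqueue set after it has been closed is rejected by the kernel with EBADF, the same error reported by the abort in kqueue_t::kevent_delete() under rm_fd(). The sketch below uses Python's stdlib select module, not libzmq; delete_after_close is a hypothetical helper that registers a pipe, closes it, then tries to remove it.

```python
import errno
import os
import select


def delete_after_close() -> int:
    """Hypothetical helper: register a pipe's read end with the platform's
    kernel poller, close it, then try to remove it from the poller.
    Returns the errno raised by the removal, or 0 if it succeeded."""
    r, w = os.pipe()
    try:
        if hasattr(select, "epoll"):  # Linux
            poller = select.epoll()
            try:
                poller.register(r, select.EPOLLIN)
                os.close(r)  # close *before* removing it from the set
                poller.unregister(r)  # epoll_ctl(EPOLL_CTL_DEL)
            except OSError as exc:
                return exc.errno
            finally:
                poller.close()
        else:  # macOS / BSD
            kq = select.kqueue()
            try:
                add = select.kevent(r, select.KQ_FILTER_READ, select.KQ_EV_ADD)
                kq.control([add], 0)
                os.close(r)  # close *before* removing it from the queue
                delete = select.kevent(r, select.KQ_FILTER_READ,
                                       select.KQ_EV_DELETE)
                kq.control([delete], 0)  # kevent() with EV_DELETE
            except OSError as exc:
                return exc.errno
            finally:
                kq.close()
        return 0
    finally:
        os.close(w)


if __name__ == "__main__":
    code = delete_after_close()
    print(errno.errorcode.get(code, code))
```

This only shows that a close-before-remove ordering produces EBADF; it does not establish that libzmq actually closes the descriptor before the reaper removes it.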

Does anyone have insight into what might be causing this, or how I might
dig deeper and get more useful information?

-MinRK