[zeromq-dev] Bad file descriptor in rm_fd()

Pieter Hintjens ph at imatix.com
Wed Nov 6 00:10:19 CET 2013


Hi MinRK,

I'll try to reproduce it tomorrow. Any suggestions for the kind of test
case I could write?

-Pieter

On Tue, Nov 5, 2013 at 11:44 PM, MinRK <benjaminrk at gmail.com> wrote:
> Once in a while, when running either the IPython or PyZMQ test suite, I
> still get this error:
>
>     Bad file descriptor (kqueue.cpp:77)
>
> or
>
>     Bad file descriptor (epoll.cpp:81)
>
> Stack trace suggests that this happens when destroying a context:
>
> Thread 0:
> 1   libzmq.3.dylib                 0x000000010f26b170 zmq::signaler_t::send() + 52
> 2   libzmq.3.dylib                 0x000000010f261b2f zmq::object_t::send_stop() + 35
> 3   libzmq.3.dylib                 0x000000010f2534a7 zmq::ctx_t::~ctx_t() + 59
> 4   libzmq.3.dylib                 0x000000010f253a29 zmq::ctx_t::terminate() + 439
> 5   libzmq.3.dylib                 0x000000010f27c071 zmq_ctx_term + 35
>
>
> Thread 6 Crashed:
> 0   libsystem_kernel.dylib         0x00007fff94a4d866 __pthread_kill + 10
> 1   libsystem_pthread.dylib        0x00007fff8cac835c pthread_kill + 92
> 2   libsystem_c.dylib              0x00007fff97570bba abort + 125
> 3   libzmq.3.dylib                 0x000000010f25a9e1 zmq::zmq_abort(char const*) + 9
> 4   libzmq.3.dylib                 0x000000010f25d0fe zmq::kqueue_t::kevent_delete(int, short) + 142
> 5   libzmq.3.dylib                 0x000000010f25d1b0 zmq::kqueue_t::rm_fd(void*) + 42
> 6   libzmq.3.dylib                 0x000000010f2687a3 zmq::reaper_t::process_stop() + 59
> 7   libzmq.3.dylib                 0x000000010f26862b zmq::reaper_t::in_event() + 161
> 8   libzmq.3.dylib                 0x000000010f25d40c zmq::kqueue_t::loop() + 362
>
>
> I am still seeing this error once in a while with libzmq-master as of today.
> I don't think it's a recent regression. A minimal test case is difficult,
> since the error only seems to appear after at least a hundred tests, and
> only a small fraction of the time even then. Because the assert is always
> hit late in the process, I have assumed that FD exhaustion is causing the
> problem, but I am not actually sure, and I am fairly careful about cleaning
> up sockets.
>
> Properties of the test suite that sees the issue:
>
> - create and destroy many contexts and sockets
> - the previous test's context should always be destroyed before the next
> test starts
> - it is not reliably the same test where the assert is hit
>
> I'm afraid I don't know enough about the internals to really tell what's
> going on here, or figure out why the deleted FD is invalid (maybe it was
> already closed, and the error should be ignored?).
>
> Anyone have insight on what might be causing the problem, or how I might dig
> deeper into more useful information?
>
> -MinRK
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
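One inexpensive way to test the FD-exhaustion hypothesis, before building a dedicated reproducer, is to count the process's open descriptors before and after each test and compare the count against the RLIMIT_NOFILE soft limit. If the count climbs steadily across the suite, something is leaking. If it stays flat, exhaustion is unlikely, and a double close of the poller's descriptor becomes the more plausible suspect. The sketch below uses only the Python standard library. It assumes a Unix platform that exposes `/proc/self/fd` or `/dev/fd` (Linux and macOS do), and the helper names `open_fd_count`, `fd_limit`, and `check_for_leak` are hypothetical, not part of pyzmq.

```python
import os
import resource

# Assumption: Linux exposes /proc/self/fd and macOS exposes /dev/fd.
_FD_DIR = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"


def open_fd_count() -> int:
    """Return the number of open descriptors in this process.

    listdir itself briefly opens one descriptor, so the absolute value is
    off by one. This cancels out when comparing before/after counts.
    """
    return len(os.listdir(_FD_DIR))


def fd_limit() -> int:
    """Return the soft RLIMIT_NOFILE limit (max open descriptors)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def check_for_leak(run_test, tolerance: int = 0) -> int:
    """Run one test callable and raise if it leaves extra descriptors open.

    Returns the number of descriptors leaked (0 when the test is clean).
    """
    before = open_fd_count()
    run_test()
    after = open_fd_count()
    leaked = after - before
    if leaked > tolerance:
        raise AssertionError(
            f"{leaked} fd(s) leaked: {after} open, soft limit {fd_limit()}"
        )
    return leaked
```

In a pyzmq suite, each test body (for example, one that creates a `zmq.Context()`, opens and closes its sockets, and then calls `ctx.term()`) would be passed to `check_for_leak`. Because libzmq's reaper and I/O threads own their own kqueue/epoll descriptors, a clean test should return the process to the same count once the context has been terminated.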



-- 
-
Pieter Hintjens
CEO of iMatix.com
Founder of ZeroMQ community
blog: http://hintjens.com


