[zeromq-dev] Zmq I/O thread abort() due to bad file descriptor (EBADF)

Ottawa Guy ottawaguy81 at yahoo.com
Thu Apr 15 04:19:56 CEST 2021


Hi,
I am using zeromq 4.3.1. In our design, each micro-service sends periodic heartbeats to its peers (ROUTER->DEALER model). The ZMQ sockets are configured with ZMQ_IMMEDIATE and ZMQ_SNDTIMEO, which keeps the send operation from blocking when the peer is not up. We are seeing cases where the zmq I/O thread crashes (abort) with "Bad file descriptor". It only happens for a peer that is not reachable. The abort is triggered by epoll_ctl() failing with EBADF for EPOLL_CTL_DEL or EPOLL_CTL_ADD. Our application only uses zmq sockets; it does not use ZMQ_FD, so I am not sure how there could be any race condition. It seems the socket file descriptor gets closed after the epoll_wait() event. The problem is rare but does happen, and I don't have a recipe to reproduce it. There is no issue with peers that are reachable.
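To make the suspected failure mode concrete, here is a minimal sketch in Python (not our application code, and not libzmq) of what epoll_ctl() reports when it is handed a descriptor that has already been closed. The helper name register_closed_fd is made up for illustration, and the sketch assumes Linux, since select.epoll is Linux-only. It covers only the EPOLL_CTL_ADD case seen in the first backtrace.

```python
import errno
import select
import socket


def register_closed_fd():
    """Register an already-closed socket fd with epoll (EPOLL_CTL_ADD).

    Hypothetical helper for illustration only. Returns the errno raised
    by the registration, or None if it unexpectedly succeeds.
    """
    ep = select.epoll()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    fd = sock.fileno()
    sock.close()  # the descriptor is closed before the poller sees it
    try:
        # select.epoll.register() issues epoll_ctl(EPOLL_CTL_ADD) underneath.
        ep.register(fd, select.EPOLLOUT)
    except OSError as exc:
        return exc.errno
    finally:
        ep.close()
    return None


if __name__ == "__main__":
    err = register_closed_fd()
    print("epoll_ctl(ADD) errno:", errno.errorcode.get(err, err))
```

In libzmq the same error is treated as fatal by the poller (the zmq_abort frame in the backtraces), which is why a stale descriptor takes the whole I/O thread down instead of surfacing as a recoverable error.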
Any pointers would be helpful.

Thanks,
Hadi



(gdb) bt
#0  0x00003fff7cdee530 in __libc_signal_restore_set (set=0x3fff712e8040) at ../sysdeps/unix/sysv/linux/internal-signals.h:84
#1  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:48
#2  0x00003fff7cdd4648 in __GI_abort () at abort.c:79
#3  0x00003fff7c971818 in zmq::zmq_abort (errmsg_=<optimized out>) at /usr/src/debug/zeromq/4.3.1-r0/zeromq-4.3.1/src/err.cpp:88
#4  0x00003fff7c970d88 in zmq::epoll_t::add_fd (this=0x104797d0, fd_=<optimized out>, events_=<optimized out>) at /usr/src/debug/zeromq/4.3.1-r0/zeromq-4.3.1/src/epoll.cpp:100
#5  0x00003fff7c972438 in zmq::io_object_t::add_fd (this=<optimized out>, fd_=<optimized out>) at /usr/src/debug/zeromq/4.3.1-r0/zeromq-4.3.1/src/io_object.cpp:65
#6  0x00003fff7c9b1e98 in zmq::tcp_connecter_t::start_connecting (this=0x3fff385bfe70) at /usr/src/debug/zeromq/4.3.1-r0/zeromq-4.3.1/src/tcp_connecter.cpp:203
#7  zmq::tcp_connecter_t::start_connecting (this=0x3fff385bfe70) at /usr/src/debug/zeromq/4.3.1-r0/zeromq-4.3.1/src/tcp_connecter.cpp:190
#8  0x00003fff7c9b1fe4 in zmq::tcp_connecter_t::timer_event (this=<optimized out>, id_=<optimized out>) at /usr/src/debug/zeromq/4.3.1-r0/zeromq-4.3.1/src/tcp_connecter.cpp:186
#9  0x00003fff7c98dad0 in zmq::poller_base_t::execute_timers (this=0x104797d0) at /usr/src/debug/zeromq/4.3.1-r0/zeromq-4.3.1/src/poller_base.cpp:103
#10 0x00003fff7c9709c4 in zmq::epoll_t::loop (this=0x104797d0) at /usr/src/debug/zeromq/4.3.1-r0/zeromq-4.3.1/src/epoll.cpp:173
#11 0x00003fff7c98d3ac in zmq::worker_poller_base_t::worker_routine (arg_=<optimized out>) at /usr/src/debug/zeromq/4.3.1-r0/zeromq-4.3.1/src/poller_base.cpp:139
#12 0x00003fff7c9b3658 in thread_routine (arg_=0x10479828) at /usr/src/debug/zeromq/4.3.1-r0/zeromq-4.3.1/src/thread.cpp:182
#13 0x00003fff7cfabb14 in start_thread (arg=0x0) at pthread_create.c:486
#14 0x00003fff7cec72e8 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82


#0  0x00003fffb323e530 in __libc_signal_restore_set (set=0x3fff6afe7060) at ../sysdeps/unix/sysv/linux/internal-signals.h:84
#1  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:48
#2  0x00003fffb3224648 in __GI_abort () at abort.c:79
#3  0x00003fffb2dc1818 in .zmq::zmq_abort(char const*) () from /usr/lib64/libzmq.so.5
#4  0x00003fffb2dc155c in .zmq::epoll_t::rm_fd(void*) () from /usr/lib64/libzmq.so.5
#5  0x00003fffb2dc2474 in .zmq::io_object_t::rm_fd(void*) () from /usr/lib64/libzmq.so.5
#6  0x00003fffb2e011e0 in .zmq::tcp_connecter_t::rm_handle() () from /usr/lib64/libzmq.so.5
#7  0x00003fffb2e01c3c in .zmq::tcp_connecter_t::out_event() () from /usr/lib64/libzmq.so.5
#8  0x00003fffb2e00cbc in .zmq::tcp_connecter_t::in_event() () from /usr/lib64/libzmq.so.5
#9  0x00003fffb2dc0a9c in ?? () from /usr/lib64/libzmq.so.5
#10 0x00003fffb2ddd3ac in .zmq::worker_poller_base_t::worker_routine(void*) () from /usr/lib64/libzmq.so.5
#11 0x00003fffb2e03658 in ?? () from /usr/lib64/libzmq.so.5
#12 0x00003fffb33fbb14 in start_thread (arg=0x0) at pthread_create.c:486
#13 0x00003fffb33172e8 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82



