[zeromq-dev] What is the canonical handling of zeromq sockets when fork+exec?

zmqdev zmqdev at amitego.com
Fri Nov 25 10:37:24 CET 2016


* Background

I have a service that starts workers on demand with fork+exec.
The requests arrive over zeromq sockets.

After the fork, before the exec, I close all file descriptors > 2, 
keeping only stdin/out/err. I then exec the requested program.


* Problem

It works, except that the service occasionally dumps core with the
following assertion failure:

	Bad file descriptor (src/epoll.cpp:90)

and the backtrace:

     #0  0xf77f5430 in __kernel_vsyscall ()
     #1  0xf743f1f7 in raise () from /lib/libc.so.6
     #2  0xf7440a33 in abort () from /lib/libc.so.6
     #3  0xf7067134 in zmq::zmq_abort(char const*) () from $LIBS/libzmq.so.5
     #4  0xf7065e6c in zmq::epoll_t::rm_fd(void*) () from $LIBS/libzmq.so.5
     #5  0xf7068823 in zmq::io_object_t::rm_fd(void*) () from $LIBS/libzmq.so.5
     #6  0xf70958af in zmq::stream_engine_t::unplug() () from $LIBS/libzmq.so.5
     #7  0xf7098711 in zmq::stream_engine_t::error(zmq::stream_engine_t::error_reason_t) () from $LIBS/libzmq.so.5
     #8  0xf7098867 in zmq::stream_engine_t::timer_event(int) () from $LIBS/libzmq.so.5
     #9  0xf707f972 in zmq::poller_base_t::execute_timers() () from $LIBS/libzmq.so.5
     #10 0xf7066209 in zmq::epoll_t::loop() () from $LIBS/libzmq.so.5
     #11 0xf7066467 in zmq::epoll_t::worker_routine(void*) () from $LIBS/libzmq.so.5
     #12 0xf709d67e in thread_routine () from $LIBS/libzmq.so.5
     #13 0xf7619b2c in start_thread () from /lib/libpthread.so.0
     #14 0xf750808e in clone () from /lib/libc.so.6

This is with zeromq-4.1.4 on RHEL 7.3 x86_64.

So I wonder: is there some interaction between parent and child?


* Documentation

The Guide and the FAQ do not explicitly address the fork+exec case.

The question has been asked several times on the mailing list in various 
forms, without a definitive answer (for dummies like me at least).


* Questions

Do I need to zmq_close the sockets in the child?
Or is zmq_term in the child enough?
Does closing the file descriptors in the child cause problems in the parent?

What is the correct way to handle this?




