[zeromq-dev] Not all sockets closed on exec
Peter J. Holzer
hjp-zeromq at hjp.at
Fri Jul 29 00:10:47 CEST 2016
I have an application (written in Perl) using ZeroMQ. A central
component is a broker process (using ROUTER sockets) which forks/execs
off worker processes as needed. So far so good.
But every once in a while connecting to the broker takes 127 seconds.
Upon investigation I found that the Linux kernel was dropping SYN
packets because there was already an existing socket with the same ports
in CLOSE_WAIT state (after 127 seconds ZeroMQ on the client gives up and
connects again, which (usually) works immediately).
That socket was kept open by a worker:
worker 31646 wdsro 21u IPv4 1389089750 0t0 TCP localhost:21887->localhost:42885 (CLOSE_WAIT)
But 21887 is the port number used by the broker. How did this socket get
to the worker?
I think I now know what happened:
ZeroMQ sets SOCK_CLOEXEC when creating the listen socket. It was
probably assumed that the socket returned by accept(2) inherits this
flag. But that doesn't seem to be the case:
# cat /proc/17444/fdinfo/20
# cat /proc/17444/fdinfo/22
Fd 20 is the listen socket here: It has the O_CLOEXEC flag (02000000)
set. Fd 22 is an established socket on the same port: It doesn't have
the flag set.
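
The same observation can be reproduced outside ZeroMQ. What follows is
only a minimal sketch, assuming Linux and a usable loopback interface;
the helper names has_cloexec and check_accept_inheritance are made up
for illustration and are not part of ZeroMQ. It opens a listener with
SOCK_CLOEXEC, accepts one connection with plain accept(2), and reports
the close-on-exec flag of both descriptors:

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Hypothetical helper: stores the close-on-exec state of a listener
 * created with SOCK_CLOEXEC and of a socket accepted from it with
 * plain accept(2). Returns 0 on success, -1 if any call failed. */
int check_accept_inheritance(int *listener_cloexec, int *accepted_cloexec)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int lfd = -1, cfd = -1, afd = -1, rc = -1;

    lfd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
    if (lfd == -1)
        goto out;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) == -1)
        goto out;
    if (listen(lfd, 1) == -1)
        goto out;
    /* Find out which port was assigned so the client can connect. */
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) == -1)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd == -1)
        goto out;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) == -1)
        goto out;

    /* Plain accept(2): no flags can be requested for the new socket. */
    afd = accept(lfd, NULL, NULL);
    if (afd == -1)
        goto out;

    *listener_cloexec = has_cloexec(lfd);
    *accepted_cloexec = has_cloexec(afd);
    rc = 0;
out:
    if (afd != -1)
        close(afd);
    if (cfd != -1)
        close(cfd);
    if (lfd != -1)
        close(lfd);
    return rc;
}
```

On Linux I would expect this to report the flag as set on the listener
and clear on the accepted socket, which matches the fdinfo output.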
So when the process forks before the connection is closed, the open
socket is inherited by the child and survives the exec. The child
process will never close the socket (because it doesn't know about it),
so it stays in CLOSE_WAIT state for as long as the child runs.
Linux provides the accept4 system call, which takes an extra flags
parameter. Therefore I think zmq::tcp_listener_t::accept should handle
accept similarly to how zmq::open_socket handles connect:
If the OS provides accept4, use it with SOCK_CLOEXEC. Otherwise call
fcntl (fd, F_SETFD, FD_CLOEXEC) immediately afterwards.
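
A rough sketch of that idea in plain C follows. This is not the actual
ZeroMQ code: accept_cloexec and demo_accept_cloexec are hypothetical
names, and HAVE_ACCEPT4 stands in for whatever macro the build system
would define when accept4 is available (the sketch also assumes it is
present whenever __linux__ is defined).

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Accept a connection and make sure the new socket is close-on-exec.
 * Returns the new descriptor, or -1 with errno set. */
int accept_cloexec(int listen_fd)
{
    assert(listen_fd >= 0);
#if defined(HAVE_ACCEPT4) || defined(__linux__)
    /* The flag is set atomically with the creation of the socket. */
    return accept4(listen_fd, NULL, NULL, SOCK_CLOEXEC);
#else
    int fd = accept(listen_fd, NULL, NULL);
    if (fd == -1)
        return -1;
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) == -1) {
        int saved = errno; /* keep the fcntl error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
#endif
}

/* Hypothetical self-check on the loopback interface: returns 1 if a
 * socket obtained via accept_cloexec has FD_CLOEXEC set, 0 if it does
 * not, and -1 if the test setup itself failed. */
int demo_accept_cloexec(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int lfd = -1, cfd = -1, afd = -1, result = -1, flags;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1)
        goto out;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(lfd, 1) == -1 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) == -1)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd == -1 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof addr) == -1)
        goto out;

    afd = accept_cloexec(lfd);
    if (afd == -1)
        goto out;
    flags = fcntl(afd, F_GETFD);
    if (flags != -1)
        result = (flags & FD_CLOEXEC) ? 1 : 0;
out:
    if (afd != -1)
        close(afd);
    if (cfd != -1)
        close(cfd);
    if (lfd != -1)
        close(lfd);
    return result;
}
```

Note that the fcntl fallback is not atomic: if another thread forks
between accept and fcntl, the child can still inherit the socket. That
is why accept4 is preferable where it exists.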
_ | Peter J. Holzer | I want to forget all about both belts and
|_|_) | | suspenders; instead, I want to buy pants
| | | hjp at hjp.at | that actually fit.
__/ | http://www.hjp.at/ | -- http://noncombatant.org/