[zeromq-dev] Not all sockets closed on exec

Peter J. Holzer hjp-zeromq at hjp.at
Fri Jul 29 00:10:47 CEST 2016


Hello,

I have an application (written in Perl) using ZeroMQ. A central
component is a broker process (using ROUTER sockets) which forks/execs
off worker processes as needed. So far so good.

But every once in a while connecting to the broker takes 127 seconds.
Upon investigation I found that the Linux kernel was dropping SYN
packets because there was already an existing socket with the same ports
in CLOSE_WAIT state (after 127 seconds, ZeroMQ on the client gives up and
connects again, which (usually) works immediately).
That socket was kept open by a worker:

worker  31646 wdsro   21u  IPv4         1389089750      0t0        TCP localhost:21887->localhost:42885 (CLOSE_WAIT)

But 21887 is the port number used by the broker. How did this socket get
to the worker?

I think I now know what happened:

ZeroMQ sets SOCK_CLOEXEC when creating the listen socket. It was
probably assumed that the socket returned by accept(2) inherits this
flag. But that doesn't seem to be the case:

# cat /proc/17444/fdinfo/20
pos:    0
flags:  02000002
mnt_id: 7
# cat /proc/17444/fdinfo/22
pos:    0
flags:  04002
mnt_id: 7

Fd 20 is the listen socket here: It has the O_CLOEXEC flag (02000000)
set. Fd 22 is an established socket on the same port: It doesn't have
the flag set.
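
For anyone who wants to check such values mechanically, here is a minimal
sketch in C. The helper name fdinfo_flags_have_cloexec is made up for
illustration, and the bit value is simply the 02000000 quoted above (that
is O_CLOEXEC on common Linux architectures such as x86; other
architectures may use a different value). It parses the octal "flags:"
field from /proc/<pid>/fdinfo/<fd> and tests that bit:

```c
#include <stdlib.h>

/* O_CLOEXEC as quoted in this mail. Assumption: this matches the target
 * architecture (true on x86 and ARM Linux, not on every architecture). */
#define CLOEXEC_BIT 02000000UL

/* Hypothetical helper: takes the octal string from the "flags:" line of
 * /proc/<pid>/fdinfo/<fd> and returns 1 if the close-on-exec bit is set,
 * 0 otherwise. */
int fdinfo_flags_have_cloexec(const char *octal_flags)
{
    unsigned long flags = strtoul(octal_flags, NULL, 8);
    return (flags & CLOEXEC_BIT) != 0;
}
```

With the two values shown above, "02000002" (fd 20) has the bit set and
"04002" (fd 22) does not.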

So when the process forks before the connection is closed, the open
socket is inherited by the child and survives the exec. The child
process will never close the socket (because it doesn't know about it),
so it will stay in CLOSE_WAIT state.

Linux provides the accept4 system call, which takes an extra flags
parameter. Therefore I think zmq::tcp_listener_t::accept should handle
accept similarly to how zmq::open_socket handles connect:

If the OS provides accept4, use it with SOCK_CLOEXEC. Otherwise call
fcntl (fd, F_SETFD, FD_CLOEXEC) immediately afterwards.
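
A rough sketch of that idea in plain C follows. It is not the libzmq
code; the names accept_cloexec and demo_accepted_fd_is_cloexec are
hypothetical, and I am assuming a glibc-style system where _GNU_SOURCE
exposes accept4. Note that the fcntl fallback still leaves a short window
between accept and fcntl in which another thread could fork, so accept4
is preferable where it exists.

```c
#define _GNU_SOURCE /* assumption: glibc; exposes accept4() */
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: accept a connection and make sure the new
 * descriptor has close-on-exec set. Returns the fd, or -1 with errno set. */
int accept_cloexec(int listen_fd)
{
    int fd;
    int flags;
#ifdef SOCK_CLOEXEC
    /* Preferred path: the flag is set atomically by the kernel. */
    fd = accept4(listen_fd, NULL, NULL, SOCK_CLOEXEC);
    if (fd >= 0 || errno != ENOSYS)
        return fd;
    /* ENOSYS: the running kernel lacks accept4, fall back below. */
#endif
    fd = accept(listen_fd, NULL, NULL);
    if (fd < 0)
        return -1;
    /* Fallback: set the flag immediately afterwards (not atomic). */
    flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Usage example: listen on an ephemeral loopback port, connect to it,
 * accept via accept_cloexec() and report whether FD_CLOEXEC is set.
 * Returns 1 if set, 0 if not, -1 if the setup failed. */
int demo_accepted_fd_is_cloexec(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int result = -1;
    int afd = -1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    if (lfd >= 0 && cfd >= 0
        && bind(lfd, (struct sockaddr *)&addr, sizeof addr) == 0
        && listen(lfd, 1) == 0
        && getsockname(lfd, (struct sockaddr *)&addr, &len) == 0
        && connect(cfd, (struct sockaddr *)&addr, sizeof addr) == 0
        && (afd = accept_cloexec(lfd)) >= 0) {
        int flags = fcntl(afd, F_GETFD);
        if (flags >= 0)
            result = (flags & FD_CLOEXEC) ? 1 : 0;
    }
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```

A descriptor obtained this way is closed automatically on exec, so a
forked worker would no longer hold on to the broker's connections.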

        hp

-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants 
| |   | hjp at hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/