[zeromq-dev] Leaked/stale TCP connections

Jonathan Reams jbreams at gmail.com
Thu Aug 28 22:46:45 CEST 2014


I'm working on a project that uses ZeroMQ to distribute monitoring checks
from a central monitoring server out to all the hosts being monitored. Each
monitoring client has a SUB connection (to receive check jobs) and a PUSH
connection (to send check results) back to the monitoring server. I'm seeing
sporadic failures after the server reloads: the TCP connection for the PUSH
socket goes stale, even though both sides show it as ESTABLISHED. (When the
server reloads, it closes all its ZeroMQ sockets, terminates the context,
and recreates them, but the process doesn't exit.)

While the connection is stale, the TCP send queue size increases as the
client sends data. Eventually the client's TCP connection times out and the
client re-establishes it. After that everything works fine, except that the
server process shows two ESTABLISHED connections even though only one of
them is active. It's as though the "leaked" connection is open, but isn't
attached to anything in ZeroMQ.

Below is a log of what the client sees right after restart:
Aug 28 18:46:11 clienthost mqexec: Socket event on tcp://monitoringserver:3232: broken session (fd: 17)
Aug 28 18:46:11 clienthost mqexec: Socket event on tcp://monitoringserver:3233: broken session (fd: 18)
Aug 28 18:46:11 clienthost mqexec: Socket event on tcp://monitoringserver:3232: asynchronous connect / reconnection attempt (ivl: 1981)
Aug 28 18:46:11 clienthost mqexec: Socket event on tcp://monitoringserver:3233: asynchronous connect / reconnection attempt (ivl: 1788)
Aug 28 18:46:13 clienthost mqexec: Socket event on tcp://monitoringserver:3233: connection established (fd: 17)
Aug 28 18:46:13 clienthost mqexec: Socket event on tcp://monitoringserver:3232: connection established (fd: 18)

At this point the tcp://monitoringserver:3233 connection shows as
established on both ends, but no traffic is flowing and the send queue is
getting larger:
$ netstat -tnp | grep 3233
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0  57848 clienthost:43781            monitoringserver:3233       ESTABLISHED 7149/mqexec

On the nagios server, the connection also shows up as established, but both
of its queues are empty:
$ netstat -tnp | grep 43781
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 monitoringserver:3233       clienthost:43781            ESTABLISHED 26949/nagios
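
The growing Send-Q on the client is the visible symptom here, so one way to
watch for it is to sample `netstat -tn` twice and flag ESTABLISHED
connections whose Send-Q grew in between. The following is only a rough
sketch: the helper names `parse_netstat` and `growing_send_queues` are made
up for this illustration, and it assumes each connection sits on a single,
unwrapped line of output.

```python
from typing import List, NamedTuple


class Conn(NamedTuple):
    recv_q: int
    send_q: int
    local: str
    remote: str
    state: str


def parse_netstat(text: str) -> List[Conn]:
    """Parse `netstat -tn`-style rows; header and malformed lines are skipped."""
    conns = []
    for line in text.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local remote state [pid/program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        conns.append(
            Conn(int(fields[1]), int(fields[2]), fields[3], fields[4], fields[5])
        )
    return conns


def growing_send_queues(before: List[Conn], after: List[Conn]) -> List[Conn]:
    """Connections ESTABLISHED in `after` whose Send-Q is larger than in `before`."""
    previous = {(c.local, c.remote): c.send_q for c in before}
    return [
        c
        for c in after
        if c.state == "ESTABLISHED"
        and (c.local, c.remote) in previous
        and c.send_q > previous[(c.local, c.remote)]
    ]
```

A single non-zero Send-Q reading is normal on a busy link, which is why the
sketch compares two samples rather than looking at one.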

Eventually the TCP socket times out (4 minutes later), and the client
re-establishes the connection:
Aug 28 19:01:57 clienthost mqexec: Socket event on tcp://monitoringserver:3233: broken session (fd: 17)
Aug 28 19:01:57 clienthost mqexec: Socket event on tcp://monitoringserver:3233: asynchronous connect / reconnection attempt (ivl: 1180)
Aug 28 19:01:58 clienthost mqexec: Socket event on tcp://monitoringserver:3233: connection established (fd: 17)

The client now shows two ESTABLISHED TCP connections, and traffic flows
normally. On the server side, however, we still see the old connection
(from clienthost:43781) as established:
$ netstat -tnp |grep clienthost
tcp        0      0 monitoringserver:3233       clienthost:58860            ESTABLISHED 26949/nagios
tcp        0      0 monitoringserver:3232       clienthost:58667            ESTABLISHED 26949/nagios
tcp        0      0 monitoringserver:3233       clienthost:43781            ESTABLISHED 26949/nagios
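
The leaked connection is the one the server still lists but the client no
longer has. A minimal sketch of that comparison follows. The function name
`leaked_server_connections` is hypothetical, and the client-side list is an
assumption inferred from the server output above, since the client's
post-reconnect netstat isn't shown. It also assumes both hosts report the
same addresses (no NAT in between).

```python
def leaked_server_connections(server_view, client_view):
    """Return server-side (server, client) pairs with no matching client socket.

    server_view: (server_endpoint, client_endpoint) pairs seen on the server.
    client_view: (client_endpoint, server_endpoint) pairs seen on the client.
    """
    # Flip the client's pairs so both views are ordered (server, client).
    live = {(remote, local) for local, remote in client_view}
    return sorted(pair for pair in set(server_view) if pair not in live)


# Server-side pairs taken from the netstat output above.
server_view = [
    ("monitoringserver:3233", "clienthost:58860"),
    ("monitoringserver:3232", "clienthost:58667"),
    ("monitoringserver:3233", "clienthost:43781"),
]
# Assumed client-side pairs after the reconnect (not shown in the post).
client_view = [
    ("clienthost:58860", "monitoringserver:3233"),
    ("clienthost:58667", "monitoringserver:3232"),
]
leaked = leaked_server_connections(server_view, client_view)
```

With these inputs, `leaked` holds only the pair for clienthost:43781.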

Interestingly, the PUB socket, which the client also connects to, never
exhibits this behavior. On the nagios side, the PULL socket is polled by
registering its ZMQ_FD file descriptor with an I/O event loop. Immediately
after adding the file descriptor to the event loop, I explicitly read any
pending events to make sure I haven't missed anything.
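
For reference, the "read any pending events" step usually looks something
like the sketch below. It is written against pyzmq purely for illustration
and is not the actual nagios-side code, which isn't shown here. The function
name `drain_pull_socket` and the `zmq_mod` parameter are invented for this
example. The parameter only exists so the loop can be exercised with a
stand-in module. The point is that ZMQ_FD is edge-triggered, so the handler
has to keep receiving until ZMQ_EVENTS stops reporting ZMQ_POLLIN.

```python
def drain_pull_socket(sock, handle, zmq_mod=None):
    """Receive every message currently queued on `sock`.

    Intended to run once right after the ZMQ_FD descriptor is registered
    with the event loop, and again each time that descriptor becomes
    readable. Returns the number of messages passed to `handle`.
    """
    if zmq_mod is None:
        import zmq as zmq_mod  # pyzmq, imported lazily

    handled = 0
    # ZMQ_FD only signals a state change, so check ZMQ_EVENTS and keep
    # reading until no more input is reported.
    while sock.getsockopt(zmq_mod.EVENTS) & zmq_mod.POLLIN:
        try:
            message = sock.recv(zmq_mod.DONTWAIT)
        except zmq_mod.Again:
            break  # nothing left to read right now
        handle(message)
        handled += 1
    return handled
```

With a real socket, the descriptor to register would come from
`sock.getsockopt(zmq.FD)`, and the loop's read callback would call
`drain_pull_socket(sock, handle)`.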

I was wondering/hoping that someone else may have seen this kind of
behavior or had any thoughts on how this could happen.

Thanks!