[zeromq-dev] Cleaning up file descriptors for dead router

Maurice Barnum msb at yahoo-inc.com
Mon Jun 29 21:05:50 CEST 2015

I've been worried about a similar problem:  how to release resources tied up by a client
that has gone away.  zproto provides a good example of what to do for application-level
resources, but queued messages are still stuck.  Using credits will limit the flow of
messages, but at high request rates, a few thousand buggy client deployments can be
very disruptive.

Has anyone ever thought about implementing an API that would let an application disconnect
a specific peer from a socket?  I was thinking about adding an API that would disconnect
a peer from a router, identified by the identity.  Maybe a more general approach would be
to add an API to iterate connected peers and allow a disconnect based upon address,
but I don't have such a need and so haven't thought too much about it until right now.
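
To make the two proposals concrete, here is a toy model of what such an interface might look like. It is not libzmq API and nothing like it is confirmed to exist in this thread: `Peer`, `PeerTable`, `disconnect`, and `disconnect_where` are all hypothetical names, and the table only stands in for whatever per-peer state a router socket keeps internally.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Peer:
    identity: bytes          # routing identity, as a ROUTER would see it
    address: str             # remote endpoint, e.g. "10.0.0.5:5555"
    queued: List[bytes] = field(default_factory=list)  # undelivered messages


class PeerTable:
    """Hypothetical stand-in for a socket's per-peer state (not libzmq API)."""

    def __init__(self) -> None:
        self._peers: Dict[bytes, Peer] = {}

    def add(self, identity: bytes, address: str) -> Peer:
        peer = Peer(identity, address)
        self._peers[identity] = peer
        return peer

    def peers(self) -> List[Peer]:
        # Return a snapshot so callers may disconnect while iterating.
        return list(self._peers.values())

    def disconnect(self, identity: bytes) -> int:
        """Drop one peer by identity; return how many queued messages were freed."""
        peer = self._peers.pop(identity, None)
        return len(peer.queued) if peer else 0

    def disconnect_where(self, predicate: Callable[[Peer], bool]) -> List[bytes]:
        """Drop every peer matching predicate; return the identities removed."""
        doomed = [p.identity for p in self.peers() if predicate(p)]
        for identity in doomed:
            self.disconnect(identity)
        return doomed
```

The first method corresponds to "disconnect a peer from a router, identified by the identity"; the second to "iterate connected peers and allow a disconnect based upon address", for example `table.disconnect_where(lambda p: p.address.startswith("10.0.0."))`.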

On Monday, June 29, 2015 11:26 AM, Marcin Romaszewicz <marcin at brkt.com> wrote:

Hi Jonathan,

Your heartbeat code does indeed work in my little test, but I don't know why it didn't work in the wild for me.

Your code, though, gave me an idea to fix my problem slightly differently on top of ZMQ 4.1.2. I already have heartbeats going back and forth, and they propagate some peer information, so I have to send them irrespective of whether your code sends ZMQ-internal heartbeats. I'm going to do something similar in the stream engine, where if the TCP send returns a size of 0 and the reason is that the send would block or fail, I'll start a timer, then cancel it if we ever have a subsequent successful send or receive anything. If the timer goes off, we disconnect. This should fix my problem without two layers of heartbeats.
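
The timer logic described above can be sketched roughly as follows. This is only an illustration of the state machine, not the actual stream engine patch: the class `StallTimer`, its method names, and the injectable `clock` parameter are all assumptions made for the example, and the real change would live in libzmq's C++ code.

```python
import time
from typing import Callable, Optional


class StallTimer:
    """Tracks how long a connection has gone without send or receive progress."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        self._deadline: Optional[float] = None  # None means the timer is not running

    def on_send_result(self, bytes_sent: int) -> None:
        if bytes_sent > 0:
            # A successful send cancels the timer.
            self._deadline = None
        elif self._deadline is None:
            # Send wrote nothing (would block or failed): start the timer once.
            # Later stalled sends must not push the deadline further out.
            self._deadline = self._clock() + self.timeout_s

    def on_receive(self) -> None:
        # Any data from the peer proves the link is alive, so cancel the timer.
        self._deadline = None

    def expired(self) -> bool:
        # True once the timer has been running for at least timeout_s.
        return self._deadline is not None and self._clock() >= self._deadline
```

The caller would check `expired()` from its event loop and disconnect the peer when it returns true. Starting the timer only on the first stalled send matters: if every failed send restarted it, a peer that is retried often enough would never time out.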

Once 4.2.0 is stable and tested, I'll move to using your heartbeat stuff and remove our own heartbeats.

-- Marcin

On Sat, Jun 27, 2015 at 9:06 AM, Jonathan Reams <jbreams at gmail.com> wrote:

>Hi Marcin,
>I tried running your test case with the new heartbeats turned on and I saw what I think should be the correct behavior. I set the heartbeat interval, timeout, and TTL to 500 ms, and less than a second after setting iptables to DROP, all the sockets on the peer side went from ESTABLISHED to SYN_SENT, indicating that they were trying to reconnect, and all the ESTABLISHED sockets on the router side were closed. After flushing the INPUT iptables chain, the peers eventually recovered. I put my updated copy of your test script here: https://gist.github.com/jbreams/7f507beff87987afad98. I haven't tried this with 4.2.0 talking to 4.1.2, though. In your configuration I think it would do almost the right thing - I'd expect the router side to work fine and the peers to never close their sockets.
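
For readers who want to try the same settings, the heartbeat interval, timeout, and TTL mentioned above map to the socket options ZMQ_HEARTBEAT_IVL, ZMQ_HEARTBEAT_TIMEOUT, and ZMQ_HEARTBEAT_TTL in libzmq 4.2 and later. The sketch below assumes pyzmq built against such a libzmq; the helper names `heartbeat_options` and `apply_heartbeats` are made up for this example, and the linked gist may set the options differently.

```python
HEARTBEAT_MS = 500  # the value used in the test described above


def heartbeat_options(ms: int = HEARTBEAT_MS) -> dict:
    """Map pyzmq option names to values, all in milliseconds."""
    return {
        "HEARTBEAT_IVL": ms,      # how often to send a ZMTP PING
        "HEARTBEAT_TIMEOUT": ms,  # how long to wait for traffic after a PING
        "HEARTBEAT_TTL": ms,      # timeout the remote peer is asked to apply
    }


def apply_heartbeats(sock, ms: int = HEARTBEAT_MS) -> None:
    """Set the three heartbeat options on a pyzmq socket."""
    import zmq  # imported lazily; needs pyzmq with libzmq >= 4.2

    for name, value in heartbeat_options(ms).items():
        sock.setsockopt(getattr(zmq, name), value)
```

A typical call would be `apply_heartbeats(router)` on a freshly created `zmq.ROUTER` socket, before `bind` or `connect`, since socket options in libzmq generally affect only connections made after they are set.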
>On Fri, Jun 26, 2015 at 4:58 PM, Marcin Romaszewicz <marcin at brkt.com> wrote: 
>>Hi All,
>>I've got a trivial bit of code to reproduce this issue on a single host
>>using iptables to simulate network partition.
>>The file has comments on how to run the executable, but the short version
>>is that you start a ZMQ_ROUTER listener which accepts connections from
>>other peers, and remembers their identities and pings them every 5 seconds.
>>Then, you start a number of peers which connect to this router and start
>>pinging it every few seconds.
>>Once you use the iptables command (also in the comments in the file), the
>>router can't ping the peers, and the peers can't ping the router. The file
>>descriptors and connections remain open forever on both sides.
>>Furthermore, when you undo the iptables block, the connections never come
>zeromq-dev mailing list
>zeromq-dev at lists.zeromq.org
