[zeromq-dev] Cleaning up file descriptors for dead router peers
Francis Le Bourse
zno-reply-francis.lebourse at sfr-sh.fr
Wed Jun 24 17:02:23 CEST 2015
On 6/24/2015 11:47 AM, Pieter Hintjens wrote:
> Do you think there's any way to reproduce this in the lab, e.g.
> killing a peer before it can shut down TCP properly?
Yes, you can use iptables for that purpose. For example, on the test
server machine:
iptables -A INPUT -p tcp --dport=your_router_port -j DROP
iptables -A OUTPUT -p tcp --sport=your_router_port -j DROP
These rules silently drop all TCP traffic to and from the router socket,
so neither side ever sees a FIN or RST.
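
If the test is driven from a script, the same two rules can be added
and later removed programmatically. This is only a rough sketch: the
names drop_rules and apply_rules and the port 5555 are made up for
illustration, and it assumes iptables is on the PATH and the script
runs as root.

```python
import subprocess


def drop_rules(port, action="-A"):
    """Build the two iptables commands for a given router port.

    action is "-A" to append the DROP rules, or "-D" to delete them
    again once the test is over. (Hypothetical helper, not part of zmq.)
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be '-A' or '-D'")
    port = str(int(port))  # reject anything that is not a plain number
    return [
        ["iptables", action, "INPUT", "-p", "tcp", "--dport", port, "-j", "DROP"],
        ["iptables", action, "OUTPUT", "-p", "tcp", "--sport", port, "-j", "DROP"],
    ]


def apply_rules(commands, run=subprocess.check_call):
    """Run each command in order; 'run' can be swapped out for a dry run."""
    for cmd in commands:
        run(cmd)
```

Calling apply_rules(drop_rules(5555)) would cut the peers off, and
apply_rules(drop_rules(5555, "-D")) would restore the traffic afterwards.
Passing run=print gives a dry run that only shows the commands.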
>
> On Tue, Jun 23, 2015 at 10:08 PM, Marcin Romaszewicz <marcin at brkt.com> wrote:
>> Hi All,
>>
>> I've got an issue with ZMQ_ROUTER sockets which I'm having a hard time
>> working around, and I'd love some advice, but I suspect the answer is that
>> what I want to do isn't possible.
>>
>> Say I have a router socket listening on a port, and I have peers connecting
>> and disconnecting randomly over TCP. These peers have random identities for
>> all intents and purposes.
>>
>> Most of the time, a peer will disconnect "cleanly", meaning the TCP
>> connection is terminated via FIN or RST packets, and ZMQ cleans up the file
>> descriptor.
>>
>> However, some of the time, my peer will die silently, effectively due to
>> network outage or power outage or something.
>>
>> In these cases, the router socket keeps the file descriptor around forever.
>> I know that the peer is dead because all my peers heartbeat to each other,
>> and the heartbeats have gone away. I thought that trying to send some data
>> to a dead peer would tear down that connection, since the underlying TCP
>> socket would eventually start erroring, but it doesn't; zmq must be dropping
>> my packet before sending it to the underlying socket.
>>
>> The socket monitor tells me that someone has connected to the router socket
>> on its bound port with a specific file descriptor, but I've got so many
>> of these coming in that I can't associate a specific file descriptor with a
>> specific peer.
>>
>> TCP keep-alives don't work all that well in raising errors in a dead
>> connection.
>>
>> What I know on the app side due to my heartbeats is that peer XYZ is dead.
>> I'd like to tell the router socket to close the underlying file descriptor.
>> What I know via the monitor is that I have a bunch of file descriptors open,
>> but I can't map them to peers. If I could, I'd just call os.close() on that
>> file descriptor and hopefully ZMQ would handle this gracefully.
>>
>> Eventually, in a few hours of uptime, my process hits the OS file descriptor
>> limit, and stops receiving new connections on the zeromq level. I can have
>> the process quit when it detects this, but that forces all the functioning
>> peers to reconnect and re-do some work, so I'd like to avoid it.
>>
>> I scanned the previous discussions about it, and there has been mention of
>> exposing this somehow, but I don't see anything along these lines in the
>> latest API (looking at the 4.1.2 release).
>>
>> Any suggestions on how I could work around this?
>>
>> I'm thinking of extending the socket monitor to have a new event type, like
>> ZMQ_PEER_CONNECT/DISCONNECT which passes back the peer ID and file
>> descriptor, but I've not gone through the zmq code enough yet to know how
>> much work this would be.
>>
>> Thanks in advance,
>> -- Marcin
>>
>>
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>