[zeromq-dev] ROUTER tcp socket stuck in CLOSE_WAIT with large receive queue

Sash Nagarkar sash at dronedeploy.com
Thu Jun 12 17:55:50 CEST 2014

Thanks Pieter!  I'll try that and see if we encounter it again.  Love
the work you guys are doing with ZMQ.

On Thu, Jun 12, 2014 at 7:53 AM, Pieter Hintjens <ph at imatix.com> wrote:
> I've seen something similar (I think) with Zyre, where dealer sockets
> connecting with the same identity do weird things. Try setting
> ZMQ_ROUTER_HANDOVER on the router socket, see if that helps (you'll
> need libzmq master).
> On Thu, Jun 12, 2014 at 4:15 AM, Sash Nagarkar <sash at dronedeploy.com> wrote:
>> Hello ZMQ devs,
>> We're using PyZMQ 14.3.0 and libzmq 4.0.4 with a ROUTER-DEALER pattern
>> for a service we're providing.  Sorry if this is too verbose, and I
>> hope this is the right place to ask the question.
>> TL;DR: ROUTER socket doesn't receive messages from a DEALER even
>> though netstat shows several megabytes in the TCP receive queue
>> (nothing in the send queue).  Other connected DEALERs work fine.
>> The ROUTER socket is running on a server with ample CPU & memory
>> headroom, with several DEALER clients that connect, exchange messages,
>> and can abruptly disconnect repeatedly.  We're exclusively using
>> multipart messages with the first part always being the ZMQ socket
>> identity, which persists across DEALER connect/disconnects.  In other
>> words, each DEALER client uses the same socket identity across many
>> connects and disconnects.
>> Most of the time, things hum along smoothly (several thousand messages
>> exchanged, several dozen connect/disconnects).  However, every once in
>> a rare while, we see that one of the DEALER clients connects and sends
>> messages to the ROUTER that end up never making it to the ROUTER
>> process.  The ROUTER process continues to receive messages from other
>> DEALER clients.
>> Further debugging on the ROUTER server shows one (or more) TCP
>> connections from the client DEALER that are in the CLOSE_WAIT state
>> with several megabytes of data sitting in the receive queue to the
>> ROUTER.  We also see one connection from the client DEALER in the
>> ESTABLISHED state with a receive queue that is growing.
>> It's clear that the DEALER client died abruptly once, but then
>> returned with the same identity and resumed sending messages to the
>> ROUTER.  However, none of the subsequent messages are delivered to the
>> ROUTER process.  Any ideas on why this would be the case?
>> I would have provided a test case, but we aren't able to consistently
>> reproduce the issue.  I've copied the output from netstat (with
>> obfuscated IPs) below, in case it helps.
>> Questions:
>> - What would cause the receive queue to fill up like this on a ROUTER
>> while it continues to receive messages from other clients?  It's clear
>> that the messages are all making it to the ROUTER machine.
>> - Is it safe for DEALER sockets to abruptly disconnect and then reuse
>> their socket identity?
>> - How can we mitigate this situation?  The closest thing I see is
>> ZMQ_LINGER, but that applies only to the outgoing queue and not the
>> incoming one.
>> - Is there anything I could investigate myself to figure out whether
>> this is an issue in PyZMQ vs. libzmq?  Where should I start?
>> Other potentially relevant info:
>> - The ROUTER uses PyZMQ's zmq.Poller() to receive messages from the
>> problem socket and some others.  All other nodes in the system
>> continue to send and receive messages just fine.
>> - The ROUTER's send queues are pretty much empty.
>> - We see the same behavior with libzmq 4.0.4 and libzmq 2.2.x, on Ubuntu 14.04.
>> $ netstat -a
>> Active Internet connections (servers and established)
>> Proto  Recv-Q Send-Q Local Address     Foreign Address   State
>> tcp         0      0 *:12501           *:*               LISTEN
>> tcp   1816956      0 server-ip.:12501  clientA-ip:42571  CLOSE_WAIT
>> tcp   1551036      0 server-ip.:12501  clientA-ip:42858  CLOSE_WAIT
>> tcp         0      0 server-ip.:12501  clientB-ip:34000  ESTABLISHED
>> tcp   5265541      0 server-ip.:12501  clientA-ip:43469  ESTABLISHED
>> Please let me know if further information would help.  Thank you for
>> helping build ZMQ, it's been a huge pleasure to work with so far.
>> Cheers,
>> Sash
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
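
For anyone trying to spot the same symptom, the netstat check described in the original message can be scripted. The following is only a rough sketch: it assumes the six-column layout shown in the quoted output, and the names NETSTAT_SAMPLE and stuck_connections are made up for illustration. It lists TCP rows whose Recv-Q is non-zero, which in the quoted output are the two CLOSE_WAIT connections and the one growing ESTABLISHED connection from client A.

```python
# Sample rows copied from the netstat output quoted in the thread.
NETSTAT_SAMPLE = """\
tcp        0      0 *:12501                  *:*                     LISTEN
tcp   1816956      0 server-ip.:12501 clientA-ip:42571 CLOSE_WAIT
tcp   1551036      0 server-ip.:12501 clientA-ip:42858 CLOSE_WAIT
tcp        0      0 server-ip.:12501 clientB-ip:34000 ESTABLISHED
tcp   5265541      0 server-ip.:12501 clientA-ip:43469 ESTABLISHED
"""


def stuck_connections(netstat_text, min_recv_q=1):
    """Return (state, recv_q, foreign_address) for TCP rows with a receive backlog.

    Hypothetical helper: assumes each data row has exactly six
    whitespace-separated columns (proto, Recv-Q, Send-Q, local, foreign, state).
    """
    rows = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a six-column tcp row.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        if not fields[1].isdigit():
            continue
        recv_q = int(fields[1])
        if recv_q >= min_recv_q:
            rows.append((fields[5], recv_q, fields[4]))
    return rows
```

Calling stuck_connections(NETSTAT_SAMPLE) returns one tuple per backlogged connection; raising min_recv_q filters out small, transient queues.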

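Pieter's suggestion of setting ZMQ_ROUTER_HANDOVER on the ROUTER socket might look roughly like this in PyZMQ. This is a minimal sketch, not code from the thread: make_router is a hypothetical helper, and it assumes a PyZMQ build that exposes zmq.ROUTER_HANDOVER (the thread notes libzmq master was needed at the time; the option later shipped in libzmq 4.1). With the option set to 1, a peer that connects with an identity already in use takes that identity over and the older connection is dropped; with the default of 0, the ROUTER rejects the newcomer.

```python
import zmq


def make_router(ctx, endpoint, handover=True):
    """Create and bind a ROUTER socket, optionally enabling identity handover.

    Hypothetical helper for illustration; the name and signature are not
    from the thread.
    """
    router = ctx.socket(zmq.ROUTER)
    if handover:
        # Let a reconnecting DEALER that reuses its identity replace the
        # stale connection instead of being rejected.
        router.setsockopt(zmq.ROUTER_HANDOVER, 1)
    # Set the option before binding so it is in place for the first peer.
    router.bind(endpoint)
    return router
```

For the setup in the original message this would be called as make_router(zmq.Context.instance(), "tcp://*:12501"), using the port shown in the netstat output.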