[zeromq-dev] Even more fun with accept

Martin Sustrik sustrik at fastmq.com
Thu Apr 30 10:16:44 CEST 2009


Hi guys,

The fun continues:

If the socket cannot be accepted on the listener side of the connection, 
the connection hangs in ESTABLISHED state, meaning that the connecting 
party believes that the connection was opened successfully. The 
consequence is that the connecting application merrily runs with no idea 
that there has been a problem connecting to the other party.
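
For anyone who wants to see this for themselves, here is a small sketch in Python (not 0MQ code, just plain TCP sockets on loopback). It assumes the pending connection fits into the listen backlog. The listener never calls accept, yet the connecting side sees nothing wrong:

```python
import socket

# A listener that never calls accept(). The kernel still completes the
# TCP handshake for connections that fit into the backlog.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen(1)
addr = listener.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(5)
client.connect(addr)                 # returns without error

# The client can even queue data, although nobody will ever read it.
sent = client.send(b"hello")
connected_peer = client.getpeername()

client.close()
listener.close()
```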

This being the case, any solution based on waiting a while or not 
polling on the listener socket won't help. So, AFAIU, the possibilities are:

1. Increase number of allowed fds if needed (Erich)
2. Restart the app (Steven)
3. Set listen backlog to zero so that there are no pending connections
(Dhammika)

The problem with 1. is - IMHO - that the application overrides the admin's
decision on how many sockets are allowed on the system. This is something
you don't want in an enterprise environment.
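
To make option 1 a bit more concrete, here is a rough Python sketch using setrlimit, as mentioned in the quoted mail below. The function name raise_fd_soft_limit is made up for illustration. Note that it only raises the soft limit and caps the request at the hard limit, which an unprivileged process cannot exceed, so whatever ceiling the admin set as the hard limit stays in force:

```python
import resource

def raise_fd_soft_limit(wanted):
    """Raise the soft RLIMIT_NOFILE towards `wanted`, never past the hard limit.

    Hypothetical helper for illustration; returns the soft limit in effect
    after the call.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft                      # already unlimited, nothing to raise
    # Cap the request at the hard limit (unless the hard limit is unlimited).
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass                         # platform refused; keep the old limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

An application would call something like raise_fd_soft_limit(4096) once at startup. This only postpones the problem: once the hard limit is reached, accept fails with EMFILE again.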

Solution 3. is pretty consistent, however, the downside is that it's not 
able to accept two connections that have been accidentally created at 
the same time - one of them gets killed. This may be a problem with 
C10K-style environments.

Solution 2. seems the best to me, however, it's rather drastic, killing 
the functional connections along with hung ones. The more friendly
option would be to close and reopen the listening socket when 
EMFILE/ENFILE is encountered. This would drop the listening backlog 
causing the hung clients to fail and try to reconnect after a while.
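
The close-and-reopen idea could look roughly like the following. This is a minimal sketch in Python with plain TCP sockets, not the actual 0MQ code; open_listener and accept_or_reopen are names invented for illustration, and the backlog value is arbitrary:

```python
import errno
import socket

def open_listener(addr, backlog=16):
    """Create a TCP listening socket bound to addr (hypothetical helper)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow rebinding the same port right after the old listener is closed.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(addr)
    s.listen(backlog)
    return s

def accept_or_reopen(listener, backlog=16):
    """Try to accept one connection.

    Returns (conn, listener). On EMFILE/ENFILE the listener is closed,
    which discards its backlog, and a fresh listener on the same address
    is returned together with conn=None.
    """
    try:
        conn, _peer = listener.accept()
        return conn, listener
    except OSError as e:
        if e.errno not in (errno.EMFILE, errno.ENFILE):
            raise
        addr = listener.getsockname()
        listener.close()                 # drops the pending connections
        # Closing frees one descriptor, so under EMFILE the new socket
        # normally fits. Under ENFILE another process may grab it first,
        # in which case open_listener raises and the caller must retry.
        return None, open_listener(addr, backlog)
```

The caller has to swap the new listener into its poller in place of the old one. There is a short window between close and listen during which new connection attempts are refused.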

Comments?
Martin

Dhammika Pathirana wrote:
> Hi,
> 
> On EMFILE we can temporarily suspend listen socket from poller.
> Optionally we may want to keep an atomic counter for active sockets,
> and make fd limit configurable with setrlimit.
> 
> Set SO_LINGER timeout on accepted sockets to recover closed sockets
> without waiting in CLOSED_WAIT state.
> Set listen backlog to 1 so that clients will get ECONNREFUSED without
> hanging in connect.
> 
> ENFILE is more intricate though, I think the best option here is just
> to wait a few microseconds and retry.
> We can also use edge triggered polling on supported platforms, but
> that's way more complicated.
> 
> 
> Dhammika
> 
> 
> 
> On Wed, Apr 29, 2009 at 9:27 AM, Martin Sustrik <sustrik at fastmq.com> wrote:
>> Hi,
>>
>> I was just told that when EMFILE or ENFILE error (the socket limit is
>> exceeded) is returned from accept the connection isn't purged from the
>> listening queue.
>>
>> The consequence is that polling on the listening socket signalises that
>> there's a pending connection, however, there's no way to accept it unless
>> some sockets are closed in the meantime.
>>
>> Such behaviour results in a busy loop desperately trying to accept the
>> connection, ultimately getting the CPU load to 100%.
>>
>> Any tips how to solve the problem?
>> Martin
>>



