[zeromq-dev] Even more fun with accept
Martin Sustrik
sustrik at fastmq.com
Thu Apr 30 10:16:44 CEST 2009
Hi guys,
The fun continues:
If a connection cannot be accepted on the listener side, it hangs in
ESTABLISHED state, meaning that the connecting party believes that the
connection was opened successfully. The consequence is that the
connecting application merrily runs with no idea that there has been a
problem connecting to the other party.
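To illustrate the behaviour described above, here is a minimal Python sketch (not 0MQ code; the function name is made up for this example). It assumes a TCP listener on loopback that never calls accept(), which stands in for a listener that cannot accept because of EMFILE/ENFILE:

```python
import socket


def connect_without_accept():
    """Connect to a listener that never calls accept().

    Returns the number of bytes the client managed to send.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5.0)
    try:
        # The kernel completes the TCP handshake on the listener's behalf,
        # so connect() succeeds although accept() is never called.
        client.connect(("127.0.0.1", port))
        # The payload is buffered by the kernel, so send() succeeds too.
        return client.send(b"hello")
    finally:
        client.close()
        listener.close()
```

From the client's point of view nothing distinguishes this from a healthy connection, which is why waiting on the listener side alone does not help.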
This being the case, any solution based on waiting a while or not
polling on the listener socket won't help. So, AFAIU, the possibilities are:
1. Increase the number of allowed fds if needed (Erich)
2. Restart the app (Steven)
3. Set listen backlog to zero so that there are no pending connections
(Dhammika)
The problem with 1. is - IMHO - that the application overrides the admin's
decision on how many sockets are allowed on the system. This is something
you don't want in an enterprise environment.
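For reference, option 1 could look roughly like the Python sketch below. It assumes the admin's decision is expressed as the hard RLIMIT_NOFILE limit, so the application only raises its soft limit and never goes past the hard one. The function name and the `wanted` parameter are hypothetical:

```python
import resource


def raise_soft_fd_limit(wanted):
    """Raise the soft fd limit towards `wanted`, never past the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise

    # Cap the request at the hard limit (assumed to be the admin's choice).
    ceiling = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if ceiling > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (ceiling, hard))
        except (ValueError, OSError):
            pass  # the platform refused; keep the current limit

    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

This only postpones the problem: once the hard limit is reached, accept fails with EMFILE again.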
Solution 3. is pretty consistent. The downside, however, is that it's not
able to accept two connections that happen to be created at the same
time: one of them gets killed. This may be a problem in C10K-style
environments.
Solution 2. seems the best to me. However, it's rather drastic, killing
the functional connections along with the hung ones. A friendlier
option would be to close and reopen the listening socket when
EMFILE/ENFILE is encountered. This would drop the listen backlog,
causing the hung clients to fail and try to reconnect after a while.
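A rough Python sketch of the close-and-reopen idea follows. It is not 0MQ code; open_listener and accept_or_reset are names invented for this example, and it assumes a TCP listener that can be rebound to the same address with SO_REUSEADDR:

```python
import errno
import socket


def open_listener(address, backlog=16):
    """Create a listening TCP socket bound to `address`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow rebinding the same address right after the old listener closes.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(address)
    sock.listen(backlog)
    return sock


def accept_or_reset(listener, backlog=16):
    """Try to accept one connection.

    Returns (connection, listener). On EMFILE/ENFILE the connection is
    None and the listener is a freshly opened replacement.
    """
    try:
        conn, _peer = listener.accept()
        return conn, listener
    except OSError as exc:
        if exc.errno not in (errno.EMFILE, errno.ENFILE):
            raise
        address = listener.getsockname()
        # Closing the listener discards its backlog of pending connections
        # and frees one fd, which the replacement listener then uses.
        listener.close()
        return None, open_listener(address, backlog)
```

The caller has to re-register the returned listener with its poller, since the old fd is gone. Whether the discarded clients actually see an error right away depends on the OS, so this would need testing on each supported platform.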
Comments?
Martin
Dhammika Pathirana wrote:
> Hi,
>
> On EMFILE we can temporarily suspend listen socket from poller.
> Optionally we may want to keep an atomic counter for active sockets,
> and make fd limit configurable with setrlimit.
>
> Set SO_LINGER timeout on accepted sockets to recover closed sockets
> without waiting in CLOSE_WAIT state.
> Set listen backlog to 1 so that clients will get ECONNREFUSED without
> hanging in connect.
>
> ENFILE is more intricate though, I think the best option here is just
> to wait a few microseconds and retry.
> We can also use edge triggered polling on supported platforms, but
> that's way more complicated.
>
>
> Dhammika
>
>
>
> On Wed, Apr 29, 2009 at 9:27 AM, Martin Sustrik <sustrik at fastmq.com> wrote:
>> Hi,
>>
>> I was just told that when an EMFILE or ENFILE error (the socket limit is
>> exceeded) is returned from accept, the connection isn't purged from the
>> listening queue.
>>
>> The consequence is that polling on the listening socket signals that
>> there's a pending connection, however, there's no way to accept it unless
>> some sockets are closed in the meantime.
>>
>> Such behaviour results in a busy loop desperately trying to accept the
>> connection, ultimately getting the CPU load to 100%.
>>
>> Any tips on how to solve the problem?
>> Martin
>>