[zeromq-dev] [bug+workaround] socket_t not closed properly, still used

Thijs Terlouw thijsterlouw at gmail.com
Fri Jan 21 13:52:08 CET 2011


On Fri, Jan 21, 2011 at 8:13 PM, Martin Sustrik <sustrik at 250bpm.com> wrote:

>> Perhaps this workaround is not ideal when you want your old socket
>> to linger longer, but for me it's important to close it.
>
> Yes. The problem with the workaround is that the creation of the next
> socket could then block for an indefinite time (until all the pending
> messages are sent).

Correct, but for my application this was the best workaround I could
quickly implement, so that other developers are not affected by this
bug.

>> The consequence of having two sockets open is that, when you load
>> balance between several sockets, the zombie socket also seems to get
>> used. So I suspect the *real solution* will be to make sure the zombie
>> socket doesn't get used anymore. For me, removing the zombie socket is
>> a good workaround for now.
>
> So, the socket is zmq_close()'d but it takes some time to actually destruct
> it. Until then the connection to the peer still exists and the peer still
> sends messages to the closed socket. Do I understand it correctly?
>

Yes. According to my debugging, there are two problems:
1) The socket is not actually destructed at all, even though there were
10 seconds between iterations, which should be more than enough. I
believe it is only destructed when process_messages() gets a chance to
run, which never happens unless something triggers it, for example
dezombify being run when a new socket is created. You can verify this
by creating a socket in an application, sending 1 message, receiving
the reply, then destructing the socket, waiting for example 10 seconds,
and checking (with lsof, for example) whether the TCP connection still
exists.

2) I didn't debug the second part in detail, but I suspect that, until
the socket_t is destructed, the peer indeed still sends messages to it,
and those messages all fail (no reply). That's at least the symptom I
was seeing in my application. This part is the really nasty bug. A
socket that stays around longer just takes up a TCP connection, which
would not be a major concern (I wouldn't even have noticed it).

I hope you will be able to tell where to look for the real solution :)
ZeroMQ's internals are really, really, really complicated, partly
because of all those abstractions, ownership trees, messages etc.
Several times I thought ZeroMQ's internals were inspired by Erlang :-)

Thijs


