[zeromq-dev] [bug+workaround] socket_t not closed properly, still used

Martin Sustrik sustrik at 250bpm.com
Fri Jan 21 14:43:46 CET 2011


Thijs,

>> So, the socket is zmq_close()'d but it takes some time to actually destruct
>> it. Until then the connection to the peer still exists and the peer still
>> sends messages to the closed socket. Do I understand it right?
>>
> Yes, according to my debugging there are two problems:
> 1) the socket is not actually destructed at all (there were 10 seconds
> between iterations, which should be more than enough). I believe
> it's only destructed when process_messages() has a chance to run,
> which is never, unless you run for example dezombify by creating a
> new socket.

Right. The application thread has to finish the handshake with the I/O 
thread before the socket can be destroyed. Currently the handshake is 
finished by the dezombify() function, which is called from zmq_socket() 
or zmq_close(). Calling dezombify() from common functions like 
zmq_send() or zmq_recv() would mitigate the problem, but 
dezombification is expensive and would slow down the critical path 
(actual message passing) considerably.
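
To make the trade-off concrete, here is a minimal toy model in C++. It is
not libzmq code: toy_socket, toy_context and io_thread_acked are
hypothetical names, and the real handshake is far more involved. It only
shows why a closed socket can sit around indefinitely when reaping happens
on create/close but not on the send/recv path.

```cpp
// Toy model only, NOT libzmq code. All names are hypothetical.
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>

struct toy_socket {
    bool closed = false;           // the application has called close
    bool io_thread_acked = false;  // the I/O thread finished its side of the handshake
};

class toy_context {
public:
    // Analogue of zmq_socket(): reap finished zombies, then create a socket.
    toy_socket *create_socket() {
        dezombify();
        live_.push_back(std::make_unique<toy_socket>());
        return live_.back().get();
    }

    // Analogue of zmq_close(): the socket becomes a zombie, then we try to reap.
    void close_socket(toy_socket *s) {
        for (auto it = live_.begin(); it != live_.end(); ++it) {
            if (it->get() == s) {
                s->closed = true;
                zombies_.splice(zombies_.end(), live_, it);
                break;
            }
        }
        dezombify();
    }

    // Analogue of the zmq_send()/zmq_recv() fast path: deliberately no reaping.
    void send(toy_socket *s) { assert(!s->closed); }

    std::size_t zombie_count() const { return zombies_.size(); }

private:
    // Destroy every zombie whose handshake with the I/O thread is complete.
    void dezombify() {
        zombies_.remove_if([](const std::unique_ptr<toy_socket> &z) {
            return z->io_thread_acked;
        });
    }

    std::list<std::unique_ptr<toy_socket>> live_;
    std::list<std::unique_ptr<toy_socket>> zombies_;
};
```

In this model a closed socket whose handshake has completed is still not
destroyed by any number of send() calls on other sockets. It goes away only
on the next create_socket() or close_socket() call, which mirrors the
behaviour Thijs observed.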

A possible solution proposed by Martin Lucina was to create a dedicated 
"reaper thread" that would asynchronously dezombify the closed sockets.
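
As a rough illustration of the reaper idea, the sketch below hands closed
sockets to a background thread that destroys them, so the application thread
never pays for the teardown. It uses only the C++ standard library;
toy_reaper, closed_socket, hand_over() and wait_idle() are hypothetical names
and say nothing about how such a thread would actually be built inside 0MQ.

```cpp
// Sketch of a "reaper thread". Hypothetical names, not libzmq code.
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

struct closed_socket {};  // stand-in for a socket awaiting destruction

class toy_reaper {
public:
    toy_reaper() : worker_([this] { run(); }) {}

    // Ask the worker to stop; it drains anything still pending first.
    ~toy_reaper() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    // Called by the application thread on close: cheap, returns at once.
    void hand_over(std::unique_ptr<closed_socket> s) {
        assert(s != nullptr);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(std::move(s));
            ++handed_over_;
        }
        cv_.notify_all();
    }

    // Block until every socket handed over so far has been destroyed.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return reaped_ == handed_over_; });
    }

    std::size_t reaped() {
        std::lock_guard<std::mutex> lock(mutex_);
        return reaped_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            while (!pending_.empty()) {
                std::unique_ptr<closed_socket> s = std::move(pending_.front());
                pending_.pop_front();
                lock.unlock();
                s.reset();  // the expensive teardown runs without the lock held
                lock.lock();
                ++reaped_;
                cv_.notify_all();
            }
            if (stopping_) return;
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::unique_ptr<closed_socket>> pending_;
    std::size_t handed_over_ = 0;
    std::size_t reaped_ = 0;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

The design point is that hand_over() only pushes onto a queue, so a close
stays cheap for the caller, while destruction no longer depends on the
application happening to call zmq_socket() or zmq_close() again.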

> You could verify this by creating a socket in an
> application, sending 1 message, receiving reply and then destruct it
> and wait for example 10 seconds and see if the TCP connection still
> exists. (lsof for example).
>
> 2) I didn't debug the 2nd part in detail, but I do suspect that
> until the socket_t is destructed, the peer still sends messages
> to it - which will then all fail (no reply). That's at least the
> symptom I was seeing in my application. This part is the real nasty
> bug.

Yes. We can possibly close the socket immediately on the first step of 
the finalisation handshake (either on the term command or when the 
delimiter is received, depending on the ZMQ_LINGER setting) instead of 
waiting for the whole finalisation process to finish.
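
One possible reading of that rule, written out as a tiny C++ sketch. The
mapping is an assumption on my part: with ZMQ_LINGER set to 0 nothing needs
flushing, so the connection could be dropped as soon as the term command
arrives; with any other value it would be dropped once the delimiter (the
marker after the last pending message) is received. term_event and
should_close_transport() are hypothetical names, not libzmq API.

```cpp
// Hypothetical decision rule, not libzmq code.
#include <cassert>

enum class term_event { term_command, delimiter_received };

// linger_ms follows the ZMQ_LINGER convention: -1 means linger forever,
// 0 means discard pending messages, a positive value is a timeout in ms.
// Returns true if the underlying connection should be closed on event e.
bool should_close_transport(int linger_ms, term_event e) {
    assert(linger_ms >= -1);
    if (linger_ms == 0)
        return e == term_event::term_command;    // nothing to flush
    return e == term_event::delimiter_received;  // pending messages flushed
}
```

Either way the peer would stop seeing a live connection at the start of the
shutdown rather than at the end of the whole handshake.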

> A socket that stays around longer just takes up a TCP connection,
> but would not be a major concern (I wouldn't even have noticed it).
>
> Hope you will be able to tell where to look for the real solution :)
> ZeroMQ internally is really really really complicated! Partially
> because of all those abstractions, ownership trees, messages etc.

Yes. It is. If you have any ideas on how to simplify it, suggestions 
are welcome.

> Several times I thought ZeroMQ internals were inspired by Erlang :-)

Actually, I wasn't aware of Erlang when I started working on 0MQ but the 
designs have converged. Probably, when you get serious about 
message-based concurrency you'll end up with something like this.

Martin


