[zeromq-dev] Segfault

Martin Sustrik sustrik at 250bpm.com
Tue Jan 4 10:53:58 CET 2011


Hi Dhammika,

> It's actually in zmq.
> A terminated pipe writer gets an "activate" command. But the load balancer
> doesn't decrement the active pipe count in the terminate() call.
>
>   void zmq::lb_t::terminate ()
>   {
>       zmq_assert (!terminating);
>       terminating = true;
>
>       sink->register_term_acks (pipes.size ());
>       for (pipes_t::size_type i = 0; i != pipes.size (); i++)
>           pipes [i]->terminate ();
>   }

Hm. Termination is an async process. So, at this point termination of 
all the pipes has been started, but it will take some time to finish. As 
time progresses, individual pipes will unregister themselves from the 
load balancer.

Or have I misunderstood what you were saying? Not sure.
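
To illustrate the point about asynchronous termination, here is a minimal 
sketch. It is not the libzmq code: toy_pipe_t, toy_lb_t, terminated() and 
registered() are hypothetical names made up for this example. It only shows 
that terminate() starts the process and that each pipe stays registered 
until it reports back on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pipe writer; not the libzmq class.
struct toy_pipe_t
{
    bool term_requested = false;
};

// Hypothetical load balancer that only tracks which pipes are registered.
class toy_lb_t
{
public:
    void attach (toy_pipe_t *pipe) { pipes.push_back (pipe); }

    // Starts termination of every pipe and returns the number of
    // acknowledgements the owner should wait for. Nothing is removed here.
    std::size_t terminate ()
    {
        assert (!terminating);
        terminating = true;
        for (toy_pipe_t *pipe : pipes)
            pipe->term_requested = true;
        return pipes.size ();
    }

    // Called later, once per pipe, when that pipe has finished shutting
    // down. Only now is the pipe unregistered.
    void terminated (toy_pipe_t *pipe)
    {
        auto it = std::find (pipes.begin (), pipes.end (), pipe);
        assert (it != pipes.end ());
        pipes.erase (it);
    }

    std::size_t registered () const { return pipes.size (); }
    bool done () const { return terminating && pipes.empty (); }

private:
    std::vector<toy_pipe_t *> pipes;
    bool terminating = false;
};
```

In this sketch, a pipe that receives a late command between terminate() 
and terminated() is still known to the load balancer, which is the window 
the question above is about.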

>>> Shutdown code is bit gnarly.
>>
>> More than a bit :) If anyone has any idea of how to make the shutdown code
>> less complex, let me know, please.
>>
> It's like TCP shutdown!
> Do we have a diagram of shutdown event flow?

Unfortunately not. And, unfortunately, it's a multi-faceted problem, so I 
won't be able to describe it briefly. However, here are some points:

1. Pipe shutdown is a three-way handshake consisting of a termination 
request command, the termination command itself (which is actually a 
DELIMITER message passed via the pipe to ensure all the preceding 
messages are flushed) and a termination ack.

2. Each socket is the root of a tree of objects (sessions, connecters, 
listeners etc.). When any object is shutting down, it first shuts down 
its children. The termination is done via a three-way handshake similar to 
the above.

3. Any object can delay the termination (for example, a session can delay 
termination of the socket so that all pending messages can be flushed to 
the network).

4. zmq_close() returns immediately and moves the socket to the zombie 
state. Zombie sockets are lazily cleaned later on in subsequent calls to 
libzmq.

5. A socket cannot be terminated before zmq_close() is called, even if 
zmq_term() has been called. Instead, the socket returns ETERM and 
zmq_term() blocks until the socket is zmq_close'd.

6. zmq_term() turns all open sockets into zombies and loops until all the 
zombies are cleaned up.

7. Inproc connections are special in that there's no I/O thread in the 
background. Thus, the shutdown mechanism should not block even though 
there's still work to do and there's no worker thread to do it. The work 
should be postponed instead.
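
The three-way handshake from point 1 can be sketched roughly as below. 
This is a single-threaded toy, not the libzmq implementation: hs_msg_t, 
hs_pipe_t, request_term(), writer_process() and reader_drain() are 
hypothetical names, and which side sends the request is an assumption 
made for the example. What it shows is why the delimiter travels through 
the pipe itself: it lands behind every message already written, so the 
ack can only happen after those messages have been read.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical message: either a payload or the delimiter marker.
struct hs_msg_t
{
    bool delimiter;
    std::string body;
};

// Hypothetical pipe state shared by both ends in this toy.
struct hs_pipe_t
{
    std::deque<hs_msg_t> queue;   // messages in flight, writer to reader
    bool term_requested = false;  // step 1 has reached the writer
    bool term_acked = false;      // step 3 has reached the writer
};

// Step 1: termination request (assumed here to go from reader to writer).
void request_term (hs_pipe_t &pipe)
{
    pipe.term_requested = true;
}

// Step 2: the writer answers by queueing the delimiter behind everything
// it has already written.
void writer_process (hs_pipe_t &pipe)
{
    if (pipe.term_requested)
        pipe.queue.push_back (hs_msg_t {true, ""});
}

// Step 3: the reader delivers messages until it meets the delimiter,
// then acknowledges the termination.
std::vector<std::string> reader_drain (hs_pipe_t &pipe)
{
    std::vector<std::string> delivered;
    while (!pipe.queue.empty ()) {
        hs_msg_t msg = pipe.queue.front ();
        pipe.queue.pop_front ();
        if (msg.delimiter) {
            pipe.term_acked = true;
            break;
        }
        delivered.push_back (msg.body);
    }
    return delivered;
}
```

In the real library these steps are commands passed between threads, so 
each one may be processed at a different time; the toy collapses that 
into three function calls.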

If you have any specific questions, feel free to ask. The shutdown code 
is a mess...

Martin


