[zeromq-dev] duped sockets and fork
Matt Connolly
matt.connolly at me.com
Tue Sep 17 01:15:09 CEST 2013
There are two types of sockets used by zeromq, as far as I understand: external connections, and internal pipes used to communicate between the I/O threads and the host application.
My patch for zmq_term replaces all of the internal pipes with new ones. This allows the termination process to complete without affecting the pipes that were inherited from the parent process; touching those inherited pipes is what caused the asserts in the parent.
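As a rough illustration of the idea (not the actual libzmq code), here is a Python sketch using only the standard library. The `Signaler` class and `child_terminate` function are hypothetical names invented for this example; a plain `os.pipe()` stands in for the internal signalling pipe. It shows that a child writing to an inherited pipe is visible to the parent, while a child that first swaps in a fresh pipe leaves the parent's pipe untouched.

```python
import os


class Signaler:
    """Hypothetical stand-in for an internal signalling pipe (not libzmq's class)."""

    def __init__(self):
        self.pid = os.getpid()            # process that created the pipe
        self.r, self.w = os.pipe()
        os.set_blocking(self.r, False)

    def forked(self):
        # True when running in a process other than the creator.
        return self.pid != os.getpid()

    def replace_after_fork(self):
        # Close this process's copies of the inherited descriptors and
        # open a fresh pair. The parent's descriptors are unaffected.
        os.close(self.r)
        os.close(self.w)
        self.r, self.w = os.pipe()
        os.set_blocking(self.r, False)
        self.pid = os.getpid()

    def send(self, data=b"x"):
        os.write(self.w, data)

    def pending(self):
        # Return whatever is waiting in the pipe, or b"" if it is empty.
        try:
            return os.read(self.r, 64)
        except BlockingIOError:
            return b""


def child_terminate(sig, replace):
    """Fork a child that signals b"T" while "terminating".

    Returns what the parent then finds in its own pipe.
    """
    pid = os.fork()
    if pid == 0:
        try:
            if replace and sig.forked():
                sig.replace_after_fork()
            sig.send(b"T")
        finally:
            os._exit(0)                   # never return into the parent's code
    os.waitpid(pid, 0)
    return sig.pending()
```

With `replace=False` the parent sees the child's `b"T"` in its own pipe; with `replace=True` the parent's pipe stays empty.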
Returning EINTR was intended so that terminating the context would behave the same as if the process had received a signal. (It could be receiving signals for other reasons, e.g. a user signal.)
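The pid check that leads to EINTR can be sketched roughly as follows, again in standard-library Python rather than libzmq's own code. `ForkGuardedSocket` and `send_in_child` are hypothetical names for this example, and the real library's checks may differ in detail. The object remembers the pid that created it and reports EINTR when used from any other process.

```python
import errno
import os


class ForkGuardedSocket:
    """Hypothetical object that refuses to operate in a forked child."""

    def __init__(self):
        self.owner_pid = os.getpid()

    def send(self, data):
        if os.getpid() != self.owner_pid:
            # Behave as if the call had been interrupted by a signal.
            raise InterruptedError(errno.EINTR, "used after fork")
        return len(data)


def send_in_child(sock):
    """Fork, try to send in the child, and return the errno the child saw."""
    pid = os.fork()
    if pid == 0:
        code = 0
        try:
            sock.send(b"hi")
        except OSError as exc:
            code = exc.errno
        finally:
            os._exit(code)                # exit status carries the errno
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

In the parent, `send` succeeds as usual; in the child, every call fails with EINTR, which is why a caller cannot tell from the return value alone whether any cleanup took place.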
If there are connected zmq sockets (to some other machine, for example) then those sockets would also be inherited, but I thought they would have been closed correctly by the termination process. This may not be working right, and activity on these sockets between fork and terminate in the child may interfere with the parent context's ability to use them. Perhaps these sockets are not actually being closed properly, and that is causing this problem.
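To show the kind of interference I have in mind, here is a small standard-library Python sketch. It does not use zeromq at all: `socket.socketpair()` stands in for a connection to a remote peer, and `child_touches_inherited_socket` is a hypothetical name. Closing an inherited descriptor in the child is harmless to the parent, but reading from it before closing consumes data the parent was expecting.

```python
import os
import socket


def child_touches_inherited_socket(read_in_child):
    """Return what the parent can still read after a child has run."""
    ours, peer = socket.socketpair()      # ours: our end, peer: the remote end
    peer.sendall(b"msg")                  # data intended for the parent
    pid = os.fork()
    if pid == 0:
        try:
            if read_in_child:
                ours.recv(16)             # consumes bytes meant for the parent
            ours.close()                  # closes only the child's descriptor
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    ours.setblocking(False)
    try:
        return ours.recv(16)
    except BlockingIOError:
        return b""                        # nothing left for the parent
    finally:
        ours.close()
        peer.close()
```

If the child only closes its copy, the parent still receives `b"msg"`; if the child reads first, the parent finds nothing.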
I'll take a closer look later in the week and see...
Regards,
Matt.
> On 17 Sep 2013, at 8:22 am, Selim Ciraci <ciraci at gmail.com> wrote:
>
> Hi Matt,
>
> Another thing, and sorry if I'm wrong: zmq_term in the child always returns EINTR. This is because most of the socket operations return EINTR when pid != getpid(). With your patch the signaler will create a new eventfd (correct me if I'm wrong) and then return. It is up to the reaper thread to close the sockets, right? But since most operations just return EINTR, I wonder whether the sockets are really closed after the fork.
>
> Best,
> Selim Ciraci
>
>
>> On Mon, Sep 16, 2013 at 11:40 AM, Selim Ciraci <ciraci at gmail.com> wrote:
>> Hi Matt,
>>
>> It is not an assertion failure. The problem occurs in connections between router and dealer sockets. The send function in router.cpp returns "no route to host" because it cannot find the host_id in outpipes_t. Careful debugging shows that the pipe from the dealer to the router has not actually been established. I put a printf in the xidentify_peer method in router.cpp; as far as I know, this is where new client ids are inserted into outpipes_t. The aim was to compare the child process ids with the ids the router socket received. The comparison showed that some child ids went missing (the router socket never received them). I must add that the ids went missing after a parent process terminated, though I need further testing to prove this.
>>
>> Any ideas what might be going wrong here? I'm going to try to implement a simple test case.
>>
>> Thanks,
>> Selim
>>
>>
>>> On Mon, Sep 16, 2013 at 6:13 AM, Matt Connolly <matt.connolly at me.com> wrote:
>>> Hi Selim,
>>>
>>> I don’t have any ideas yet about why the parent would stop sending messages after forking a second child.
>>>
>>> Is it possible to reproduce this in a simple test case?
>>>
>>> And when the no route to host error occurs, is that an assertion? If so, can you provide a stack trace?
>>>
>>> -Matt
>>>
>>> On 14 Sep 2013, at 6:43 am, Selim Ciraci <ciraci at gmail.com> wrote:
>>>
>>> > Hi Matt,
>>> >
>>> > Thanks for your reply. I actually found out about your patch after sending the email. I have updated zmq to head from github and tried it with my program. The parent sockets seem to have been closed. But the problem is that every now and then I get "no route to host" errors in zmq_send. This usually happens when:
>>> > 1. The parent forks a child; the child calls zmq_term(parent_context), does work, and then terminates (closes its context).
>>> > 2. The parent, in parallel, uses parent_context, does work, learns the child has terminated, and forks a new child, child2.
>>> > 3. child2 calls zmq_term(parent_context), does work, and then terminates (closes its context).
>>> > 4. After child2 terminates, the parent cannot receive messages. Even though the parent is active, zmq_send in the server fails with "no route to host".
>>> >
>>> > I have no idea why this fails. Any ideas what might be causing this?
>>> >
>>> > Best,
>>> > Selim Ciraci
>>>
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev at lists.zeromq.org
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev