[zeromq-dev] Blocking issues with signaler_t::make_fdpair
KIU Shueng Chuan
nixchuan at gmail.com
Mon Dec 9 23:37:33 CET 2013
I believe no permission is needed to do a pull request. :)
On rereading Koby's mail more closely, I see that his problem can be
reproduced by having one background program use version 3.2.2. The leaked
event handle keeps the global event alive, so Windows does not recreate it
each time.
On Dec 10, 2013 2:44 AM, "Felipe Farinon" <felipe.farinon at powersyslab.com>
wrote:
> As Koby hasn't answered and I can no longer reproduce the problem, could
> I make the modification even though I am unable to reproduce it? It will
> be tested indirectly, since I am going to run the modification in the
> same environment where the problem was happening.
>
> On 01/12/2013 21:27, KIU Shueng Chuan wrote:
>
> In master, you can switch to using ephemeral ports by setting
> signaler_port to 0 in config.hpp. A new ephemeral port is then used for
> each make_fdpair call, and no critical section is used.
>
> Could you try that and see if it solves your problems?
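
To make the ephemeral-port idea concrete, here is a minimal sketch of the general technique, written in Python for brevity rather than taken from libzmq. The name make_socket_pair is hypothetical. It only illustrates binding a loopback listener to port 0 so that the OS chooses a free port for each call, which removes the need for any cross-process lock.

```python
import socket


def make_socket_pair():
    """Return (writer, reader): two connected loopback TCP sockets.

    Hypothetical helper, not libzmq's make_fdpair.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS to pick a free ephemeral port for this call.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        port = listener.getsockname()[1]

        writer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            writer.connect(("127.0.0.1", port))
            reader, _ = listener.accept()
        except OSError:
            writer.close()
            raise
    finally:
        # The listener is only needed long enough to create one pair.
        listener.close()
    return writer, reader


if __name__ == "__main__":
    w, r = make_socket_pair()
    w.sendall(b"x")
    print(r.recv(1))
    w.close()
    r.close()
```

Because every call gets its own listener, two processes never contend for the same port. A real implementation would probably also want to confirm that the accepted connection comes from its own connecting socket, since any local process may connect to the listener.
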
> On Dec 1, 2013 9:39 PM, "Koby Boyango" <koby.b at mce-sys.com> wrote:
>
>> Hi
>> I'm fairly new to ZeroMQ, and have been working on integrating it using
>> czmq in several projects, Windows only.
>> I've opened an issue on GitHub, #767, and at Pieter's request I'm
>> moving the discussion here. So here is what I've written there:
>> While trying to integrate ZeroMQ in different modules/processes (Windows
>> only), I've encountered a problem where in some situations a ZeroMQ call
>> blocks forever. After debugging the issue, I found that zmq_init
>> wasn't returning, and after further debugging and digging through the code
>> I found that the problem was in signaler_t::make_fdpair, where the
>> WaitForSingleObject on the "zmq-signaler-port-sync" event didn't return.
>> Initially I wasn't sure in which situations it occurs, so I did some
>> further investigation and found out that in my case:
>>
>> - For some reason, when I close a test program with Ctrl+C, the event
>> stays un-signaled. Not sure why yet, will need further debugging.
>> - I had a node.js script, which uses ZeroMQ, running in the
>> background. Because it uses version 3.2.2 of libzmq, which leaks the event
>> handle, the existing event wasn't deleted, and stayed in an un-signaled
>> state.
>> - Basically, from that point no one on the system can use ZeroMQ.
>>
>> I find make_fdpair to be very problematic on Windows:
>>
>> - If one call exits without signaling the event while someone else
>> is holding a handle to the event, all further calls on the system will
>> block. This can happen, for example, if an assertion fails and the process
>> crashes because of the exception raised.
>> - It can also happen if an assertion fails and an exception is
>> raised but caught by the caller using a __try & __except block (SEH). We
>> can't simply rely on the exception to crash the process (for example, a
>> program might wrap calls to its plugins with __try & __except, so that a
>> faulty plugin won't crash the whole program).
>> - So it basically means that one faulty program can cause other,
>> unrelated programs, to block.
>>
>> I suggest:
>>
>> - No matter which synchronization mechanism is used, wrap the code
>> with __try & __finally, and release the lock in the __finally block. This
>> will make sure that the lock is released if an exception occurs. (In my
>> case, though, I tried it and it didn't help; the thread might be
>> terminated during the call.)
>> - If possible, don't use a global, system-wide lock. From my
>> understanding, it is used in order to reuse the signaler port. So either
>> use a random available port, or make the port "libzmq instance" specific
>> (the first call binds to a random port, further calls reuse the port)
>> and protect it with a critical section. This will at least limit the
>> problems to the same process.
>> - If the system-wide lock is really needed, I suggest using a mutex
>> instead of the event. When using a mutex, if the owning thread dies without
>> releasing it, Windows automatically releases it and the next call to
>> WaitForSingleObject returns WAIT_ABANDONED instead of blocking. We can
>> then check if the port was left in a "listening" state, close it if
>> necessary, and "re-listen" with a new socket.
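
The per-instance variant from the second suggestion above could look roughly like this. It is a sketch in Python, not libzmq code: PairFactory, make_pair and port are hypothetical names, and threading.Lock stands in for a Windows critical section. The lock is local to the process, and the with statement releases it even when an exception is raised, in the spirit of the __try & __finally suggestion.

```python
import socket
import threading


class PairFactory:
    """Hypothetical per-instance factory: one ephemeral listener, reused."""

    def __init__(self):
        self._lock = threading.Lock()  # process-local, not system-wide
        self._listener = None

    @property
    def port(self):
        """Port of the shared listener, or None before the first call."""
        with self._lock:
            if self._listener is None:
                return None
            return self._listener.getsockname()[1]

    def make_pair(self):
        """Return (writer, reader) connected through the shared listener."""
        with self._lock:  # released on return and on exception alike
            if self._listener is None:
                # First call: let the OS pick a random free port.
                listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                try:
                    listener.bind(("127.0.0.1", 0))
                    listener.listen(1)
                except OSError:
                    listener.close()
                    raise
                self._listener = listener
            port = self._listener.getsockname()[1]
            writer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            try:
                writer.connect(("127.0.0.1", port))
                reader, _ = self._listener.accept()
            except OSError:
                writer.close()
                raise
            return writer, reader

    def close(self):
        """Close the shared listener; the next make_pair binds a new one."""
        with self._lock:
            if self._listener is not None:
                self._listener.close()
                self._listener = None
```

With this shape, a thread that fails inside make_pair can at worst affect its own process; other processes each hold their own listener and their own lock.
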
>>
>> I'm using libzmq 4.0.1 with czmq 2.0.2. I saw that make_fdpair was
>> improved in master, but I believe it still doesn't entirely solve the problem.
>> What do you say?
>>
>> Koby
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev