[zeromq-dev] Blocking issues with signaler_t::make_fdpair
Felipe Farinon
felipe.farinon at powersyslab.com
Tue Dec 10 20:18:25 CET 2013
Maybe it's time to switch to ephemeral ports again.
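
For readers following the thread, here is a minimal sketch of what "ephemeral port" means for a loopback socket pair. It is not the libzmq make_fdpair code: the helper name make_loopback_pair and its signature are hypothetical, and POSIX socket calls are assumed so that the sketch builds outside Windows (Winsock offers the same bind/listen/getsockname/connect/accept calls, but needs WSAStartup and closesocket). Binding to port 0 lets the OS pick a free port for each call, so no fixed port and no system-wide lock are involved.

```cpp
// Sketch only, under the assumptions stated above; not the libzmq code.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: builds a connected TCP pair over 127.0.0.1 using a
// port chosen by the OS. Returns 0 on success and fills in both ends plus
// the chosen port; returns -1 on failure and leaves no socket open.
int make_loopback_pair(int *reader, int *writer, uint16_t *port_out)
{
    assert(reader && writer && port_out);
    *reader = -1;
    *writer = -1;

    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);  // port 0: the OS assigns a free ephemeral port

    socklen_t len = sizeof addr;
    int w = -1;
    int r = -1;
    bool ok = bind(listener, reinterpret_cast<sockaddr *>(&addr), sizeof addr) == 0
              && listen(listener, 1) == 0
              // Ask which port was actually assigned.
              && getsockname(listener, reinterpret_cast<sockaddr *>(&addr), &len) == 0;
    if (ok) {
        w = socket(AF_INET, SOCK_STREAM, 0);
        ok = w >= 0
             && connect(w, reinterpret_cast<sockaddr *>(&addr), sizeof addr) == 0;
    }
    if (ok) {
        r = accept(listener, nullptr, nullptr);
        ok = r >= 0;
    }

    close(listener);  // the listener is only needed to set up the pair
    if (!ok) {
        if (w >= 0)
            close(w);
        return -1;
    }

    *reader = r;
    *writer = w;
    *port_out = ntohs(addr.sin_port);
    return 0;
}
```

A caller would write a byte to the writer end to signal and read it from the reader end. Because every call gets its own port, two processes never contend for the same listening port, which is the property the fixed signaler port needed the global event for.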
On 10/12/2013 14:42, Koby Boyango wrote:
> Sorry for my late reply, been sick for a few days. I've done some
> tests using the make_fdpair from the master, and it seems like using
> the ephemeral port support and avoiding the locking solved it. Thanks!
> But I do believe that if supporting a fixed signaler port is still
> desired, we should better protect against the scenarios I've described
> in my first mail. What do you think?
>
> Koby
>
>
> On Tue, Dec 10, 2013 at 12:37 AM, KIU Shueng Chuan <nixchuan at gmail.com> wrote:
>
> I believe no permission is needed to do a pull request. :)
>
> Upon rereading Koby's mail more closely, his problem can be
> reproduced by having one background program use version 3.2.2. The
> leaked event handle ensures that the global event stays alive and
> doesn't get recreated each time by Windows.
>
> On Dec 10, 2013 2:44 AM, "Felipe Farinon" <felipe.farinon at powersyslab.com> wrote:
>
> As Koby hasn't answered and I can no longer reproduce the
> problem, could I still make the modification? It will be tested
> indirectly, since I am going to run the modified code in the same
> environment where the problem was happening.
>
> On 01/12/2013 21:27, KIU Shueng Chuan wrote:
>>
>> In master, you can switch to using ephemeral ports by
>> modifying signaler_port to 0 in config.hpp. A new ephemeral
>> port is used per make_fdpair call and no critical section is
>> used.
>>
>> Could you try that and see if it solves your problems?
>>
>> On Dec 1, 2013 9:39 PM, "Koby Boyango" <koby.b at mce-sys.com> wrote:
>>
>> Hi
>> I'm fairly new to ZeroMQ, and have been working on
>> integrating it using czmq in several projects, Windows only.
>> I've opened an issue on GitHub (#767), and at Pieter's
>> request I'm moving the discussion here. Here is what
>> I wrote there:
>> While trying to integrate ZeroMQ in different
>> modules\processes (Windows only), I've encountered a
>> problem where in some situations a ZeroMQ call blocks -
>> forever. After debugging the issue, I've found out that
>> zmq_init wasn't returning, and after further debugging
>> and digging through the code I've found out that the
>> problem was in signaler_t::make_fdpair, where the
>> WaitForSingleObject on the "zmq-signaler-port-sync"
>> didn't return.
>> Initially I wasn't sure in which situations it occurs. So
>> I did some further investigation and found out that in my
>> case:
>>
>> * For some reason, when I close a test program with
>> Ctrl+C, the event stays un-signaled. Not sure why
>> yet, will need further debugging.
>> * I had a node.js script, which uses ZeroMQ, running in
>> the background. Because it uses version 3.2.2 of
>> libzmq, which leaks the event handle, the existing
>> event wasn't deleted, and stayed in an un-signaled state.
>> * Basically, from that point no one on the system can
>> use ZeroMQ.
>>
>> I find make_fdpair to be very problematic on Windows:
>>
>> * If one call exits without signaling the event, while
>> someone else is holding a handle to the event - All
>> further calls on the system will block. It can
>> happen, for example, if an assertion fails, and the
>> process crashes because of the exception raised.
>> * It can also happen if an assertion has failed, an
>> exception was raised, but caught by the caller using
>> a __try & __except block (SEH). We can't simply rely
>> on the exception to crash the process (for example, a
>> program might wrap calls to its plugins with __try &
>> __except, so a faulty plugin won't crash the whole
>> program).
>> * So it basically means that one faulty program can
>> cause other, unrelated programs, to block.
>>
>> I suggest:
>>
>> * No matter which synchronization mechanism is used,
>> wrap the code with __try & __finally, and release the
>> lock in the finally block. This will make sure that
>> we release it if an exception occurs (in my case,
>> though, I tried it and it didn't help; the thread
>> might have been terminated during the call).
>> * If possible, don't use a global, system wide, lock.
>> From my understanding, it is used in order to reuse
>> the signaler port. So either use a random, available,
>> port, or make the port "libzmq instance" specific
>> (the first call binds on a random port, further
>> calls will reuse the port) and protect it with
>> critical section. This will at least limit the
>> problems to the same process.
>> * If the system wide lock is really needed, I suggest
>> using a mutex instead of the event. When using a
>> mutex, if the owning thread dies without releasing
>> it, Windows automatically releases it and the next
>> call to WaitForSingleObject will return
>> WAIT_ABANDONED and will not block. We can then check
>> if the port was left in a "listening" state, close it
>> if necessary, and "re-listen" with a new socket.
>>
>> I'm using libzmq 4.0.1 with czmq 2.0.2. I saw that the
>> make_fdpair was improved in the master, but I believe it
>> still doesn't entirely solve it.
>> What do you say?
>>
>> Koby
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> <mailto:zeromq-dev at lists.zeromq.org>
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>>
>
>
>
>
>
>
>
>