[zeromq-dev] Blocking issues with signaler_t::make_fdpair
Felipe Farinon
felipe.farinon at powersyslab.com
Mon Dec 9 19:44:08 CET 2013
As Koby hasn't answered and I can no longer reproduce the problem,
could I make the modification anyway? It would be tested indirectly,
since I am going to run the modification in the same environment where
the problem was happening.
On 01/12/2013 21:27, KIU Shueng Chuan wrote:
>
> In master, you can switch to using ephemeral ports by modifying
> signaler_port to 0 in config.hpp. A new ephemeral port is used per
> make_fdpair call and no critical section is used.
>
> Could you try that and see if it solves your problems?
>
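To illustrate what "ephemeral port" means in the suggestion above, here is a minimal sketch. It is not the libzmq code: it uses POSIX sockets as a stand-in for Winsock, and `bind_ephemeral_listener` is a hypothetical helper name. Binding to port 0 asks the OS to choose a free port, so each call gets its own port and no cross-process lock is needed to share a fixed one.

```cpp
// Sketch only: POSIX sockets stand in for Winsock, and
// bind_ephemeral_listener is a hypothetical helper, not a libzmq function.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// Creates a TCP listener on 127.0.0.1 with port 0, so the OS picks a free
// ephemeral port. Returns the socket descriptor (or -1 on failure) and
// stores the port the OS chose in port_out.
int bind_ephemeral_listener(std::uint16_t &port_out)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0); // 0 = "any free port"

    socklen_t len = sizeof addr;
    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr) != 0
        || listen(fd, 1) != 0
        || getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) != 0) {
        close(fd);
        return -1;
    }

    port_out = ntohs(addr.sin_port); // the port actually assigned
    return fd;
}
```

The connecting side would then read the port back from `port_out` instead of relying on a fixed, system-wide signaler port.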
> On Dec 1, 2013 9:39 PM, "Koby Boyango" <koby.b at mce-sys.com> wrote:
>
> Hi
> I'm fairly new to ZeroMQ, and have been working on integrating it
> using czmq in several projects, Windows only.
> I've opened an issue on GitHub, #767, and at Pieter's request
> I'm moving the discussion here. So here is what I've written there:
> While trying to integrate ZeroMQ in different modules\processes
> (Windows only), I've encountered a problem where in some
> situations a ZeroMQ call blocks - forever. After debugging the
> issue, I've found out that zmq_init wasn't returning, and after
> further debugging and digging through the code I've found out that
> the problem was in signaler_t::make_fdpair, where the
> WaitForSingleObject on the "zmq-signaler-port-sync" didn't return.
> Initially I wasn't sure in which situations it occurred. So I did
> some further investigation and found out that in my case:
>
> * For some reason, when I close a test program with Ctrl+C, the
> event stays un-signaled. Not sure why yet, will need further
> debugging.
> * I had a node.js script, which uses ZeroMQ, running in the
> background. Because it uses version 3.2.2 of libzmq, which
> leaks the event handle, the existing event wasn't deleted, and
> stayed in an un-signaled state.
> * Basically, from that point no one on the system can use ZeroMQ.
>
> I find make_fdpair to be very problematic on Windows:
>
> * If one call exits without signaling the event, while someone
> else is holding a handle to the event - All further calls on
> the system will block. It can happen, for example, if an
> assertion fails, and the process crashes because of the
> exception raised.
> * It can also happen if an assertion has failed, an exception
> was raised, but caught by the caller using a __try & __except
> block (SEH). We can't simply rely on the exception to crash
> the process (for example, a program might wrap calls to its
> plugins with __try & __except, so a faulty plugin won't crash
> the whole program).
> * So it basically means that one faulty program can cause other,
> unrelated programs, to block.
>
> I suggest:
>
> * No matter which synchronization mechanism is used, wrap the
> code with __try & __finally, and release the lock in the
> finally block. This will make sure that we'll release in case
> of an exception. (In my case, though, I tried it and it didn't
> help; the thread might be terminated during the call.)
> * If possible, don't use a global, system wide, lock. From my
> understanding, it is used in order to reuse the signaler port.
> So either use a random, available, port, or make the port
> "libzmq instance" specific (the first call binds on a random
> port, further calls will reuse the port) and protect it with a
> critical section. This will at least limit the problems to the
> same process.
> * If the system wide lock is really needed, I suggest using a
> mutex instead of the event. When using a mutex, if the owning
> thread dies without releasing it, Windows automatically
> releases it and the next call to WaitForSingleObject will
> return WAIT_ABANDONED instead of blocking. We can then check if
> the port was left in a "listening" state, close it if
> necessary, and "re-listen" with a new socket.
>
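The second suggestion above (a per-process port guarded by an in-process lock) could look roughly like the sketch below. Everything here is an assumption for illustration: `signaler_port_cache` and `pick_port` are hypothetical names, `std::mutex` stands in for a Windows critical section, and `std::lock_guard` plays the role of the `__try`/`__finally` release. As noted in the list, this kind of release covers exceptions but not a thread that is terminated while holding the lock; the point is only that any stuck lock stays confined to one process.

```cpp
// Sketch only: signaler_port_cache is a hypothetical class, not libzmq code.
// std::mutex stands in for a Windows CRITICAL_SECTION.
#include <cstdint>
#include <functional>
#include <mutex>

class signaler_port_cache
{
  public:
    // Returns the port for this process. pick_port (hypothetical callback)
    // is invoked only while no port has been cached yet; later calls reuse
    // the cached value.
    std::uint16_t get(const std::function<std::uint16_t()> &pick_port)
    {
        // Released automatically on return or when a C++ exception
        // propagates, so a failing pick_port does not leave it locked.
        std::lock_guard<std::mutex> guard(mutex_);
        if (port_ == 0)
            port_ = pick_port();
        return port_;
    }

  private:
    std::mutex mutex_;       // process-local, never visible system-wide
    std::uint16_t port_ = 0; // 0 means "not chosen yet"
};
```

If `pick_port` throws, the cache stays empty and the next caller simply tries again, rather than blocking behind a lock that was never released.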
> I'm using libzmq 4.0.1 with czmq 2.0.2. I saw that the make_fdpair
> was improved in the master, but I believe it still doesn't
> entirely solve it.
> What do you say?
>
> Koby
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev