[zeromq-dev] Blocking issues with signaler_t::make_fdpair

Felipe Farinon felipe.farinon at powersyslab.com
Mon Dec 9 19:44:08 CET 2013


As Koby hasn't answered, and I can no longer reproduce the problem, 
could I make the modification anyway? It would still be tested 
indirectly, since I am going to run the modification in the same 
environment where the problem was happening.

On 01/12/2013 21:27, KIU Shueng Chuan wrote:
>
> In master, you can switch to using ephemeral ports by modifying 
> signaler_port to 0 in config.hpp. A new ephemeral port is used per 
> make_fdpair call and no critical section is used.
>
> Could you try that and see if it solves your problems?
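
To make the ephemeral-port idea concrete, here is a minimal sketch of binding a loopback listener to port 0 and reading back the port the OS assigned. It is not the libzmq code: `listen_on_ephemeral_port` is a hypothetical helper, and it is written against POSIX sockets. Winsock offers calls with the same names, but I am assuming the usual differences there (WSAStartup, SOCKET handles, closesocket) and have not shown them.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of libzmq): bind a loopback TCP listener
// to port 0 so the OS picks a free ephemeral port, then report that port.
// Returns the listening descriptor, or -1 on failure.
int listen_on_ephemeral_port(std::uint16_t* port_out) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);  // 0 asks the OS to choose the port

    socklen_t len = sizeof(addr);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
        listen(fd, 1) != 0 ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
        close(fd);
        return -1;
    }

    *port_out = ntohs(addr.sin_port);  // the port actually assigned
    assert(*port_out != 0);
    return fd;
}
```

Because every call gets its own port, two callers never compete for one fixed port number, which is why no system-wide lock is needed around the call.
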
>
> On Dec 1, 2013 9:39 PM, "Koby Boyango" <koby.b at mce-sys.com> wrote:
>
>     Hi
>     I'm fairly new to ZeroMQ, and have been working on integrating it
>     using czmq in several projects, Windows only.
>     I've opened an issue on GitHub, #767, and at Pieter's request
>     I'm moving the discussion here. So here is what I've written there:
>     While trying to integrate ZeroMQ in different modules\processes
>     (Windows only), I've encountered a problem where in some
>     situations a ZeroMQ call blocks - forever. After debugging the
>     issue, I've found out that zmq_init wasn't returning, and after
>     further debugging and digging through the code I've found out that
>     the problem was in signaler_t::make_fdpair, where the
>     WaitForSingleObject on the "zmq-signaler-port-sync" didn't return.
>     Initially I wasn't sure in which situations it occurs. So I did
>     some further investigation and found out that in my case:
>
>       * For some reason, when I close a test program with Ctrl+C, the
>         event stays un-signaled. Not sure why yet, will need further
>         debugging.
>       * I had a node.js script, which uses ZeroMQ, running in the
>         background. Because it uses version 3.2.2 of libzmq, which
>         leaks the event handle, the existing event wasn't deleted, and
>         stayed in an un-signaled state.
>       * Basically, from that point no one on the system can use ZeroMQ.
>
>     I find make_fdpair to be very problematic on Windows:
>
>       * If one call exits without signaling the event, while someone
>         else is holding a handle to the event - All further calls on
>         the system will block. It can happen, for example, if an
>         assertion fails, and the process crashes because of the
>         exception raised.
>       * It can also happen if an assertion has failed, an exception
>         was raised, but caught by the caller using a __try & __except
>         block (SEH). We can't simply rely on the exception to crash
>         the process (for example, a program might wrap calls to its
>         plugins with __try & __except, so a faulty plugin won't crash
>         the whole program).
>       * So it basically means that one faulty program can cause other,
>         unrelated programs, to block.
>
>     I suggest:
>
>       * No matter which synchronization mechanism is used, wrap the
>         code with __try & __finally, and release the lock in the
>         finally block. This will make sure that we'll release in case
>         of an exception (in my case, though, I tried it and it didn't
>         help; the thread might be terminated during the call).
>       * If possible, don't use a global, system wide, lock. From my
>         understanding, it is used in order to reuse the signaler port.
>         So either use a random, available, port, or make the port
>         "libzmq instance" specific (the first call binds on a random
>         port, further calls will reuse the port) and protect it with
>         critical section. This will at least limit the problems to the
>         same process.
>       * If the system wide lock is really needed, I suggest using a
>         mutex instead of the event. When using a mutex, if the owning
>         thread dies without releasing it, Windows automatically
>         releases it and the next call to WaitForSingleObject will
>         return WAIT_ABANDONED, and does not block. We can then check if
>         the port was left in a "listening" state, close it if
>         necessary, and "re-listen" with a new socket.
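
The second suggestion above (a port chosen once per libzmq instance, reused afterwards, and guarded by a process-local lock) can be sketched as follows. This is only an illustration under assumptions: `signaler_port_cache` and `pick_port` are hypothetical names, `std::mutex` stands in for a Windows critical section, and the C++ scope guard takes the place of the __try & __finally pair. As noted in the first suggestion, a guard like this covers exceptions but not a thread that is terminated while holding the lock.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <stdexcept>

// Hypothetical per-process cache for the signaler port (not libzmq code).
// The lock is local to the process, so a failure here cannot block
// unrelated programs on the same machine.
class signaler_port_cache {
public:
    // `pick_port` is a caller-supplied function that binds a listener on
    // an available port and returns the port number; it may throw.
    int get(const std::function<int()>& pick_port) {
        // Unlocked automatically on return and on exception.
        std::lock_guard<std::mutex> guard(lock_);
        if (port_ == 0) {
            int picked = pick_port();
            assert(picked > 0 && picked <= 65535);  // sketch-level sanity check
            port_ = picked;
        }
        return port_;  // later calls reuse the first port
    }

private:
    std::mutex lock_;
    int port_ = 0;  // 0 means "not chosen yet"
};
```

If `pick_port` throws on the first call, the guard still releases the lock, so the next caller can try again instead of blocking.
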
>
>     I'm using libzmq 4.0.1 with czmq 2.0.2. I saw that the make_fdpair
>     was improved in the master, but I believe it still doesn't
>     entirely solve it.
>     What do you say?
>
>     Koby
>
>     _______________________________________________
>     zeromq-dev mailing list
>     zeromq-dev at lists.zeromq.org <mailto:zeromq-dev at lists.zeromq.org>
>     http://lists.zeromq.org/mailman/listinfo/zeromq-dev
