[zeromq-dev] zmq-signaler-port-sync

Pau pau at teleopsia.com
Fri Feb 15 10:32:49 CET 2013


Hi,

Fine, I am ok with your patch, I will try to test it in faliling conditions.

You are right about CloseHandle(sync), we could add it, but if your 
patch works well I have no objections (my discusion was more esthetic 
than anything else).

thanks,

Pau

El 15/02/2013 10:13, KIU Shueng Chuan escribió:
> Hi Pau,
>
> I submitted a patch earlier this morning against the development branch.
> https://github.com/zeromq/libzmq/pull/514
> It only checks for error returns from connect() and accept(). The 
> other assertions that occur would be due to programming errors.
>
>
> Regarding your patch, I see that yours doesn't have the 
> CloseHandle(sync) as fixed in the trunk branch. Although it was to fix 
> a handle leak, not having this fix also has the side effect of making 
> the problem described in this thread much more likely to surface.
> - Process A and Process B both use zeromq and are running
> - Due to the handle leak, the global Event is kept alive even when we 
> are not inside the critical section.
> - Process A subsequently enters the critical section and asserts, 
> leaving it locked
> - Sometime later when Process B (or another newly started zeromq 
> application) tries to enter the critical section, it will block forever
> (With the fix, the problem surfaces only if another process was 
> blocking to enter the critical section at the point of assertion)
>
>
> On Fri, Feb 15, 2013 at 4:21 PM, Pau <pau at teleopsia.com 
> <mailto:pau at teleopsia.com>> wrote:
>
>     Kiu,
>
>     I have rethought about the problem and it is not very clear to me
>     how to run all make_fdpair() with no asserts. The problem is that
>     I do not know waht may happen if we call a wsa function after a
>     failure of a previous one. I guess that we could check in a per
>     call way and see if it is ok. Or define an internal boolean that
>     only calls the next function if the previous has suceeded.
>
>     I am not familiar with the coding style in ZMQ (never made a
>     patch) but I would say that the specific assert can look a bit
>     better. I am testing this code.
>     Anyway, if you think removing the asserts is ok, I am ok with it,
>     as said I am new in this yard.
>
>     Please check code:
>
>     It is not clear to me what to do with the win_assert(..) in case
>     SetEvent (sync) fails inside another assert. As it is now it is
>     clear that it will never be called but I guess that in case of
>     error you prefer to see the fisrt error. Not sure.
>
>     #if defined ZMQ_HAVE_WINDOWS
>     #define wsa_assert_fdpair(sync, no) \
>         {    \
>          bool brc = SetEvent (sync); \
>          wsa_assert (no); \
>          win_assert (brc != 0);    \
>         }
>
>     #define zmq_assert_fdpair(sync, x) \
>         {    \
>          bool brc = SetEvent (sync); \
>          zmq_assert (x); \
>          win_assert (brc != 0);    \
>         }
>
>     #define win_assert_fdpair(sync, x) \
>         {    \
>          bool brc = SetEvent (sync); \
>          win_assert ( (x) && (brc != 0) );    \
>         }
>
>     #endif
>
>     int zmq::signaler_t::make_fdpair (fd_t *r_, fd_t *w_)
>
>     ...
>
>         HANDLE sync = CreateEvent (&sa, FALSE, TRUE, TEXT
>     ("Global\\zmq-signaler-port-sync"));
>         if (sync == NULL && GetLastError () == ERROR_ACCESS_DENIED)
>           sync = OpenEvent (SYNCHRONIZE | EVENT_MODIFY_STATE, FALSE,
>     TEXT ("Global\\zmq-signaler-port-sync"));
>
>         win_assert (sync != NULL);
>
>         //  Enter the critical section.
>         DWORD dwrc = WaitForSingleObject (sync, INFINITE);
>         zmq_assert_fdpair (sync, dwrc == WAIT_OBJECT_0);
>
>         //  Windows has no 'socketpair' function. CreatePipe is no
>     good as pipe
>         //  handles cannot be polled on. Here we create the socketpair
>     by hand.
>         *w_ = INVALID_SOCKET;
>         *r_ = INVALID_SOCKET;
>
>         //  Create listening socket.
>         SOCKET listener;
>         listener = open_socket (AF_INET, SOCK_STREAM, 0);
>         wsa_assert_fdpair(sync, listener != INVALID_SOCKET);
>
>         //  Set SO_REUSEADDR and TCP_NODELAY on listening socket.
>         BOOL so_reuseaddr = 1;
>         int rc = setsockopt (listener, SOL_SOCKET, SO_REUSEADDR,
>             (char *)&so_reuseaddr, sizeof (so_reuseaddr));
>         wsa_assert (rc != SOCKET_ERROR);
>         BOOL tcp_nodelay = 1;
>         rc = setsockopt (listener, IPPROTO_TCP, TCP_NODELAY,
>             (char *)&tcp_nodelay, sizeof (tcp_nodelay));
>         wsa_assert_fdpair (sync, rc != SOCKET_ERROR);
>
>         //  Bind listening socket to any free local port.
>         struct sockaddr_in addr;
>         memset (&addr, 0, sizeof (addr));
>         addr.sin_family = AF_INET;
>         addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
>         addr.sin_port = htons (signaler_port);
>         rc = bind (listener, (const struct sockaddr*) &addr, sizeof
>     (addr));
>         wsa_assert_fdpair (sync, rc != SOCKET_ERROR);
>
>         //  Listen for incomming connections.
>         rc = listen (listener, 1);
>         wsa_assert_fdpair (sync, rc != SOCKET_ERROR);
>
>         //  Create the writer socket.
>         *w_ = WSASocket (AF_INET, SOCK_STREAM, 0, NULL, 0, 0);
>         wsa_assert_fdpair (sync, *w_ != INVALID_SOCKET);
>
>         //  On Windows, preventing sockets to be inherited by child
>     processes.
>         BOOL brc = SetHandleInformation ((HANDLE) *w_,
>     HANDLE_FLAG_INHERIT, 0);
>         win_assert_fdpair (sync, brc);
>
>         //  Set TCP_NODELAY on writer socket.
>         rc = setsockopt (*w_, IPPROTO_TCP, TCP_NODELAY,
>             (char *)&tcp_nodelay, sizeof (tcp_nodelay));
>         wsa_assert_fdpair (sync, rc != SOCKET_ERROR);
>
>         //  Connect writer to the listener.
>         rc = connect (*w_, (struct sockaddr*) &addr, sizeof (addr));
>         wsa_assert_fdpair (sync, rc != SOCKET_ERROR);
>
>         //  Accept connection from writer.
>         *r_ = accept (listener, NULL, NULL);
>         wsa_assert_fdpair (sync, *r_ != INVALID_SOCKET);
>
>         //  On Windows, preventing sockets to be inherited by child
>     processes.
>         brc = SetHandleInformation ((HANDLE) *r_, HANDLE_FLAG_INHERIT, 0);
>         win_assert_fdpair (sync, brc);
>
>         //  We don't need the listening socket anymore. Close it.
>         rc = closesocket (listener);
>         wsa_assert_fdpair (sync, rc != SOCKET_ERROR);
>
>         //  Exit the critical section.
>         brc = SetEvent (sync);
>         win_assert (brc != 0);
>
>         return 0;
>
>
>
>     El 14/02/2013 1:46, KIU Shueng Chuan escribió:
>>     Are we okay with using assertions to catch both
>>     - programming errors
>>     - rare but known situations not handled in the code? (in this
>>     case resource exhaustion)
>>
>>     How about this: Have make_fdpair() return -1 (and release the
>>     critical section) on error returns from the calls to connect()
>>     and accept() only.
>>     The failure will be caught by signaler_t() which calls make_fdpair()
>>
>>
>>
>>     On Wed, Feb 13, 2013 at 7:22 PM, Pau <pau at teleopsia.com
>>     <mailto:pau at teleopsia.com>> wrote:
>>
>>         Hi, thanks
>>
>>         I do not want to look bungler, but wouldn't be a shortcut to
>>         implement asserts that clean the event before aborting?
>>
>>         El 13/02/2013 9:54, KIU Shueng Chuan escribió:
>>>         Hi Pau,
>>>
>>>         The system-wide critical section is currently implemented
>>>         using a win32 Event which, as you observed, has the
>>>         possibility of resulting in a deadlock in the following
>>>         situation:
>>>         1) Process A takes the Event
>>>         2) Process B tries to take the Event and blocks
>>>         3) Process A aborts within the critical section (due to an
>>>         assertion being raised)
>>>         4) Since Process B has opened the Event, the OS will not
>>>         clean up the Event.
>>>         5) Process B and any subsequent process will now block
>>>         forever for the Event.
>>>
>>>         As I mentioned in the previous mail, if the critical section
>>>         were to be implemented using a Mutex instead, then in step
>>>         5, Process B would be able to enter the critical section
>>>         with a return code of WAIT_ABANDONED from
>>>         WaitForSingleObject. (Or at least that's what I read from MSDN)
>>>
>>>         Note: If Process A aborted due to some exhaustion of
>>>         resources, then Process B would likely hit the same
>>>         assertion too.
>>>
>>>         The question is how to convert the Event to a Mutex and yet
>>>         not break compatibility with existing applications using
>>>         older versions of the library.
>>>
>>>
>>>
>>>         On Wed, Feb 13, 2013 at 3:28 PM, Pau <pau at teleopsia.com
>>>         <mailto:pau at teleopsia.com>> wrote:
>>>
>>>             Hi,
>>>
>>>             I am back with the asserts happening inside a critical
>>>             section in signaler.cpp.
>>>             The problem still is that in signale.cpp make_fdpair(..)
>>>             creates system-wide critical section and does a number
>>>             of things that can generate a wsa_assert() or
>>>             win_assert() before releasing the session.
>>>
>>>             I have seen that in the trunk someone has added a
>>>             CloseHandle(sync) at the end of the function, I do not
>>>             know if it had something related with this but I
>>>             understand that the problem is still there. wsa_assert()
>>>             and wsa_windows() end up in RaiseException (0x40000015,
>>>             EXCEPTION_NONCONTINUABLE, 1, extra_info) which I
>>>             understand is a cul de sac that has no way out to clean
>>>             up before leaving.
>>>
>>>             I guess we need a special assert function to use inside
>>>             this critical but I'd like a more documented opinion (Kiu?).
>>>
>>>             thanks,
>>>
>>>             Pau Ceano
>>>
>>>             El 21/01/2013 23:37, KIU Shueng Chuan escribió:
>>>>
>>>>             Hi Pau, a fix for the assertion on connection to port
>>>>             5905 is in trunk branch.
>>>>
>>>>             I think the dangling critical section possibility could
>>>>             be fixed by changing the Event to a Mutex. When an
>>>>             assertion occurs the mutex would just be abandoned.
>>>>             However this change will cause backward compatibility
>>>>             issues with older versions.
>>>>
>>>>             On Jan 22, 2013 2:04 AM, "Pieter Hintjens"
>>>>             <ph at imatix.com <mailto:ph at imatix.com>> wrote:
>>>>
>>>>                 Hi Pau,
>>>>
>>>>                 So there are two different problems here, one is
>>>>                 that we're hitting a
>>>>                 socket limit on WXP, and the other is that the
>>>>                 asserts are happening
>>>>                 inside a critical section.
>>>>
>>>>                 I don't think we can fix the first one easily but
>>>>                 we can presumably
>>>>                 assert in a smarter way. Do you want to try making
>>>>                 a patch for this?
>>>>
>>>>                 -Pieter
>>>>
>>>>                 On Mon, Jan 21, 2013 at 6:23 PM, Pau
>>>>                 <pau at teleopsia.com <mailto:pau at teleopsia.com>> wrote:
>>>>                 >
>>>>                 > Hi,
>>>>                 >
>>>>                 >
>>>>                 > I am using (not yet in production) ZMQ on Windows
>>>>                 and I have found what
>>>>                 > I think is a big problem for Windows users.
>>>>                 > We use WXP and W7 and Visual C++ different
>>>>                 versions. ZMQ version 3.2.0
>>>>                 > (as far as I see the same problem happens in 3.2.2)
>>>>                 >
>>>>                 > I do not fully understand ZMQ internals but I've
>>>>                 seen that every time a
>>>>                 > socket is created the function make_fdpair(..) is
>>>>                 called and in
>>>>                 > signaler.cpp(line244) a system event
>>>>                 "zmq-signaler-port-sync" is created.
>>>>                 > This event is used as a system-wide critical
>>>>                 section and, so all
>>>>                 > applications that try to create an event will
>>>>                 WaitForSingleObject (sync,
>>>>                 > INFINITE) until  SetEvent (...) is called.
>>>>                 > The problem is that the code between:
>>>>                 >   HANDLE sync = CreateEvent (NULL, FALSE, TRUE, TEXT
>>>>                 > ("zmq-signaler-port-sync"));
>>>>                 > and
>>>>                 > SetEvent (sync);
>>>>                 > is full of wsa_asserts(..) that will terminate
>>>>                 the application if
>>>>                 > something goes wrong.
>>>>                 >
>>>>                 > It is clear that terminating the application not
>>>>                 leaving the system-wide
>>>>                 > critical section is a bad idea because all
>>>>                 applications in the system
>>>>                 > will hang and you will have to stop all them to
>>>>                 start again.
>>>>                 > I understand that no errors should happen but
>>>>                 anyway to escape from the
>>>>                 > error is not a good idea in this case.
>>>>                 >
>>>>                 > I do not know all possible reasons to generate a
>>>>                 fatal wsa_assert(..)
>>>>                 > but there is at least one:
>>>>                 >
>>>>                 > I have seen that in XP it is possible that line
>>>>                 301  rc = connect (*w_,
>>>>                 > (sockaddr *) &addr, sizeof (addr)); returns an
>>>>                 error when a socket tries
>>>>                 > to connect to 5905 and this has happened many times.
>>>>                 > Windows uses port numbers starting near 1400 and
>>>>                 XP has a limit at 5000.
>>>>                 > In W7 this does not look as a problem because
>>>>                 maximum is 65000
>>>>                 > It sounds as if the number was big enough but
>>>>                 apart from the fact that
>>>>                 > ZMQ uses a big number of connections (at least in
>>>>                 my tests) I have
>>>>                 > experienced that Windows jumps port numbers by 7,
>>>>                 so 5000 happens
>>>>                 > sometimes with catastrophic consequences.
>>>>                 >
>>>>                 > best,
>>>>                 >
>>>>                 > Pau Ceano
>>>>                 > _______________________________________________
>>>>                 > zeromq-dev mailing list
>>>>                 > zeromq-dev at lists.zeromq.org
>>>>                 <mailto:zeromq-dev at lists.zeromq.org>
>>>>                 > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>>                 _______________________________________________
>>>>                 zeromq-dev mailing list
>>>>                 zeromq-dev at lists.zeromq.org
>>>>                 <mailto:zeromq-dev at lists.zeromq.org>
>>>>                 http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>>
>>>>
>>>>
>>>>             _______________________________________________
>>>>             zeromq-dev mailing list
>>>>             zeromq-dev at lists.zeromq.org  <mailto:zeromq-dev at lists.zeromq.org>
>>>>             http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>
>>>
>>>             _______________________________________________
>>>             zeromq-dev mailing list
>>>             zeromq-dev at lists.zeromq.org
>>>             <mailto:zeromq-dev at lists.zeromq.org>
>>>             http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>
>>>
>>>
>>>
>>>         _______________________________________________
>>>         zeromq-dev mailing list
>>>         zeromq-dev at lists.zeromq.org  <mailto:zeromq-dev at lists.zeromq.org>
>>>         http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>>         _______________________________________________
>>         zeromq-dev mailing list
>>         zeromq-dev at lists.zeromq.org <mailto:zeromq-dev at lists.zeromq.org>
>>         http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>>
>>
>>     _______________________________________________
>>     zeromq-dev mailing list
>>     zeromq-dev at lists.zeromq.org  <mailto:zeromq-dev at lists.zeromq.org>
>>     http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>
>     _______________________________________________
>     zeromq-dev mailing list
>     zeromq-dev at lists.zeromq.org <mailto:zeromq-dev at lists.zeromq.org>
>     http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>
>
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20130215/d3faf72f/attachment.htm>


More information about the zeromq-dev mailing list