<div dir="ltr"><div style>Hi Pau,</div><div><br></div><div style>I submitted a patch earlier this morning against the development branch.</div><a href="https://github.com/zeromq/libzmq/pull/514">https://github.com/zeromq/libzmq/pull/514</a><br>

<div style>It only checks for error returns from connect() and accept(). The other assertions that occur would be due to programming errors.</div><div><br></div><div><br></div><div style>Regarding your patch, I see that yours doesn't have the CloseHandle(sync) as fixed in the trunk branch. Although it was to fix a handle leak, not having this fix also has the side effect of making the problem described in this thread much more likely to surface. </div>

<div style>- Process A and Process B both use zeromq and are running</div><div style>- Due to the handle leak, the global Event is kept alive even when we are not inside the critical section.</div><div style>- Process A subsequently enters the critical section and asserts, leaving it locked</div>

<div style>- Sometime later when Process B (or another newly started zeromq application) tries to enter the critical section, it will block forever</div><div style>(With the fix, the problem surfaces only if another process was blocking to enter the critical section at the point of assertion)</div>

</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Feb 15, 2013 at 4:21 PM, Pau <span dir="ltr"><<a href="mailto:pau@teleopsia.com" target="_blank">pau@teleopsia.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    <div>Kiu,<br>
      <br>
      I have rethought about the problem and it is not very clear to me
      how to run all make_fdpair() with no asserts. The problem is that
      I do not know waht may happen if we call a wsa function after a
      failure of a previous one. I guess that we could check in a per
      call way and see if it is ok. Or define an internal boolean that
      only calls the next function if the previous has suceeded.<br>
      <br>
      I am not familiar with the coding style in ZMQ (never made a
      patch) but I would say that the specific assert can look a bit
      better. I am testing this code.<br>
      Anyway, if you think removing the asserts is ok, I am ok with it,
      as said I am new in this yard.<br>
      <br>
      Please check code:<br>
      <br>
      It is not clear to me what to do with the win_assert(..) in case
      SetEvent (sync) fails inside another assert. As it is now it is
      clear that it will never be called but I guess that in case of
      error you prefer to see the fisrt error. Not sure.<br>
      <br>
      #if defined ZMQ_HAVE_WINDOWS<br>
      #define wsa_assert_fdpair(sync, no) \<br>
          {    \<br>
           bool brc = SetEvent (sync); \<br>
           wsa_assert (no); \<br>
           win_assert (brc != 0);    \<br>
          }<br>
      <br>
      #define zmq_assert_fdpair(sync, x) \<br>
          {    \<br>
           bool brc = SetEvent (sync); \<br>
           zmq_assert (x); \<br>
           win_assert (brc != 0);    \<br>
          }<br>
      <br>
      #define win_assert_fdpair(sync, x) \<br>
          {    \<br>
           bool brc = SetEvent (sync); \<br>
           win_assert ( (x) && (brc != 0) );    \<br>
          }<br>
      <br>
      #endif<br>
      <br>
      int zmq::signaler_t::make_fdpair (fd_t *r_, fd_t *w_)<br>
      <br>
      ...<br>
      <br>
          HANDLE sync = CreateEvent (&sa, FALSE, TRUE, TEXT
      ("Global\\zmq-signaler-port-sync"));<br>
          if (sync == NULL && GetLastError () ==
      ERROR_ACCESS_DENIED)<br>
            sync = OpenEvent (SYNCHRONIZE | EVENT_MODIFY_STATE, FALSE,
      TEXT ("Global\\zmq-signaler-port-sync"));<br>
      <br>
          win_assert (sync != NULL);<br>
      <br>
          //  Enter the critical section.<br>
          DWORD dwrc = WaitForSingleObject (sync, INFINITE);<br>
          zmq_assert_fdpair (sync, dwrc == WAIT_OBJECT_0);<br>
      <br>
          //  Windows has no 'socketpair' function. CreatePipe is no
      good as pipe<br>
          //  handles cannot be polled on. Here we create the socketpair
      by hand.<br>
          *w_ = INVALID_SOCKET;<br>
          *r_ = INVALID_SOCKET;<br>
      <br>
          //  Create listening socket.<br>
          SOCKET listener;<br>
          listener = open_socket (AF_INET, SOCK_STREAM, 0);<br>
          wsa_assert_fdpair(sync, listener != INVALID_SOCKET);<br>
      <br>
          //  Set SO_REUSEADDR and TCP_NODELAY on listening socket.<br>
          BOOL so_reuseaddr = 1;<br>
          int rc = setsockopt (listener, SOL_SOCKET, SO_REUSEADDR,<br>
              (char *)&so_reuseaddr, sizeof (so_reuseaddr));<br>
          wsa_assert (rc != SOCKET_ERROR);<br>
          BOOL tcp_nodelay = 1;<br>
          rc = setsockopt (listener, IPPROTO_TCP, TCP_NODELAY,<br>
              (char *)&tcp_nodelay, sizeof (tcp_nodelay));<br>
          wsa_assert_fdpair (sync, rc != SOCKET_ERROR);<br>
      <br>
          //  Bind listening socket to any free local port.<br>
          struct sockaddr_in addr;<br>
          memset (&addr, 0, sizeof (addr));<br>
          addr.sin_family = AF_INET;<br>
          addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);<br>
          addr.sin_port = htons (signaler_port);<br>
          rc = bind (listener, (const struct sockaddr*) &addr,
      sizeof (addr));<br>
          wsa_assert_fdpair (sync, rc != SOCKET_ERROR);<br>
      <br>
          //  Listen for incomming connections.<br>
          rc = listen (listener, 1);<br>
          wsa_assert_fdpair (sync, rc != SOCKET_ERROR);<br>
      <br>
          //  Create the writer socket.<br>
          *w_ = WSASocket (AF_INET, SOCK_STREAM, 0, NULL, 0,  0);<br>
          wsa_assert_fdpair (sync, *w_ != INVALID_SOCKET);<br>
      <br>
          //  On Windows, preventing sockets to be inherited by child
      processes.<br>
          BOOL brc = SetHandleInformation ((HANDLE) *w_,
      HANDLE_FLAG_INHERIT, 0);<br>
          win_assert_fdpair (sync, brc);<br>
      <br>
          //  Set TCP_NODELAY on writer socket.<br>
          rc = setsockopt (*w_, IPPROTO_TCP, TCP_NODELAY,<br>
              (char *)&tcp_nodelay, sizeof (tcp_nodelay));<br>
          wsa_assert_fdpair (sync, rc != SOCKET_ERROR);<br>
      <br>
          //  Connect writer to the listener.<br>
          rc = connect (*w_, (struct sockaddr*) &addr, sizeof
      (addr));<br>
          wsa_assert_fdpair (sync, rc != SOCKET_ERROR);<br>
      <br>
          //  Accept connection from writer.<br>
          *r_ = accept (listener, NULL, NULL);<br>
          wsa_assert_fdpair (sync, *r_ != INVALID_SOCKET);<br>
      <br>
          //  On Windows, preventing sockets to be inherited by child
      processes.<br>
          brc = SetHandleInformation ((HANDLE) *r_, HANDLE_FLAG_INHERIT,
      0);<br>
          win_assert_fdpair (sync, brc);<br>
      <br>
          //  We don't need the listening socket anymore. Close it.<br>
          rc = closesocket (listener);<br>
          wsa_assert_fdpair (sync, rc != SOCKET_ERROR);<br>
      <br>
          //  Exit the critical section.<br>
          brc = SetEvent (sync);<br>
          win_assert (brc != 0);<br>
      <br>
          return 0;<div class="im"><br>
      <br>
      <br>
      El 14/02/2013 1:46, KIU Shueng Chuan escribió:<br>
    </div></div>
    <blockquote type="cite">
      <div dir="ltr">Are we okay with using assertions to catch both
        <div><div class="h5"><div>- programming errors </div>
        <div>- rare but known situations not handled in the
          code? (in this case resource exhaustion)</div>
        <div><br>
        </div>
        <div>
          How about this: Have make_fdpair() return -1 (and release the
          critical section) on error returns from the calls to connect()
          and accept() only.</div>
        <div>The failure will be caught by signaler_t() which
          calls make_fdpair()</div>
        <div><br>
        </div>
      </div></div></div><div><div class="h5">
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On Wed, Feb 13, 2013 at 7:22 PM, Pau <span dir="ltr"><<a href="mailto:pau@teleopsia.com" target="_blank">pau@teleopsia.com</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">
              <div>Hi, thanks<br>
                <br>
                I do not want to look bungler, but wouldn't be a
                shortcut to implement asserts that clean the event
                before aborting?<br>
                <br>
                El 13/02/2013 9:54, KIU Shueng Chuan escribió:<br>
              </div>
              <div>
                <div>
                  <blockquote type="cite">
                    <div dir="ltr">Hi Pau, 
                      <div><br>
                      </div>
                      <div>The system-wide critical section is currently
                        implemented using a win32 Event which, as you
                        observed, has the possibility of resulting in a
                        deadlock in the following situation:</div>
                      <div>1) Process A takes the Event</div>
                      <div>2) Process B tries to take the Event and
                        blocks</div>
                      <div>3) Process A aborts within the critical
                        section (due to an assertion being raised)</div>
                      <div>4) Since Process B has opened the Event, the
                        OS will not clean up the Event.</div>
                      <div>5) Process B and any subsequent process will
                        now block forever for the Event.</div>
                      <div><br>
                      </div>
                      <div>As I mentioned in the previous mail, if the
                        critical section were to be implemented using a
                        Mutex instead, then in step 5, Process B would
                        be able to enter the critical section with a
                        return code of WAIT_ABANDONED from
                        WaitForSingleObject. (Or at least that's what I
                        read from MSDN)<br>
                      </div>
                      <div><br>
                      </div>
                      <div>Note: If Process A aborted due to some
                        exhaustion of resources, then Process B would
                        likely hit the same assertion too.</div>
                      <div><br>
                      </div>
                      <div>The question is how to convert the Event to a
                        Mutex and yet not break compatibility with
                        existing applications using older versions of
                        the library.</div>
                      <div><br>
                      </div>
                    </div>
                    <div class="gmail_extra"><br>
                      <br>
                      <div class="gmail_quote">On Wed, Feb 13, 2013 at
                        3:28 PM, Pau <span dir="ltr"><<a href="mailto:pau@teleopsia.com" target="_blank">pau@teleopsia.com</a>></span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                          <div text="#000000" bgcolor="#FFFFFF">
                            <div>Hi,<br>
                              <br>
                              I am back with the asserts happening
                              inside a critical section in signaler.cpp.<br>
                              The problem still is that in signale.cpp
                              make_fdpair(..) creates system-wide
                              critical section and does a number of
                              things that can generate a wsa_assert() or
                              win_assert() before releasing the session.<br>
                              <br>
                              I have seen that in the trunk someone has
                              added a CloseHandle(sync) at the end of
                              the function, I do not know if it had
                              something related with this but I
                              understand that the problem is still
                              there. wsa_assert() and wsa_windows() end
                              up in  RaiseException (0x40000015,
                              EXCEPTION_NONCONTINUABLE, 1, extra_info)
                              which I understand is a cul de sac that
                              has no way out to clean up before leaving.<br>
                              <br>
                              I guess we need a special assert function
                              to use inside this critical but I'd like a
                              more documented opinion (Kiu?).<br>
                              <br>
                              thanks,<br>
                              <br>
                              Pau Ceano<br>
                              <br>
                              El 21/01/2013 23:37, KIU Shueng Chuan
                              escribió:<br>
                            </div>
                            <div>
                              <div>
                                <blockquote type="cite">
                                  <p dir="ltr">Hi Pau, a fix for the
                                    assertion on connection to port 5905
                                    is in trunk branch.</p>
                                  <p dir="ltr">I think the dangling
                                    critical section possibility could
                                    be fixed by changing the Event to a
                                    Mutex. When an assertion occurs the
                                    mutex would just be abandoned.
                                    However this change will cause
                                    backward compatibility issues with
                                    older versions.<br>
                                  </p>
                                  <div class="gmail_quote">On Jan 22,
                                    2013 2:04 AM, "Pieter Hintjens" <<a href="mailto:ph@imatix.com" target="_blank">ph@imatix.com</a>>

                                    wrote:<br type="attribution">
                                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Hi Pau,<br>
                                      <br>
                                      So there are two different
                                      problems here, one is that we're
                                      hitting a<br>
                                      socket limit on WXP, and the other
                                      is that the asserts are happening<br>
                                      inside a critical section.<br>
                                      <br>
                                      I don't think we can fix the first
                                      one easily but we can presumably<br>
                                      assert in a smarter way. Do you
                                      want to try making a patch for
                                      this?<br>
                                      <br>
                                      -Pieter<br>
                                      <br>
                                      On Mon, Jan 21, 2013 at 6:23 PM,
                                      Pau <<a href="mailto:pau@teleopsia.com" target="_blank">pau@teleopsia.com</a>>

                                      wrote:<br>
                                      ><br>
                                      > Hi,<br>
                                      ><br>
                                      ><br>
                                      > I am using (not yet in
                                      production) ZMQ on Windows and I
                                      have found what<br>
                                      > I think is a big problem for
                                      Windows users.<br>
                                      > We use WXP and W7 and Visual
                                      C++ different versions. ZMQ
                                      version 3.2.0<br>
                                      > (as far as I see the same
                                      problem happens in 3.2.2)<br>
                                      ><br>
                                      > I do not fully understand ZMQ
                                      internals but I've seen that every
                                      time a<br>
                                      > socket is created the
                                      function make_fdpair(..) is called
                                      and in<br>
                                      > signaler.cpp(line244) a
                                      system event
                                      "zmq-signaler-port-sync" is
                                      created.<br>
                                      > This event is used as a
                                      system-wide critical section and,
                                      so all<br>
                                      > applications that try to
                                      create an event will
                                      WaitForSingleObject (sync,<br>
                                      > INFINITE) until  SetEvent
                                      (...) is called.<br>
                                      > The problem is that the code
                                      between:<br>
                                      >   HANDLE sync = CreateEvent
                                      (NULL, FALSE, TRUE, TEXT<br>
                                      > ("zmq-signaler-port-sync"));<br>
                                      > and<br>
                                      > SetEvent (sync);<br>
                                      > is full of wsa_asserts(..)
                                      that will terminate the
                                      application if<br>
                                      > something goes wrong.<br>
                                      ><br>
                                      > It is clear that terminating
                                      the application not leaving the
                                      system-wide<br>
                                      > critical section is a bad
                                      idea because all applications in
                                      the system<br>
                                      > will hang and you will have
                                      to stop all them to start again.<br>
                                      > I understand that no errors
                                      should happen but anyway to escape
                                      from the<br>
                                      > error is not a good idea in
                                      this case.<br>
                                      ><br>
                                      > I do not know all possible
                                      reasons to generate a fatal
                                      wsa_assert(..)<br>
                                      > but there is at least one:<br>
                                      ><br>
                                      > I have seen that in XP it is
                                      possible that line 301  rc =
                                      connect (*w_,<br>
                                      > (sockaddr *) &addr,
                                      sizeof (addr)); returns an error
                                      when a socket tries<br>
                                      > to connect to 5905 and this
                                      has happened many times.<br>
                                      > Windows uses port numbers
                                      starting near 1400 and XP has a
                                      limit at 5000.<br>
                                      > In W7 this does not look as a
                                      problem because maximum is 65000<br>
                                      > It sounds as if the number
                                      was big enough but apart from the
                                      fact that<br>
                                      > ZMQ uses a big number of
                                      connections (at least in my tests)
                                      I have<br>
                                      > experienced that Windows
                                      jumps port numbers by 7, so 5000
                                      happens<br>
                                      > sometimes with catastrophic
                                      consequences.<br>
                                      ><br>
                                      > best,<br>
                                      ><br>
                                      > Pau Ceano<br>
                                      >
                                      _______________________________________________<br>
                                      > zeromq-dev mailing list<br>
                                      > <a href="mailto:zeromq-dev@lists.zeromq.org" target="_blank">zeromq-dev@lists.zeromq.org</a><br>
                                      > <a href="http://lists.zeromq.org/mailman/listinfo/zeromq-dev" target="_blank">http://lists.zeromq.org/mailman/listinfo/zeromq-dev</a><br>
_______________________________________________<br>
                                      zeromq-dev mailing list<br>
                                      <a href="mailto:zeromq-dev@lists.zeromq.org" target="_blank">zeromq-dev@lists.zeromq.org</a><br>
                                      <a href="http://lists.zeromq.org/mailman/listinfo/zeromq-dev" target="_blank">http://lists.zeromq.org/mailman/listinfo/zeromq-dev</a><br>
                                    </blockquote>
                                  </div>
                                  <br>
                                  <fieldset></fieldset>
                                  <br>
                                  <pre>_______________________________________________
zeromq-dev mailing list
<a href="mailto:zeromq-dev@lists.zeromq.org" target="_blank">zeromq-dev@lists.zeromq.org</a>
<a href="http://lists.zeromq.org/mailman/listinfo/zeromq-dev" target="_blank">http://lists.zeromq.org/mailman/listinfo/zeromq-dev</a>
</pre>
                                </blockquote>
                                <br>
                              </div>
                            </div>
                          </div>
                          <br>
_______________________________________________<br>
                          zeromq-dev mailing list<br>
                          <a href="mailto:zeromq-dev@lists.zeromq.org" target="_blank">zeromq-dev@lists.zeromq.org</a><br>
                          <a href="http://lists.zeromq.org/mailman/listinfo/zeromq-dev" target="_blank">http://lists.zeromq.org/mailman/listinfo/zeromq-dev</a><br>
                          <br>
                        </blockquote>
                      </div>
                      <br>
                    </div>
                    <br>
                    <fieldset></fieldset>
                    <br>
                    <pre>_______________________________________________
zeromq-dev mailing list
<a href="mailto:zeromq-dev@lists.zeromq.org" target="_blank">zeromq-dev@lists.zeromq.org</a>
<a href="http://lists.zeromq.org/mailman/listinfo/zeromq-dev" target="_blank">http://lists.zeromq.org/mailman/listinfo/zeromq-dev</a>
</pre>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
            <br>
            _______________________________________________<br>
            zeromq-dev mailing list<br>
            <a href="mailto:zeromq-dev@lists.zeromq.org" target="_blank">zeromq-dev@lists.zeromq.org</a><br>
            <a href="http://lists.zeromq.org/mailman/listinfo/zeromq-dev" target="_blank">http://lists.zeromq.org/mailman/listinfo/zeromq-dev</a><br>
            <br>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
      <fieldset></fieldset>
      <br>
      <pre>_______________________________________________
zeromq-dev mailing list
<a href="mailto:zeromq-dev@lists.zeromq.org" target="_blank">zeromq-dev@lists.zeromq.org</a>
<a href="http://lists.zeromq.org/mailman/listinfo/zeromq-dev" target="_blank">http://lists.zeromq.org/mailman/listinfo/zeromq-dev</a>
</pre>
    </div></div></blockquote>
    <br>
  </div>

<br>_______________________________________________<br>
zeromq-dev mailing list<br>
<a href="mailto:zeromq-dev@lists.zeromq.org">zeromq-dev@lists.zeromq.org</a><br>
<a href="http://lists.zeromq.org/mailman/listinfo/zeromq-dev" target="_blank">http://lists.zeromq.org/mailman/listinfo/zeromq-dev</a><br>
<br></blockquote></div><br></div>