[zeromq-dev] zmq-signaler-port-sync

Pau pau at teleopsia.com
Fri Feb 15 09:21:00 CET 2013


Kiu,

I have rethought about the problem and it is not very clear to me how to 
run all make_fdpair() with no asserts. The problem is that I do not know 
waht may happen if we call a wsa function after a failure of a previous 
one. I guess that we could check in a per call way and see if it is ok. 
Or define an internal boolean that only calls the next function if the 
previous has suceeded.

I am not familiar with the coding style in ZMQ (never made a patch) but 
I would say that the specific assert can look a bit better. I am testing 
this code.
Anyway, if you think removing the asserts is ok, I am ok with it, as 
said I am new in this yard.

Please check code:

It is not clear to me what to do with the win_assert(..) in case 
SetEvent (sync) fails inside another assert. As it is now it is clear 
that it will never be called but I guess that in case of error you 
prefer to see the fisrt error. Not sure.

#if defined ZMQ_HAVE_WINDOWS
#define wsa_assert_fdpair(sync, no) \
     {    \
      bool brc = SetEvent (sync); \
      wsa_assert (no); \
      win_assert (brc != 0);    \
     }

#define zmq_assert_fdpair(sync, x) \
     {    \
      bool brc = SetEvent (sync); \
      zmq_assert (x); \
      win_assert (brc != 0);    \
     }

#define win_assert_fdpair(sync, x) \
     {    \
      bool brc = SetEvent (sync); \
      win_assert ( (x) && (brc != 0) );    \
     }

#endif

int zmq::signaler_t::make_fdpair (fd_t *r_, fd_t *w_)

...

     HANDLE sync = CreateEvent (&sa, FALSE, TRUE, TEXT 
("Global\\zmq-signaler-port-sync"));
     if (sync == NULL && GetLastError () == ERROR_ACCESS_DENIED)
       sync = OpenEvent (SYNCHRONIZE | EVENT_MODIFY_STATE, FALSE, TEXT 
("Global\\zmq-signaler-port-sync"));

     win_assert (sync != NULL);

     //  Enter the critical section.
     DWORD dwrc = WaitForSingleObject (sync, INFINITE);
     zmq_assert_fdpair (sync, dwrc == WAIT_OBJECT_0);

     //  Windows has no 'socketpair' function. CreatePipe is no good as pipe
     //  handles cannot be polled on. Here we create the socketpair by hand.
     *w_ = INVALID_SOCKET;
     *r_ = INVALID_SOCKET;

     //  Create listening socket.
     SOCKET listener;
     listener = open_socket (AF_INET, SOCK_STREAM, 0);
     wsa_assert_fdpair(sync, listener != INVALID_SOCKET);

     //  Set SO_REUSEADDR and TCP_NODELAY on listening socket.
     BOOL so_reuseaddr = 1;
     int rc = setsockopt (listener, SOL_SOCKET, SO_REUSEADDR,
         (char *)&so_reuseaddr, sizeof (so_reuseaddr));
     wsa_assert (rc != SOCKET_ERROR);
     BOOL tcp_nodelay = 1;
     rc = setsockopt (listener, IPPROTO_TCP, TCP_NODELAY,
         (char *)&tcp_nodelay, sizeof (tcp_nodelay));
     wsa_assert_fdpair (sync, rc != SOCKET_ERROR);

     //  Bind listening socket to any free local port.
     struct sockaddr_in addr;
     memset (&addr, 0, sizeof (addr));
     addr.sin_family = AF_INET;
     addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
     addr.sin_port = htons (signaler_port);
     rc = bind (listener, (const struct sockaddr*) &addr, sizeof (addr));
     wsa_assert_fdpair (sync, rc != SOCKET_ERROR);

     //  Listen for incomming connections.
     rc = listen (listener, 1);
     wsa_assert_fdpair (sync, rc != SOCKET_ERROR);

     //  Create the writer socket.
     *w_ = WSASocket (AF_INET, SOCK_STREAM, 0, NULL, 0,  0);
     wsa_assert_fdpair (sync, *w_ != INVALID_SOCKET);

     //  On Windows, preventing sockets to be inherited by child processes.
     BOOL brc = SetHandleInformation ((HANDLE) *w_, HANDLE_FLAG_INHERIT, 0);
     win_assert_fdpair (sync, brc);

     //  Set TCP_NODELAY on writer socket.
     rc = setsockopt (*w_, IPPROTO_TCP, TCP_NODELAY,
         (char *)&tcp_nodelay, sizeof (tcp_nodelay));
     wsa_assert_fdpair (sync, rc != SOCKET_ERROR);

     //  Connect writer to the listener.
     rc = connect (*w_, (struct sockaddr*) &addr, sizeof (addr));
     wsa_assert_fdpair (sync, rc != SOCKET_ERROR);

     //  Accept connection from writer.
     *r_ = accept (listener, NULL, NULL);
     wsa_assert_fdpair (sync, *r_ != INVALID_SOCKET);

     //  On Windows, preventing sockets to be inherited by child processes.
     brc = SetHandleInformation ((HANDLE) *r_, HANDLE_FLAG_INHERIT, 0);
     win_assert_fdpair (sync, brc);

     //  We don't need the listening socket anymore. Close it.
     rc = closesocket (listener);
     wsa_assert_fdpair (sync, rc != SOCKET_ERROR);

     //  Exit the critical section.
     brc = SetEvent (sync);
     win_assert (brc != 0);

     return 0;


El 14/02/2013 1:46, KIU Shueng Chuan escribió:
> Are we okay with using assertions to catch both
> - programming errors
> - rare but known situations not handled in the code? (in this case 
> resource exhaustion)
>
> How about this: Have make_fdpair() return -1 (and release the critical 
> section) on error returns from the calls to connect() and accept() only.
> The failure will be caught by signaler_t() which calls make_fdpair()
>
>
>
> On Wed, Feb 13, 2013 at 7:22 PM, Pau <pau at teleopsia.com 
> <mailto:pau at teleopsia.com>> wrote:
>
>     Hi, thanks
>
>     I do not want to look bungler, but wouldn't be a shortcut to
>     implement asserts that clean the event before aborting?
>
>     El 13/02/2013 9:54, KIU Shueng Chuan escribió:
>>     Hi Pau,
>>
>>     The system-wide critical section is currently implemented using a
>>     win32 Event which, as you observed, has the possibility of
>>     resulting in a deadlock in the following situation:
>>     1) Process A takes the Event
>>     2) Process B tries to take the Event and blocks
>>     3) Process A aborts within the critical section (due to an
>>     assertion being raised)
>>     4) Since Process B has opened the Event, the OS will not clean up
>>     the Event.
>>     5) Process B and any subsequent process will now block forever
>>     for the Event.
>>
>>     As I mentioned in the previous mail, if the critical section were
>>     to be implemented using a Mutex instead, then in step 5, Process
>>     B would be able to enter the critical section with a return code
>>     of WAIT_ABANDONED from WaitForSingleObject. (Or at least that's
>>     what I read from MSDN)
>>
>>     Note: If Process A aborted due to some exhaustion of resources,
>>     then Process B would likely hit the same assertion too.
>>
>>     The question is how to convert the Event to a Mutex and yet not
>>     break compatibility with existing applications using older
>>     versions of the library.
>>
>>
>>
>>     On Wed, Feb 13, 2013 at 3:28 PM, Pau <pau at teleopsia.com
>>     <mailto:pau at teleopsia.com>> wrote:
>>
>>         Hi,
>>
>>         I am back with the asserts happening inside a critical
>>         section in signaler.cpp.
>>         The problem still is that in signale.cpp make_fdpair(..)
>>         creates system-wide critical section and does a number of
>>         things that can generate a wsa_assert() or win_assert()
>>         before releasing the session.
>>
>>         I have seen that in the trunk someone has added a
>>         CloseHandle(sync) at the end of the function, I do not know
>>         if it had something related with this but I understand that
>>         the problem is still there. wsa_assert() and wsa_windows()
>>         end up in  RaiseException (0x40000015,
>>         EXCEPTION_NONCONTINUABLE, 1, extra_info) which I understand
>>         is a cul de sac that has no way out to clean up before leaving.
>>
>>         I guess we need a special assert function to use inside this
>>         critical but I'd like a more documented opinion (Kiu?).
>>
>>         thanks,
>>
>>         Pau Ceano
>>
>>         El 21/01/2013 23:37, KIU Shueng Chuan escribió:
>>>
>>>         Hi Pau, a fix for the assertion on connection to port 5905
>>>         is in trunk branch.
>>>
>>>         I think the dangling critical section possibility could be
>>>         fixed by changing the Event to a Mutex. When an assertion
>>>         occurs the mutex would just be abandoned. However this
>>>         change will cause backward compatibility issues with older
>>>         versions.
>>>
>>>         On Jan 22, 2013 2:04 AM, "Pieter Hintjens" <ph at imatix.com
>>>         <mailto:ph at imatix.com>> wrote:
>>>
>>>             Hi Pau,
>>>
>>>             So there are two different problems here, one is that
>>>             we're hitting a
>>>             socket limit on WXP, and the other is that the asserts
>>>             are happening
>>>             inside a critical section.
>>>
>>>             I don't think we can fix the first one easily but we can
>>>             presumably
>>>             assert in a smarter way. Do you want to try making a
>>>             patch for this?
>>>
>>>             -Pieter
>>>
>>>             On Mon, Jan 21, 2013 at 6:23 PM, Pau <pau at teleopsia.com
>>>             <mailto:pau at teleopsia.com>> wrote:
>>>             >
>>>             > Hi,
>>>             >
>>>             >
>>>             > I am using (not yet in production) ZMQ on Windows and
>>>             I have found what
>>>             > I think is a big problem for Windows users.
>>>             > We use WXP and W7 and Visual C++ different versions.
>>>             ZMQ version 3.2.0
>>>             > (as far as I see the same problem happens in 3.2.2)
>>>             >
>>>             > I do not fully understand ZMQ internals but I've seen
>>>             that every time a
>>>             > socket is created the function make_fdpair(..) is
>>>             called and in
>>>             > signaler.cpp(line244) a system event
>>>             "zmq-signaler-port-sync" is created.
>>>             > This event is used as a system-wide critical section
>>>             and, so all
>>>             > applications that try to create an event will
>>>             WaitForSingleObject (sync,
>>>             > INFINITE) until  SetEvent (...) is called.
>>>             > The problem is that the code between:
>>>             >   HANDLE sync = CreateEvent (NULL, FALSE, TRUE, TEXT
>>>             > ("zmq-signaler-port-sync"));
>>>             > and
>>>             > SetEvent (sync);
>>>             > is full of wsa_asserts(..) that will terminate the
>>>             application if
>>>             > something goes wrong.
>>>             >
>>>             > It is clear that terminating the application not
>>>             leaving the system-wide
>>>             > critical section is a bad idea because all
>>>             applications in the system
>>>             > will hang and you will have to stop all them to start
>>>             again.
>>>             > I understand that no errors should happen but anyway
>>>             to escape from the
>>>             > error is not a good idea in this case.
>>>             >
>>>             > I do not know all possible reasons to generate a fatal
>>>             wsa_assert(..)
>>>             > but there is at least one:
>>>             >
>>>             > I have seen that in XP it is possible that line 301
>>>              rc = connect (*w_,
>>>             > (sockaddr *) &addr, sizeof (addr)); returns an error
>>>             when a socket tries
>>>             > to connect to 5905 and this has happened many times.
>>>             > Windows uses port numbers starting near 1400 and XP
>>>             has a limit at 5000.
>>>             > In W7 this does not look as a problem because maximum
>>>             is 65000
>>>             > It sounds as if the number was big enough but apart
>>>             from the fact that
>>>             > ZMQ uses a big number of connections (at least in my
>>>             tests) I have
>>>             > experienced that Windows jumps port numbers by 7, so
>>>             5000 happens
>>>             > sometimes with catastrophic consequences.
>>>             >
>>>             > best,
>>>             >
>>>             > Pau Ceano
>>>             > _______________________________________________
>>>             > zeromq-dev mailing list
>>>             > zeromq-dev at lists.zeromq.org
>>>             <mailto:zeromq-dev at lists.zeromq.org>
>>>             > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>             _______________________________________________
>>>             zeromq-dev mailing list
>>>             zeromq-dev at lists.zeromq.org
>>>             <mailto:zeromq-dev at lists.zeromq.org>
>>>             http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>
>>>
>>>
>>>         _______________________________________________
>>>         zeromq-dev mailing list
>>>         zeromq-dev at lists.zeromq.org  <mailto:zeromq-dev at lists.zeromq.org>
>>>         http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>>         _______________________________________________
>>         zeromq-dev mailing list
>>         zeromq-dev at lists.zeromq.org <mailto:zeromq-dev at lists.zeromq.org>
>>         http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>>
>>
>>     _______________________________________________
>>     zeromq-dev mailing list
>>     zeromq-dev at lists.zeromq.org  <mailto:zeromq-dev at lists.zeromq.org>
>>     http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>
>     _______________________________________________
>     zeromq-dev mailing list
>     zeromq-dev at lists.zeromq.org <mailto:zeromq-dev at lists.zeromq.org>
>     http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>
>
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20130215/aaed4b50/attachment.htm>


More information about the zeromq-dev mailing list