[zeromq-dev] zmq-signaler-port-sync

Pau pau at teleopsia.com
Wed Feb 13 12:22:25 CET 2013


Hi, thanks

I do not want to look bungler, but wouldn't be a shortcut to implement 
asserts that clean the event before aborting?

El 13/02/2013 9:54, KIU Shueng Chuan escribió:
> Hi Pau,
>
> The system-wide critical section is currently implemented using a 
> win32 Event which, as you observed, has the possibility of resulting 
> in a deadlock in the following situation:
> 1) Process A takes the Event
> 2) Process B tries to take the Event and blocks
> 3) Process A aborts within the critical section (due to an assertion 
> being raised)
> 4) Since Process B has opened the Event, the OS will not clean up the 
> Event.
> 5) Process B and any subsequent process will now block forever for the 
> Event.
>
> As I mentioned in the previous mail, if the critical section were to 
> be implemented using a Mutex instead, then in step 5, Process B would 
> be able to enter the critical section with a return code of 
> WAIT_ABANDONED from WaitForSingleObject. (Or at least that's what I 
> read from MSDN)
>
> Note: If Process A aborted due to some exhaustion of resources, then 
> Process B would likely hit the same assertion too.
>
> The question is how to convert the Event to a Mutex and yet not break 
> compatibility with existing applications using older versions of the 
> library.
>
>
>
> On Wed, Feb 13, 2013 at 3:28 PM, Pau <pau at teleopsia.com 
> <mailto:pau at teleopsia.com>> wrote:
>
>     Hi,
>
>     I am back with the asserts happening inside a critical section in
>     signaler.cpp.
>     The problem still is that in signale.cpp make_fdpair(..) creates
>     system-wide critical section and does a number of things that can
>     generate a wsa_assert() or win_assert() before releasing the session.
>
>     I have seen that in the trunk someone has added a
>     CloseHandle(sync) at the end of the function, I do not know if it
>     had something related with this but I understand that the problem
>     is still there. wsa_assert() and wsa_windows() end up in 
>     RaiseException (0x40000015, EXCEPTION_NONCONTINUABLE, 1,
>     extra_info) which I understand is a cul de sac that has no way out
>     to clean up before leaving.
>
>     I guess we need a special assert function to use inside this
>     critical but I'd like a more documented opinion (Kiu?).
>
>     thanks,
>
>     Pau Ceano
>
>     El 21/01/2013 23:37, KIU Shueng Chuan escribió:
>>
>>     Hi Pau, a fix for the assertion on connection to port 5905 is in
>>     trunk branch.
>>
>>     I think the dangling critical section possibility could be fixed
>>     by changing the Event to a Mutex. When an assertion occurs the
>>     mutex would just be abandoned. However this change will cause
>>     backward compatibility issues with older versions.
>>
>>     On Jan 22, 2013 2:04 AM, "Pieter Hintjens" <ph at imatix.com
>>     <mailto:ph at imatix.com>> wrote:
>>
>>         Hi Pau,
>>
>>         So there are two different problems here, one is that we're
>>         hitting a
>>         socket limit on WXP, and the other is that the asserts are
>>         happening
>>         inside a critical section.
>>
>>         I don't think we can fix the first one easily but we can
>>         presumably
>>         assert in a smarter way. Do you want to try making a patch
>>         for this?
>>
>>         -Pieter
>>
>>         On Mon, Jan 21, 2013 at 6:23 PM, Pau <pau at teleopsia.com
>>         <mailto:pau at teleopsia.com>> wrote:
>>         >
>>         > Hi,
>>         >
>>         >
>>         > I am using (not yet in production) ZMQ on Windows and I
>>         have found what
>>         > I think is a big problem for Windows users.
>>         > We use WXP and W7 and Visual C++ different versions. ZMQ
>>         version 3.2.0
>>         > (as far as I see the same problem happens in 3.2.2)
>>         >
>>         > I do not fully understand ZMQ internals but I've seen that
>>         every time a
>>         > socket is created the function make_fdpair(..) is called and in
>>         > signaler.cpp(line244) a system event
>>         "zmq-signaler-port-sync" is created.
>>         > This event is used as a system-wide critical section and,
>>         so all
>>         > applications that try to create an event will
>>         WaitForSingleObject (sync,
>>         > INFINITE) until  SetEvent (...) is called.
>>         > The problem is that the code between:
>>         >   HANDLE sync = CreateEvent (NULL, FALSE, TRUE, TEXT
>>         > ("zmq-signaler-port-sync"));
>>         > and
>>         > SetEvent (sync);
>>         > is full of wsa_asserts(..) that will terminate the
>>         application if
>>         > something goes wrong.
>>         >
>>         > It is clear that terminating the application not leaving
>>         the system-wide
>>         > critical section is a bad idea because all applications in
>>         the system
>>         > will hang and you will have to stop all them to start again.
>>         > I understand that no errors should happen but anyway to
>>         escape from the
>>         > error is not a good idea in this case.
>>         >
>>         > I do not know all possible reasons to generate a fatal
>>         wsa_assert(..)
>>         > but there is at least one:
>>         >
>>         > I have seen that in XP it is possible that line 301  rc =
>>         connect (*w_,
>>         > (sockaddr *) &addr, sizeof (addr)); returns an error when a
>>         socket tries
>>         > to connect to 5905 and this has happened many times.
>>         > Windows uses port numbers starting near 1400 and XP has a
>>         limit at 5000.
>>         > In W7 this does not look as a problem because maximum is 65000
>>         > It sounds as if the number was big enough but apart from
>>         the fact that
>>         > ZMQ uses a big number of connections (at least in my tests)
>>         I have
>>         > experienced that Windows jumps port numbers by 7, so 5000
>>         happens
>>         > sometimes with catastrophic consequences.
>>         >
>>         > best,
>>         >
>>         > Pau Ceano
>>         > _______________________________________________
>>         > zeromq-dev mailing list
>>         > zeromq-dev at lists.zeromq.org
>>         <mailto:zeromq-dev at lists.zeromq.org>
>>         > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>         _______________________________________________
>>         zeromq-dev mailing list
>>         zeromq-dev at lists.zeromq.org <mailto:zeromq-dev at lists.zeromq.org>
>>         http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>>
>>     _______________________________________________
>>     zeromq-dev mailing list
>>     zeromq-dev at lists.zeromq.org  <mailto:zeromq-dev at lists.zeromq.org>
>>     http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>
>     _______________________________________________
>     zeromq-dev mailing list
>     zeromq-dev at lists.zeromq.org <mailto:zeromq-dev at lists.zeromq.org>
>     http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>
>
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20130213/dfe61d85/attachment.html>


More information about the zeromq-dev mailing list