[zeromq-dev] zmq-signaler-port-sync
Pau
pau at teleopsia.com
Wed Feb 13 12:22:25 CET 2013
Hi, thanks.
I don't want to look like a bungler, but wouldn't it be a shortcut to
implement asserts that release the event before aborting?
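Below is a minimal C++ sketch of that idea. Everything in it is hypothetical and not part of libzmq: LOCKED_ASSERT, locked_assert_failed, pending_release, on_fatal and release_sync are names invented for illustration. It assumes that make_fdpair would point pending_release at a function that calls SetEvent (sync) right after acquiring the event, and clear it again just before its normal SetEvent (sync).

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical hook: while inside the system-wide critical section, this
// points at a function that releases it (e.g. one calling SetEvent (sync)).
using release_fn = void (*)();
static release_fn pending_release = nullptr;

// Hypothetical fatal handler, std::abort by default. It is replaceable only
// so the behaviour can be exercised without killing the process.
using fatal_fn = void (*)(const char *);
static void default_fatal(const char *) { std::abort(); }
static fatal_fn on_fatal = default_fatal;

// Release the critical section (at most once), report, then terminate.
[[noreturn]] static void locked_assert_failed(const char *expr,
                                              const char *file, int line)
{
    if (release_fn release = pending_release) {
        pending_release = nullptr;  // never release twice
        release();
    }
    std::fprintf(stderr, "Assertion failed: %s (%s:%d)\n", expr, file, line);
    on_fatal(expr);
    std::abort();  // backstop in case on_fatal returns
}

// Assert for use only between acquiring and releasing the event.
#define LOCKED_ASSERT(x) \
    do { if (!(x)) locked_assert_failed(#x, __FILE__, __LINE__); } while (false)

// Intended use (hypothetical), inside make_fdpair:
//   WaitForSingleObject (sync, INFINITE);
//   pending_release = release_sync;   // release_sync calls SetEvent (sync)
//   ... LOCKED_ASSERT (rc != SOCKET_ERROR); ...
//   pending_release = nullptr;
//   SetEvent (sync);
```

This only covers failures that go through the macro. Any other kind of death inside the critical section (an access violation, or the process being killed) would still leave the event unsignaled, which is the case a Mutex would handle through WAIT_ABANDONED.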
On 13/02/2013 9:54, KIU Shueng Chuan wrote:
> Hi Pau,
>
> The system-wide critical section is currently implemented using a
> win32 Event which, as you observed, has the possibility of resulting
> in a deadlock in the following situation:
> 1) Process A takes the Event
> 2) Process B tries to take the Event and blocks
> 3) Process A aborts within the critical section (due to an assertion
> being raised)
> 4) Since Process B has opened the Event, the OS will not clean up the
> Event.
> 5) Process B and any subsequent process will now block forever for the
> Event.
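The scenario above can be reproduced in a toy, single-process C++ model. This is a sketch under stated assumptions, not the Win32 API: auto_reset_event and b_enters_after_a_aborts are hypothetical names, the class stands in for the named auto-reset Event created by CreateEvent (NULL, FALSE, TRUE, ...), and a non-blocking try_wait() replaces WaitForSingleObject (sync, INFINITE) so the demonstration reports failure instead of hanging.

```cpp
// Toy model of a named auto-reset Win32 Event used as a lock: initially
// signaled, a successful wait resets it, and SetEvent signals it again.
class auto_reset_event {
public:
    explicit auto_reset_event(bool initially_signaled)
        : signaled_(initially_signaled) {}

    // Like WaitForSingleObject (sync, 0): take the event if it is signaled.
    // With INFINITE, a false result here would mean blocking forever.
    bool try_wait() {
        if (!signaled_) return false;
        signaled_ = false;  // auto-reset: consumed by this waiter
        return true;
    }

    // Like SetEvent (sync): let the next waiter in.
    void set() { signaled_ = true; }

private:
    bool signaled_;  // note: no owner is recorded anywhere
};

// Steps 1-5 above in one process. Returns whether B could ever get in.
bool b_enters_after_a_aborts() {
    auto_reset_event sync(true);
    if (!sync.try_wait()) return false;  // 1) A takes the Event
    // 3) A aborts inside the critical section: set() is never called,
    //    and since the event has no owner, nothing calls it on A's behalf.
    return sync.try_wait();              // 2) and 5) B finds it unsignaled
}
```

The point of the model is that an Event has no owner: once the holder stops without calling set(), nothing knows that the lock should be handed back.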
>
> As I mentioned in the previous mail, if the critical section were to
> be implemented using a Mutex instead, then in step 5, Process B would
> be able to enter the critical section with a return code of
> WAIT_ABANDONED from WaitForSingleObject. (Or at least that's what I
> read from MSDN)
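A toy model of the Mutex behaviour described above, again portable C++ rather than the real Win32 API. owned_mutex, wait_result and owner_terminated are hypothetical names; the three results only mirror WAIT_OBJECT_0, WAIT_ABANDONED and WAIT_TIMEOUT. The model also ignores the fact that a real Win32 mutex can be re-acquired recursively by the thread that owns it.

```cpp
// Mirrors the names of WaitForSingleObject's results (model only).
enum class wait_result { object_0, abandoned, timeout };

// Toy model of a Win32 Mutex's ownership rules.
class owned_mutex {
public:
    // Non-blocking "wait": succeeds if the mutex is free or was abandoned.
    wait_result try_acquire(int who) {
        if (owner_ != none) return wait_result::timeout;
        owner_ = who;
        if (abandoned_) {
            abandoned_ = false;
            return wait_result::abandoned;  // ownership granted, with a warning
        }
        return wait_result::object_0;
    }

    // Like ReleaseMutex: only the current owner may release.
    bool release(int who) {
        if (owner_ != who) return false;
        owner_ = none;
        return true;
    }

    // What the OS does when the owner terminates without releasing:
    // ownership is dropped and the next waiter is told it was abandoned.
    void owner_terminated(int who) {
        if (owner_ == who) {
            owner_ = none;
            abandoned_ = true;
        }
    }

private:
    static constexpr int none = -1;
    int owner_ = none;
    bool abandoned_ = false;
};
```

The abandoned result is a grant of ownership and a warning at the same time: whatever the previous owner was doing inside the critical section may be half-finished, so the new owner would have to treat that state as suspect.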
>
> Note: If Process A aborted due to resource exhaustion, then
> Process B would likely hit the same assertion too.
>
> The question is how to convert the Event to a Mutex and yet not break
> compatibility with existing applications using older versions of the
> library.
>
>
>
> On Wed, Feb 13, 2013 at 3:28 PM, Pau <pau at teleopsia.com> wrote:
>
> Hi,
>
> I am back with the asserts happening inside a critical section in
> signaler.cpp.
> The problem is still that in signaler.cpp, make_fdpair(..) creates a
> system-wide critical section and does a number of things that can
> trigger a wsa_assert() or win_assert() before leaving the critical
> section.
>
> I have seen that in trunk someone has added a
> CloseHandle(sync) at the end of the function. I do not know whether
> it is related to this, but as I understand it the problem is still
> there: wsa_assert() and win_assert() end up in
> RaiseException (0x40000015, EXCEPTION_NONCONTINUABLE, 1,
> extra_info), which I understand is a dead end with no way to clean
> up before exiting.
>
> I guess we need a special assert function to use inside this
> critical section, but I'd like a more informed opinion (Kiu?).
>
> thanks,
>
> Pau Ceano
>
> On 21/01/2013 23:37, KIU Shueng Chuan wrote:
>>
>> Hi Pau, a fix for the assertion on connection to port 5905 is in
>> the trunk branch.
>>
>> I think the dangling critical section possibility could be fixed
>> by changing the Event to a Mutex. When an assertion occurs the
>> mutex would just be abandoned. However this change will cause
>> backward compatibility issues with older versions.
>>
>> On Jan 22, 2013 2:04 AM, "Pieter Hintjens" <ph at imatix.com> wrote:
>>
>> Hi Pau,
>>
>> So there are two different problems here: one is that we're
>> hitting a socket limit on WXP, and the other is that the asserts
>> are happening inside a critical section.
>>
>> I don't think we can fix the first one easily, but we can
>> presumably assert in a smarter way. Do you want to try making
>> a patch for this?
>>
>> -Pieter
>>
>> On Mon, Jan 21, 2013 at 6:23 PM, Pau <pau at teleopsia.com> wrote:
>> >
>> > Hi,
>> >
>> >
>> > I am using ZMQ on Windows (not yet in production) and I have
>> > found what I think is a big problem for Windows users.
>> > We use WXP and W7 with various versions of Visual C++. The ZMQ
>> > version is 3.2.0 (as far as I can see, the same problem happens
>> > in 3.2.2).
>> >
>> > I do not fully understand ZMQ internals, but I've seen that every
>> > time a socket is created the function make_fdpair(..) is called,
>> > and in signaler.cpp (line 244) a system event
>> > "zmq-signaler-port-sync" is created.
>> > This event is used as a system-wide critical section, so every
>> > application that tries to create the event will block in
>> > WaitForSingleObject (sync, INFINITE) until SetEvent (...) is called.
>> > The problem is that the code between:
>> > HANDLE sync = CreateEvent (NULL, FALSE, TRUE, TEXT ("zmq-signaler-port-sync"));
>> > and
>> > SetEvent (sync);
>> > is full of wsa_assert(..) calls that will terminate the application
>> > if something goes wrong.
>> >
>> > Terminating the application without leaving the system-wide
>> > critical section is clearly a bad idea, because every application
>> > in the system will hang and you will have to stop them all to
>> > start again.
>> > I understand that no errors should happen, but even so, bailing
>> > out on an error is not a good idea in this case.
>> >
>> > I do not know every possible cause of a fatal wsa_assert(..),
>> > but there is at least one:
>> >
>> > I have seen that on XP, line 301,
>> > rc = connect (*w_, (sockaddr *) &addr, sizeof (addr));
>> > can return an error when a socket tries to connect to port 5905,
>> > and this has happened many times.
>> > Windows uses port numbers starting near 1400, and XP has a limit
>> > of 5000. On W7 this does not look like a problem, because the
>> > maximum is 65000.
>> > That sounds like a big enough number, but apart from the fact that
>> > ZMQ uses a large number of connections (at least in my tests), I
>> > have observed that Windows advances port numbers in steps of 7, so
>> > the 5000 limit is sometimes reached, with catastrophic consequences.
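A back-of-the-envelope count using the figures in this message: ports starting near 1400, a ceiling of 5000 on XP and about 65000 on W7, and a step of 7 between allocations. ports_available is a hypothetical helper, and it assumes real Windows port allocation follows this regular pattern, which it may not.

```cpp
// Hypothetical helper: how many ports fit between `first` and `last`
// (inclusive) if each new allocation advances the port number by `step`.
constexpr int ports_available(int first, int last, int step) {
    return last < first ? 0 : (last - first) / step + 1;
}

// 1400 + 7 * 514 = 4998 is the last port at or below 5000.
static_assert(ports_available(1400, 5000, 7) == 515, "XP-like range");
// 1400 + 7 * 9085 = 64995 is the last port at or below 65000.
static_assert(ports_available(1400, 65000, 7) == 9086, "W7-like range");
```

Under those assumptions only about 515 port allocations fit on XP before the ceiling, versus roughly 9,000 on W7.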
>> >
>> > best,
>> >
>> > Pau Ceano
>> > _______________________________________________
>> > zeromq-dev mailing list
>> > zeromq-dev at lists.zeromq.org
>> <mailto:zeromq-dev at lists.zeromq.org>
>> > http://lists.zeromq.org/mailman/listinfo/zeromq-dev