[zeromq-dev] zmq-signaler-port-sync
KIU Shueng Chuan
nixchuan at gmail.com
Wed Feb 13 09:54:03 CET 2013
Hi Pau,
The system-wide critical section is currently implemented using a win32
Event which, as you observed, has the possibility of resulting in a
deadlock in the following situation:
1) Process A takes the Event
2) Process B tries to take the Event and blocks
3) Process A aborts within the critical section (due to an assertion being
raised)
4) Since Process B has opened the Event, the OS will not clean up the Event.
5) Process B and any subsequent process will now block forever for the
Event.
As I mentioned in the previous mail, if the critical section were to be
implemented using a Mutex instead, then in step 5, Process B would be able
to enter the critical section with a return code of WAIT_ABANDONED from
WaitForSingleObject. (Or at least that's what I read from MSDN)
Note: If Process A aborted due to some exhaustion of resources, then
Process B would likely hit the same assertion too.
The question is how to convert the Event to a Mutex and yet not break
compatibility with existing applications using older versions of the
library.
On Wed, Feb 13, 2013 at 3:28 PM, Pau <pau at teleopsia.com> wrote:
> Hi,
>
> I am back with the asserts happening inside a critical section in
> signaler.cpp.
> The problem still is that in signale.cpp make_fdpair(..) creates
> system-wide critical section and does a number of things that can generate
> a wsa_assert() or win_assert() before releasing the session.
>
> I have seen that in the trunk someone has added a CloseHandle(sync) at the
> end of the function, I do not know if it had something related with this
> but I understand that the problem is still there. wsa_assert() and
> wsa_windows() end up in RaiseException (0x40000015,
> EXCEPTION_NONCONTINUABLE, 1, extra_info) which I understand is a cul de sac
> that has no way out to clean up before leaving.
>
> I guess we need a special assert function to use inside this critical but
> I'd like a more documented opinion (Kiu?).
>
> thanks,
>
> Pau Ceano
>
> El 21/01/2013 23:37, KIU Shueng Chuan escribió:
>
> Hi Pau, a fix for the assertion on connection to port 5905 is in trunk
> branch.
>
> I think the dangling critical section possibility could be fixed by
> changing the Event to a Mutex. When an assertion occurs the mutex would
> just be abandoned. However this change will cause backward compatibility
> issues with older versions.
> On Jan 22, 2013 2:04 AM, "Pieter Hintjens" <ph at imatix.com> wrote:
>
>> Hi Pau,
>>
>> So there are two different problems here, one is that we're hitting a
>> socket limit on WXP, and the other is that the asserts are happening
>> inside a critical section.
>>
>> I don't think we can fix the first one easily but we can presumably
>> assert in a smarter way. Do you want to try making a patch for this?
>>
>> -Pieter
>>
>> On Mon, Jan 21, 2013 at 6:23 PM, Pau <pau at teleopsia.com> wrote:
>> >
>> > Hi,
>> >
>> >
>> > I am using (not yet in production) ZMQ on Windows and I have found what
>> > I think is a big problem for Windows users.
>> > We use WXP and W7 and Visual C++ different versions. ZMQ version 3.2.0
>> > (as far as I see the same problem happens in 3.2.2)
>> >
>> > I do not fully understand ZMQ internals but I've seen that every time a
>> > socket is created the function make_fdpair(..) is called and in
>> > signaler.cpp(line244) a system event "zmq-signaler-port-sync" is
>> created.
>> > This event is used as a system-wide critical section and, so all
>> > applications that try to create an event will WaitForSingleObject (sync,
>> > INFINITE) until SetEvent (...) is called.
>> > The problem is that the code between:
>> > HANDLE sync = CreateEvent (NULL, FALSE, TRUE, TEXT
>> > ("zmq-signaler-port-sync"));
>> > and
>> > SetEvent (sync);
>> > is full of wsa_asserts(..) that will terminate the application if
>> > something goes wrong.
>> >
>> > It is clear that terminating the application not leaving the system-wide
>> > critical section is a bad idea because all applications in the system
>> > will hang and you will have to stop all them to start again.
>> > I understand that no errors should happen but anyway to escape from the
>> > error is not a good idea in this case.
>> >
>> > I do not know all possible reasons to generate a fatal wsa_assert(..)
>> > but there is at least one:
>> >
>> > I have seen that in XP it is possible that line 301 rc = connect (*w_,
>> > (sockaddr *) &addr, sizeof (addr)); returns an error when a socket tries
>> > to connect to 5905 and this has happened many times.
>> > Windows uses port numbers starting near 1400 and XP has a limit at 5000.
>> > In W7 this does not look as a problem because maximum is 65000
>> > It sounds as if the number was big enough but apart from the fact that
>> > ZMQ uses a big number of connections (at least in my tests) I have
>> > experienced that Windows jumps port numbers by 7, so 5000 happens
>> > sometimes with catastrophic consequences.
>> >
>> > best,
>> >
>> > Pau Ceano
>> > _______________________________________________
>> > zeromq-dev mailing list
>> > zeromq-dev at lists.zeromq.org
>> > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>
>
> _______________________________________________
> zeromq-dev mailing listzeromq-dev at lists.zeromq.orghttp://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20130213/31310d92/attachment.htm>
More information about the zeromq-dev
mailing list